Blame - docs/CodeGenerator.html - fp2-dev/platform/external/llvm

Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	6	<title>The LLVM Target-Independent Code Generator</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	8
				9	<style type="text/css">
				10	.unknown { background-color: #C0C0C0; text-align: center; }
				11	.unknown:before { content: "?" }
				12	.no { background-color: #C11B17 }
				13	.no:before { content: "N" }
				14	.partial { background-color: #F88017 }
				15	.yes { background-color: #0F0; }
				16	.yes:before { content: "Y" }
				17	</style>
				18
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	19	</head>
				20	<body>
				21
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	22	<h1>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	23	The LLVM Target-Independent Code Generator
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	24	</h1>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	25
				26	<ol>
				27	<li><a href="#introduction">Introduction</a>
				28	<ul>
				29	<li><a href="#required">Required components in the code generator</a></li>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	30	<li><a href="#high-level-design">The high-level design of the code
				31	generator</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	32	<li><a href="#tablegen">Using TableGen for target description</a></li>
				33	</ul>
				34	</li>
				35	<li><a href="#targetdesc">Target description classes</a>
				36	<ul>
				37	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				38	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	39	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	40	<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	41	<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				42	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	43	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	44	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				45	</ul>
				46	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	47	<li><a href="#codegendesc">The "Machine" Code Generator classes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	48	<ul>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	49	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	50	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				51	class</a></li>
				52	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	53	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	54	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	55	<li><a href="#mc">The "MC" Layer</a>
				56	<ul>
				57	<li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
				58	<li><a href="#mccontext">The <tt>MCContext</tt> class</a></li>
				59	<li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
				60	<li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
				61	<li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
				62	</ul>
				63	</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	64	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	65	<ul>
				66	<li><a href="#instselect">Instruction Selection</a>
				67	<ul>
				68	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				69	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				70	Process</a></li>
				71	<li><a href="#selectiondag_build">Initial SelectionDAG
				72	Construction</a></li>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	73	<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	74	<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				75	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	76	Phase: the DAG Combiner</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	77	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	78	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	79	Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	80	<li><a href="#selectiondag_future">Future directions for the
				81	SelectionDAG</a></li>
				82	</ul></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	83	<li><a href="#liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	84	<ul>
				85	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	86	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	87	</ul></li>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	88	<li><a href="#regalloc">Register Allocation</a>
				89	<ul>
				90	<li><a href="#regAlloc_represent">How registers are represented in
				91	LLVM</a></li>
				92	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				93	registers</a></li>
				94	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				95	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				96	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				97	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				98	</ul></li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	99	<li><a href="#codeemit">Code Emission</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	100	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	101	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	102	<li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
				103
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	104	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	105	<ul>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	106	<li><a href="#targetfeatures">Target Feature Matrix</a></li>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	107	<li><a href="#tailcallopt">Tail call optimization</a></li>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	108	<li><a href="#sibcallopt">Sibling call optimization</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	109	<li><a href="#x86">The X86 backend</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	110	<li><a href="#ppc">The PowerPC backend</a>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	111	<ul>
				112	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				113	<li><a href="#ppc_frame">Frame Layout</a></li>
				114	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				115	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	116	</ul></li>
				117	</ul></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	118
				119	</ol>
				120
				121	<div class="doc_author">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	122	<p>Written by the LLVM Team.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	123	</div>
				124
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	125	<div class="doc_warning">
				126	<p>Warning: This is a work in progress.</p>
				127	</div>
				128
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	129	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	130	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	131	<a name="introduction">Introduction</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	132	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	133	<!-- *********************************************************************** -->
				134
				135	<div class="doc_text">
				136
				137	<p>The LLVM target-independent code generator is a framework that provides a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	138	suite of reusable components for translating the LLVM internal representation
				139	to the machine code for a specified target—either in assembly form
				140	(suitable for a static compiler) or in binary machine code format (usable for
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	141	a JIT compiler). The LLVM target-independent code generator consists of six
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	142	main components:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	143
				144	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	145	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				146	capture important properties about various aspects of the machine,
				147	independently of how they will be used. These interfaces are defined in
				148	<tt>include/llvm/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	149
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	150	<li>Classes used to represent the <a href="#codegendesc">code being
				151	generated</a> for a target. These classes are intended to be abstract
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	152	enough to represent the machine code for <i>any</i> target machine. These
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	153	classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
				154	concepts like "constant pool entries" and "jump tables" are explicitly
				155	exposed.</li>
				156
				157	<li>Classes and algorithms used to represent code at the object file level,
				158	the <a href="#mc">MC Layer</a>. These classes represent assembly level
				159	constructs like labels, sections, and instructions. At this level,
				160	concepts like "constant pool entries" and "jump tables" don't exist.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	161
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	162	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				163	various phases of native code generation (register allocation, scheduling,
				164	stack frame representation, etc.). This code lives
				165	in <tt>lib/CodeGen/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	166
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	167	<li><a href="#targetimpls">Implementations of the abstract target description
				168	interfaces</a> for particular targets. These machine descriptions make
				169	use of the components provided by LLVM, and can optionally provide custom
				170	target-specific passes, to build complete code generators for a specific
				171	target. Target descriptions live in <tt>lib/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	172
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	173	<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
				174	completely target independent (it uses the <tt>TargetJITInfo</tt>
				175	structure to interface for target-specific issues). The code for the
				176	target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	177	</ol>
				178
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	179	<p>Depending on which part of the code generator you are interested in working
				180	on, different pieces of this will be useful to you. In any case, you should
				181	be familiar with the <a href="#targetdesc">target description</a>
				182	and <a href="#codegendesc">machine code representation</a> classes. If you
				183	want to add a backend for a new target, you will need
				184	to <a href="#targetimpls">implement the target description</a> classes for
				185	your new target and understand the <a href="LangRef.html">LLVM code
				186	representation</a>. If you are interested in implementing a
				187	new <a href="#codegenalgs">code generation algorithm</a>, it should only
				188	depend on the target-description and machine code representation classes,
				189	ensuring that it is portable.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	190
				191	</div>
				192
				193	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	194	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	195	<a name="required">Required components in the code generator</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	196	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	197
				198	<div class="doc_text">
				199
				200	<p>The two pieces of the LLVM code generator are the high-level interface to the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	201	code generator and the set of reusable components that can be used to build
				202	target-specific backends. The two most important interfaces
				203	(<a href="#targetmachine"><tt>TargetMachine</tt></a>
				204	and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				205	required to be defined for a backend to fit into the LLVM system, but the
				206	others must be defined if the reusable code generator components are going to
				207	be used.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	208
				209	<p>This design has two important implications. The first is that LLVM can
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	210	support completely non-traditional code generation targets. For example, the
				211	C backend does not require register allocation, instruction selection, or any
				212	of the other standard components provided by the system. As such, it only
				213	implements these two interfaces, and does its own thing. Another example of
				214	a code generator like this is a (purely hypothetical) backend that converts
				215	LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	216
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	217	<p>This design also implies that it is possible to design and implement
				218	radically different code generators in the LLVM system that do not make use
				219	of any of the built-in components. Doing so is not recommended at all, but
				220	could be required for radically different targets that do not fit into the
				221	LLVM machine description model: FPGAs for example.</p>
Chris Lattner	900bf8c	2004-06-02 07:06:06 +0000	[diff] [blame]	222
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	223	</div>
				224
				225	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	226	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	227	<a name="high-level-design">The high-level design of the code generator</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	228	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	229
				230	<div class="doc_text">
				231
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	232	<p>The LLVM target-independent code generator is designed to support efficient
				233	and high-quality code generation for standard register-based microprocessors.
				234	Code generation in this model is divided into the following stages:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	235
				236	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	237	<li><b><a href="#instselect">Instruction Selection</a></b> — This phase
				238	determines an efficient way to express the input LLVM code in the target
				239	instruction set. This stage produces the initial code for the program in
				240	the target instruction set, using virtual registers in SSA
				241	form and physical registers that represent any required register
				242	assignments due to target constraints or calling conventions. This step
				243	turns the LLVM code into a DAG of target instructions.</li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	244
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	245	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> —
				246	This phase takes the DAG of target instructions produced by the
				247	instruction selection phase, determines an ordering of the instructions,
				248	then emits the instructions
				249	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
				250	Note that we describe this in the <a href="#instselect">instruction
				251	selection section</a> because it operates on
				252	a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	253
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	254	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> —
				255	This optional stage consists of a series of machine-code optimizations
				256	that operate on the SSA-form produced by the instruction selector.
				257	Optimizations like modulo-scheduling or peephole optimization work
				258	here.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	259
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	260	<li><b><a href="#regalloc">Register Allocation</a></b> — The target code
				261	is transformed from an infinite virtual register file in SSA form to the
				262	concrete register file used by the target. This phase introduces spill
				263	code and eliminates all virtual register references from the program.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	264
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	265	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once
				266	the machine code has been generated for the function and the amount of
				267	stack space required is known (used for LLVM alloca's and spill slots),
				268	the prolog and epilog code for the function can be inserted and "abstract
				269	stack location references" can be eliminated. This stage is responsible
				270	for implementing optimizations like frame-pointer elimination and stack
				271	packing.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	272
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	273	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> —
				274	Optimizations that operate on "final" machine code can go here, such as
				275	spill code scheduling and peephole optimizations.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	276
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	277	<li><b><a href="#codeemit">Code Emission</a></b> — The final stage
				278	actually puts out the code for the current function, either in the target
				279	assembler format or in machine code.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	280	</ol>
				281
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	282	<p>The code generator is based on the assumption that the instruction selector
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	283	will use an optimal pattern matching selector to create high-quality
				284	sequences of native instructions. Alternative code generator designs based
				285	on pattern expansion and aggressive iterative peephole optimization are much
				286	slower. This design permits efficient compilation (important for JIT
				287	environments) and aggressive optimization (used when generating code offline)
				288	by allowing components of varying levels of sophistication to be used for any
				289	step of compilation.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	290
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	291	<p>In addition to these stages, target implementations can insert arbitrary
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	292	target-specific passes into the flow. For example, the X86 target uses a
				293	special pass to handle the 80x87 floating point stack architecture. Other
				294	targets with unusual requirements can be supported with custom passes as
				295	needed.</p>
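
<p>The stage ordering above, and the way a target can splice its own pass into
it, can be sketched roughly as follows. This is only an illustration: the stage
names, <tt>standardStages</tt>, and <tt>insertAfter</tt> are hypothetical and
are not part of LLVM, and placing the floating point stack pass after register
allocation is an assumption of the sketch.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names for the seven stages listed above, in order.
std::vector<std::string> standardStages() {
  return {"instruction-selection", "scheduling-and-formation",
          "ssa-machine-optimization", "register-allocation",
          "prolog-epilog-insertion", "late-machine-optimization",
          "code-emission"};
}

// Insert a target-specific stage directly after an existing one.
// Returns false (and leaves the pipeline unchanged) if After is not present.
bool insertAfter(std::vector<std::string> &Pipeline, const std::string &After,
                 const std::string &Custom) {
  assert(!Custom.empty() && "a custom stage needs a name");
  auto It = std::find(Pipeline.begin(), Pipeline.end(), After);
  if (It == Pipeline.end())
    return false;
  Pipeline.insert(It + 1, Custom);
  return true;
}
```

<p>A target would start from <tt>standardStages()</tt> and call, for example,
<tt>insertAfter(P, "register-allocation", "x87-fp-stackifier")</tt> to add its
own pass without disturbing the order of the standard stages.</p>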
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	296
				297	</div>
				298
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	299	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	300	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	301	<a name="tablegen">Using TableGen for target description</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	302	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	303
				304	<div class="doc_text">
				305
Chris Lattner	5489e93	2004-06-01 18:35:00 +0000	[diff] [blame]	306	<p>The target description classes require a detailed description of the target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	307	architecture. These target descriptions often have a large amount of common
				308	information (e.g., an <tt>add</tt> instruction is almost identical to a
				309	<tt>sub</tt> instruction). In order to allow the maximum amount of
				310	commonality to be factored out, the LLVM code generator uses
				311	the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
				312	chunks of the target machine, which allows the use of domain-specific and
				313	target-specific abstractions to reduce the amount of repetition.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	314
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	315	<p>As LLVM continues to be developed and refined, we plan to move more and more
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	316	of the target description to the <tt>.td</tt> form. Doing so gives us a
				317	number of advantages. The most important is that it makes it easier to port
				318	LLVM because it reduces the amount of C++ code that has to be written, and
				319	the surface area of the code generator that needs to be understood before
				320	someone can get something working. Second, it makes it easier to change
				321	things. In particular, if tables and other things are all emitted
				322	by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
				323	update all of the targets to a new interface.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	324
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	325	</div>
				326
				327	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	328	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	329	<a name="targetdesc">Target description classes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	330	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	331	<!-- *********************************************************************** -->
				332
				333	<div class="doc_text">
				334
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	335	<p>The LLVM target description classes (located in the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	336	<tt>include/llvm/Target</tt> directory) provide an abstract description of
				337	the target machine independent of any particular client. These classes are
				338	designed to capture the <i>abstract</i> properties of the target (such as the
				339	instructions and registers it has), and do not incorporate any particular
				340	pieces of code generation algorithms.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	341
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	342	<p>All of the target description classes (except the
				343	<tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
				344	subclassed by the concrete target implementation, and have virtual methods
				345	implemented. To get to these implementations, the
				346	<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
				347	that should be implemented by the target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	348
				349	</div>
				350
				351	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	352	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	353	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	354	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	355
				356	<div class="doc_text">
				357
				358	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	359	access the target-specific implementations of the various target description
				360	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				361	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				362	designed to be specialized by a concrete target implementation
				363	(e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
				364	methods. The only required target description class is
				365	the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
				366	generator components are to be used, the other interfaces should be
				367	implemented as well.</p>
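
<p>The accessor pattern described here can be sketched as follows. Every class
in this sketch (<tt>SketchTargetMachine</tt>, <tt>ToyTargetMachine</tt>, and so
on) is a hypothetical stand-in, not the real LLVM interface; it only mirrors
the shape of a base class with virtual <tt>get*Info</tt> methods that a
concrete target overrides.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for two target description interfaces.
struct SketchInstrInfo {
  virtual ~SketchInstrInfo() = default;
  virtual std::string name() const = 0;
};

struct SketchRegisterInfo {
  virtual ~SketchRegisterInfo() = default;
  virtual unsigned numRegs() const = 0;
};

// The machine class hands out target-specific implementations through
// virtual accessors. In this sketch, null means "not provided".
class SketchTargetMachine {
public:
  virtual ~SketchTargetMachine() = default;
  virtual const SketchInstrInfo *getInstrInfo() const { return nullptr; }
  virtual const SketchRegisterInfo *getRegisterInfo() const { return nullptr; }
};

struct ToyInstrInfo final : SketchInstrInfo {
  std::string name() const override { return "toy"; }
};

struct ToyRegisterInfo final : SketchRegisterInfo {
  unsigned numRegs() const override { return 8; }
};

// A concrete target owns its description objects and overrides the accessors.
class ToyTargetMachine final : public SketchTargetMachine {
  ToyInstrInfo InstrInfo;
  ToyRegisterInfo RegInfo;

public:
  const SketchInstrInfo *getInstrInfo() const override { return &InstrInfo; }
  const SketchRegisterInfo *getRegisterInfo() const override {
    return &RegInfo;
  }
};

// Target-independent client code only sees the abstract interfaces.
unsigned countRegisters(const SketchTargetMachine &TM) {
  const SketchRegisterInfo *RI = TM.getRegisterInfo();
  return RI ? RI->numRegs() : 0;
}

// A client that cannot work without instruction information.
const SketchInstrInfo &requireInstrInfo(const SketchTargetMachine &TM) {
  const SketchInstrInfo *II = TM.getInstrInfo();
  assert(II && "this client needs instruction information");
  return *II;
}
```

<p>A machine that provides no register information still works with
<tt>countRegisters</tt>, which mirrors the idea that a backend only has to
implement the interfaces whose clients it wants to use.</p>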
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	368
				369	</div>
				370
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	371	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	372	<h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	373	<a name="targetdata">The <tt>TargetData</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	374	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	375
				376	<div class="doc_text">
				377
				378	<p>The <tt>TargetData</tt> class is the only required target description class,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	379	and it is the only class that is not extensible (you cannot derive a new
				380	class from it). <tt>TargetData</tt> specifies information about how the
				381	target lays out memory for structures, the alignment requirements for various
				382	data types, the size of pointers in the target, and whether the target is
				383	little-endian or big-endian.</p>
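
<p>To make the role of this information concrete, the following sketch computes
field offsets for a structure from per-type sizes and alignments, and shows how
endianness selects the byte order of a scalar. It is not the <tt>TargetData</tt>
API: <tt>FieldType</tt>, <tt>Layout</tt>, <tt>layoutStruct</tt>, and
<tt>byteAt</tt> are hypothetical names, and the rule of padding each field up to
its alignment is an assumed, conventional layout rule.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one field type: size and required alignment
// in bytes. A real target supplies these numbers.
struct FieldType {
  std::uint64_t Size;
  std::uint64_t Align; // must be a power of two
};

struct Layout {
  std::vector<std::uint64_t> Offsets; // byte offset of each field
  std::uint64_t Size;                 // total size including tail padding
  std::uint64_t Align;                // alignment of the whole structure
};

// Round Value up to the next multiple of Align.
std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Place fields in order, padding each one up to its own alignment.
Layout layoutStruct(const std::vector<FieldType> &Fields) {
  Layout L{{}, 0, 1};
  for (const FieldType &F : Fields) {
    L.Size = alignTo(L.Size, F.Align);
    L.Offsets.push_back(L.Size);
    L.Size += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(L.Size, L.Align);
  return L;
}

// Byte stored at address offset Index (0..3) of a 32-bit value.
std::uint8_t byteAt(std::uint32_t Value, unsigned Index, bool LittleEndian) {
  assert(Index < 4 && "a 32-bit value has four bytes");
  unsigned Shift = LittleEndian ? Index * 8 : (3 - Index) * 8;
  return static_cast<std::uint8_t>(Value >> Shift);
}
```

<p>With an 8-byte pointer, a structure holding a byte, a 32-bit integer, and a
pointer gets offsets 0, 4, and 8 and a total size of 16; with a 4-byte pointer
the same structure shrinks to 12 bytes. Differences of this kind are what the
target's data layout description captures.</p>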
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	384
				385	</div>
				386
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	387	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	388	<h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	389	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	390	</h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	391
				392	<div class="doc_text">
				393
				394	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG-based instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	395	selectors primarily to describe how LLVM code should be lowered to
				396	SelectionDAG operations. Among other things, this class indicates:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	397
				398	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	399	<li>an initial register class to use for various <tt>ValueType</tt>s,</li>
				400
				401	<li>which operations are natively supported by the target machine,</li>
				402
				403	<li>the return type of <tt>setcc</tt> operations,</li>
				404
				405	<li>the type to use for shift amounts, and</li>
				406
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	407	<li>various high-level characteristics, like whether it is profitable to turn
				408	division by a constant into a multiplication sequence.</li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	409	</ul>
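
<p>The kinds of facts listed above can be pictured as a small table that the
target fills in and the instruction selector queries. The sketch below is a
simplified model under assumed names (<tt>SketchLowering</tt>, <tt>Op</tt>,
<tt>VT</tt>, <tt>Action</tt>, <tt>makeToyLowering</tt>); it is not the real
<tt>TargetLowering</tt> interface, and the particular choices made for the toy
target are invented for illustration.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical operations, value types, and legalization actions.
enum class Op { Add, SDiv, Rotl };
enum class VT { i8, i32, i64 };
enum class Action { Legal, Promote, Expand };

class SketchLowering {
  std::map<std::pair<Op, VT>, Action> Actions;
  std::map<VT, std::string> RegClassForVT;

public:
  VT SetCCResultType = VT::i32;   // return type of setcc operations
  VT ShiftAmountType = VT::i32;   // type used for shift amounts
  bool DivByConstantToMul = true; // a high-level profitability hint

  // Record the initial register class used for a value type.
  void addRegisterClass(VT T, std::string RCName) {
    assert(!RCName.empty() && "a register class needs a name");
    RegClassForVT[T] = std::move(RCName);
  }

  // In this sketch a type is legal if some register class holds it.
  bool isTypeLegal(VT T) const { return RegClassForVT.count(T) != 0; }

  void setOperationAction(Op O, VT T, Action A) {
    Actions[std::make_pair(O, T)] = A;
  }

  // Operations default to Legal unless the target says otherwise.
  Action getOperationAction(Op O, VT T) const {
    auto It = Actions.find(std::make_pair(O, T));
    return It == Actions.end() ? Action::Legal : It->second;
  }

  bool isOperationLegal(Op O, VT T) const {
    return isTypeLegal(T) && getOperationAction(O, T) == Action::Legal;
  }
};

// A toy 32-bit target with no rotate instruction and 8-bit setcc results.
SketchLowering makeToyLowering() {
  SketchLowering TL;
  TL.addRegisterClass(VT::i32, "GR32");
  TL.setOperationAction(Op::Rotl, VT::i32, Action::Expand);
  TL.SetCCResultType = VT::i8;
  return TL;
}
```

<p>A selector built on such a table asks <tt>isOperationLegal</tt> before
matching a node and falls back to the recorded action (for example, expanding a
rotate into shifts) when the answer is no.</p>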
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	410
				411	</div>
				412
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	413	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	414	<h3>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	415	<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	416	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	417
				418	<div class="doc_text">
				419
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	420	<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
				421	of the target and any interactions between the registers.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	422
				423	<p>Registers in the code generator are represented in the code generator by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	424	unsigned integers. Physical registers (those that actually exist in the
				425	target description) are unique small numbers, and virtual registers are
				426	generally large. Note that register #0 is reserved as a flag value.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	427
				428	<p>Each register in the processor description has an associated
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	429	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				430	register (used for assembly output and debugging dumps) and a set of aliases
				431	(used to indicate whether one register overlaps with another).</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	432
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	433	<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	434	class exposes a set of processor specific register classes (instances of the
				435	<tt>TargetRegisterClass</tt> class). Each register class contains sets of
				436	registers that have the same properties (for example, they are all 32-bit
				437	integer registers). Each SSA virtual register created by the instruction
				438	selector has an associated register class. When the register allocator runs,
				439	it replaces virtual registers with a physical register in the set.</p>
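<p>As a rough sketch of the relationship between register classes and
allocation, the following toy code models a register class as a list of
physical register numbers and picks the first free member for a virtual
register. <tt>RegClassSketch</tt> and <tt>pickPhysReg</tt> are hypothetical
names, and real allocators are far more involved than "first free".</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a register class: the physical register
// numbers that share the same properties, in allocation order.
struct RegClassSketch {
  std::vector<unsigned> Members;

  bool contains(unsigned PhysReg) const {
    for (unsigned R : Members)
      if (R == PhysReg)
        return true;
    return false;
  }
};

// Return the first member of the class that is not in use, or 0 (the
// reserved flag value) when every member is taken.
unsigned pickPhysReg(const RegClassSketch &RC, const std::set<unsigned> &InUse) {
  for (unsigned R : RC.Members)
    if (InUse.count(R) == 0)
      return R;
  return 0;
}
```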
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	440
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	441	<p>The target-specific implementations of these classes are auto-generated from
				442	a <a href="TableGenFundamentals.html">TableGen</a> description of the
				443	register file.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	444
				445	</div>
				446
				447	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	448	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	449	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	450	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	451
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	452	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	453
				454	<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
				455	instructions supported by the target. It is essentially an array of
				456	<tt>TargetInstrDescriptor</tt> objects, each of which describes one
				457	instruction the target supports. Descriptors define things like the mnemonic
				458	for the opcode, the number of operands, the list of implicit register uses
				459	and defs, whether the instruction has certain target-independent properties
				460	(accesses memory, is commutable, etc.), and any target-specific
				461	flags.</p>
				462
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	463	</div>
				464
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	465	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	466	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	467	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	468	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	469
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	470	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	471
				472	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				473	stack frame layout of the target. It holds the direction of stack growth, the
				474	known stack alignment on entry to each function, and the offset to the local
				475	area. The offset to the local area is the offset from the stack pointer on
				476	function entry to the first location where function data (local variables,
				477	spill locations) can be stored.</p>
				478
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	479	</div>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	480
				481	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	482	<h3>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	483	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	484	</h3>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	485
				486	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	487
				488	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				489	specific chip set being targeted. A sub-target informs code generation of
				490	which instructions are supported, instruction latencies and instruction
				491	execution itinerary; i.e., which processing units are used, in what order,
				492	and for how long.</p>
				493
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	494	</div>
				495
				496
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	497	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	498	<h3>
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	499	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	500	</h3>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	501
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	502	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	503
				504	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				505	Just-In-Time code generator to perform target-specific activities, such as
				506	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				507	should provide one of these objects through the <tt>getJITInfo</tt>
				508	method.</p>
				509
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	510	</div>
				511
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	512	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	513	<h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	514	<a name="codegendesc">Machine code description classes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	515	</h2>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	516	<!-- *********************************************************************** -->
				517
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	518	<div class="doc_text">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	519
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	520	<p>At a high level, LLVM code is translated to a machine-specific
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	521	representation formed out of
				522	<a href="#machinefunction"><tt>MachineFunction</tt></a>,
				523	<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
				524	and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
				525	in <tt>include/llvm/CodeGen</tt>). This representation is completely target
				526	agnostic, representing instructions in their most abstract form: an opcode
				527	and a series of operands. This representation is designed to support both an
				528	SSA representation for machine code, as well as a register allocated, non-SSA
				529	form.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	530
				531	</div>
				532
				533	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	534	<h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	535	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	536	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	537
				538	<div class="doc_text">
				539
				540	<p>Target machine instructions are represented as instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	541	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				542	representing machine instructions. In particular, it only keeps track of an
				543	opcode number and a set of operands.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	544
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	545	<p>The opcode number is a simple unsigned integer that only has meaning to a
				546	specific backend. All of the instructions for a target should be defined in
				547	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
				548	auto-generated from this description. The <tt>MachineInstr</tt> class does
				549	not have any information about how to interpret the instruction (i.e., what
				550	the semantics of the instruction are); for that you must refer to the
				551	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	552
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	553	<p>The operands of a machine instruction can be of several different types: a
				554	register reference, a constant integer, a basic block reference, etc. In
				555	addition, a machine operand should be marked as a def or a use of the value
				556	(though only registers are allowed to be defs).</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	557
				558	<p>By convention, the LLVM code generator orders instruction operands so that
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	559	all register definitions come before the register uses, even on architectures
				560	that are normally printed in other orders. For example, the SPARC add
				561	instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
				562	and stores the result into the "%i3" register. In the LLVM code generator,
				563	the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
				564	destination first.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	565
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	566	<p>Keeping destination (definition) operands at the beginning of the operand
				567	list has several advantages. In particular, the debugging printer will print
				568	the instruction like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	569
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	570	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	571	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	572	%i3 = add %i1, %i2
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	573	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	574	</div>
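<p>To illustrate why the defs-first convention makes printing simple, here is
a small self-contained sketch. <tt>OperandSketch</tt>, <tt>InstrSketch</tt>
and <tt>printInstr</tt> are hypothetical names invented for this example;
they are not the real <tt>MachineInstr</tt> interface.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct OperandSketch {
  std::string Name; // e.g. "%i3"
  bool IsDef;       // true for a definition, false for a use
};

struct InstrSketch {
  std::string Opcode;
  std::vector<OperandSketch> Operands; // defs come first by convention
};

// Print the leading defs, then " = ", the opcode, and the remaining operands.
std::string printInstr(const InstrSketch &MI) {
  std::string Out;
  std::size_t I = 0;
  for (; I < MI.Operands.size() && MI.Operands[I].IsDef; ++I) {
    if (I != 0)
      Out += ", ";
    Out += MI.Operands[I].Name;
  }
  if (I != 0)
    Out += " = ";
  Out += MI.Opcode;
  for (std::size_t J = I; J < MI.Operands.size(); ++J) {
    Out += (J == I ? " " : ", ");
    Out += MI.Operands[J].Name;
  }
  return Out;
}
```

<p>Because the defs form a prefix of the operand list, the printer needs no
per-opcode knowledge to decide what goes on the left of the "=".</p>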
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	575
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	576	<p>Also if the first operand is a def, it is easier to <a href="#buildmi">create
				577	instructions</a> whose only def is the first operand.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	578
				579	</div>
				580
				581	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	582	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	583	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	584	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	585
				586	<div class="doc_text">
				587
				588	<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	589	located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
				590	<tt>BuildMI</tt> functions make it easy to build arbitrary machine
				591	instructions. Usage of the <tt>BuildMI</tt> functions looks like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	592
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	593	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	594	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	595	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				596	// instruction. The '1' specifies how many operands will be added.
				597	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	598
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	599	// Create the same instr, but insert it at the end of a basic block.
				600	MachineBasicBlock &MBB = ...
				601	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	602
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	603	// Create the same instr, but insert it before a specified iterator point.
				604	MachineBasicBlock::iterator MBBI = ...
				605	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	606
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	607	// Create a 'cmp Reg, 0' instruction, no destination reg.
				608	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				609	// Create an 'sahf' instruction which takes no operands and stores nothing.
				610	MI = BuildMI(X86::SAHF, 0);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	611
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	612	// Create a self looping branch instruction.
				613	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	614	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	615	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	616
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	617	<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	618	have to specify the number of operands that the machine instruction will
				619	take. This allows for efficient memory allocation. Note also that operands
				620	default to being uses of values, not definitions. If you need to
				621	add a definition operand (other than the optional destination register), you
				622	must explicitly mark it as such:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	623
				624	<div class="doc_code">
				625	<pre>
Bill Wendling	587daed	2009-05-13 21:33:08 +0000	[diff] [blame]	626	MI.addReg(Reg, RegState::Define);
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	627	</pre>
				628	</div>
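<p>The two points above (an up-front operand count, and operands that default
to uses) can be sketched with a toy fluent builder. Everything here is
hypothetical: <tt>InstrBuilder</tt>, <tt>Instr</tt> and <tt>Op</tt> only mimic
the chaining style and are not the types declared in
<tt>MachineInstrBuilder.h</tt>.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class OpKind { Reg, Imm };

struct Op {
  OpKind Kind;
  std::int64_t Value; // register number or immediate value
  bool IsDef;         // only register operands may be defs
};

struct Instr {
  unsigned Opcode = 0;
  std::vector<Op> Ops;
};

// Hypothetical builder: the operand count is used to reserve storage once,
// and every added operand is a use unless explicitly marked as a def.
class InstrBuilder {
  Instr MI;

public:
  InstrBuilder(unsigned Opcode, unsigned NumOperands) {
    MI.Opcode = Opcode;
    MI.Ops.reserve(NumOperands);
  }

  InstrBuilder &addReg(unsigned Reg, bool IsDef = false) {
    MI.Ops.push_back(Op{OpKind::Reg, static_cast<std::int64_t>(Reg), IsDef});
    return *this;
  }

  InstrBuilder &addImm(std::int64_t Val) {
    MI.Ops.push_back(Op{OpKind::Imm, Val, false});
    return *this;
  }

  const Instr &get() const { return MI; }
};
```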
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	629
				630	</div>
				631
				632	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	633	<h4>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	634	<a name="fixedregs">Fixed (preassigned) registers</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	635	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	636
				637	<div class="doc_text">
				638
				639	<p>One important issue that the code generator needs to be aware of is the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	640	presence of fixed registers. In particular, there are often places in the
				641	instruction stream where the register allocator <em>must</em> arrange for a
				642	particular value to be in a particular register. This can occur due to
				643	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				644	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
				645	calling conventions. In any case, the instruction selector should emit code
				646	that copies a virtual register into or out of a physical register when
				647	needed.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	648
				649	<p>For example, consider this simple LLVM example:</p>
				650
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	651	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	652	<pre>
Matthijs Kooijman	61399af	2008-06-04 15:46:35 +0000	[diff] [blame]	653	define i32 @test(i32 %X, i32 %Y) {
				654	%Z = sdiv i32 %X, %Y
				655	ret i32 %Z
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	656	}
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	657	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	658	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	659
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	660	<p>The X86 instruction selector produces this machine code for the division
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	661	and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
				662	get this):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	663
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	664	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	665	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	666	;; Start of div
				667	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				668	%reg1027 = sar %reg1024, 31
				669	%EDX = mov %reg1027 ;; Sign extend X into EDX
				670	idiv %reg1025 ;; Divide by Y (in reg1025)
				671	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	672
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	673	;; Start of ret
				674	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				675	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	676	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	677	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	678
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	679	<p>By the end of code generation, the register allocator has coalesced the
				680	registers and deleted the resulting identity moves, producing the following
				681	code:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	682
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	683	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	684	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	685	;; X is in EAX, Y is in ECX
				686	mov %EAX, %EDX
				687	sar %EDX, 31
				688	idiv %ECX
				689	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	690	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	691	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	692
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	693	<p>This approach is extremely general (if it can handle the X86 architecture, it
				694	can handle anything!) and allows all of the target specific knowledge about
				695	the instruction stream to be isolated in the instruction selector. Note that
				696	physical registers should have a short lifetime for good code generation, and
				697	all physical registers are assumed dead on entry to and exit from basic
				698	blocks (before register allocation). Thus, if you need a value to be live
				699	across basic block boundaries, it <em>must</em> live in a virtual
				700	register.</p>
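<p>The "delete the identity moves" step can be sketched as follows. This is a
deliberately simplified model, not the coalescer LLVM uses: it assumes a
register assignment has already been chosen, treats every non-move
instruction as opaque, and uses the hypothetical names <tt>InstrSketch</tt>
and <tt>removeIdentityMoves</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <vector>

struct InstrSketch {
  bool IsMove;  // true for "Dst = mov Src", false for anything else
  unsigned Dst; // meaningful only when IsMove is true
  unsigned Src; // meaningful only when IsMove is true
};

// Rewrite virtual registers through Assignment (virtual -> physical) and
// drop every move whose source and destination became the same register.
std::vector<InstrSketch>
removeIdentityMoves(const std::vector<InstrSketch> &Code,
                    const std::map<unsigned, unsigned> &Assignment) {
  auto Rewrite = [&](unsigned R) {
    auto It = Assignment.find(R);
    return It == Assignment.end() ? R : It->second;
  };

  std::vector<InstrSketch> Out;
  for (InstrSketch I : Code) {
    if (I.IsMove) {
      I.Dst = Rewrite(I.Dst);
      I.Src = Rewrite(I.Src);
      if (I.Dst == I.Src)
        continue; // identity move: nothing to execute
    }
    Out.push_back(I);
  }
  return Out;
}
```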
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	701
				702	</div>
				703
				704	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	705	<h4>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	706	<a name="ssa">Machine code in SSA form</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	707	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	708
				709	<div class="doc_text">
				710
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	711	<p><tt>MachineInstr</tt>s are initially selected in SSA form, and are
				712	maintained in SSA form until register allocation happens. For the most part,
				713	this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
				714	become machine code PHI nodes, and virtual registers are only allowed to have
				715	a single definition.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	716
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	717	<p>After register allocation, machine code is no longer in SSA form because
				718	there are no virtual registers left in the code.</p>
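<p>The single-definition rule for virtual registers can be checked with a few
lines of code. The sketch below is illustrative only: <tt>DefUse</tt>,
<tt>isInSSAForm</tt> and the <tt>1024</tt> boundary between physical and
virtual register numbers are assumptions made for this example.</p>

```cpp
#include <cassert>
#include <map>
#include <vector>

// Assumed boundary between physical and virtual register numbers.
constexpr unsigned FirstVirtualRegister = 1024;

// One instruction, reduced to the registers it defines and uses.
struct DefUse {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

// Return true if no virtual register is defined more than once.
// Physical registers are ignored: they may be redefined freely.
bool isInSSAForm(const std::vector<DefUse> &Code) {
  std::map<unsigned, unsigned> DefCount;
  for (const DefUse &I : Code)
    for (unsigned R : I.Defs)
      if (R >= FirstVirtualRegister && ++DefCount[R] > 1)
        return false;
  return true;
}
```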
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	719
				720	</div>
				721
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	722	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	723	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	724	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	725	</h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	726
				727	<div class="doc_text">
				728
				729	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	730	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				731	corresponds to the LLVM code input to the instruction selector, but there can
				732	be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
				733	machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
				734	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				735	comes from.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	736
				737	</div>
				738
				739	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	740	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	741	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	742	</h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	743
				744	<div class="doc_text">
				745
				746	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	747	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				748	corresponds one-to-one with the LLVM function input to the instruction
				749	selector. In addition to a list of basic blocks,
				750	the <tt>MachineFunction</tt> contains a <tt>MachineConstantPool</tt>,
				751	a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
				752	<tt>MachineRegisterInfo</tt>. See
				753	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	754
				755	</div>
				756
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	757
				758	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	759	<h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	760	<a name="mc">The "MC" Layer</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	761	</h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	762	<!-- *********************************************************************** -->
				763
				764	<div class="doc_text">
				765
				766	<p>
				767	The MC Layer is used to represent and process code at the raw machine code
				768	level, devoid of "high level" information like "constant pools", "jump tables",
				769	"global variables" or anything like that. At this level, LLVM handles things
				770	like label names, machine instructions, and sections in the object file. The
				771	code in this layer is used for a number of important purposes: the tail end of
				772	the code generator uses it to write a .s or .o file, and it is also used by the
Jay Foad	d61895a	2011-04-13 13:03:56 +0000	[diff] [blame]	773	llvm-mc tool to implement standalone machine code assemblers and disassemblers.
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	774	</p>
				775
				776	<p>
				777	This section describes some of the important classes. There are also a number
				778	of important subsystems that interact at this layer; they are described later
				779	in this manual.
				780	</p>
				781
				782	</div>
				783
				784
				785	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	786	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	787	<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	788	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	789
				790	<div class="doc_text">
				791
				792	<p>
				793	MCStreamer is best thought of as an assembler API. It is an abstract API which
				794	is <em>implemented</em> in different ways (e.g. to output a .s file, output an
				795	ELF .o file, etc.) but whose API corresponds directly to what you see in a .s
				796	file. MCStreamer has one method per directive, such as EmitLabel,
				797	EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
				798	directly correspond to assembly level directives. It also has an
				799	EmitInstruction method, which is used to output an MCInst to the streamer.
				800	</p>
				801
				802	<p>
				803	This API is most important for two clients: the llvm-mc stand-alone assembler is
				804	effectively a parser that parses a line, then invokes a method on MCStreamer. In
				805	the code generator, the <a href="#codeemit">Code Emission</a> phase
				806	lowers higher level LLVM IR and Machine* constructs down to the MC
				807	layer, emitting directives through MCStreamer.</p>
				808
				809	<p>
				810	On the implementation side of MCStreamer, there are two major implementations:
				811	one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
				812	file (MCObjectStreamer). MCAsmStreamer is a straightforward implementation
				813	that prints out a directive for each method (e.g. EmitValue -> .byte), but
				814	MCObjectStreamer implements a full assembler.
				815	</p>
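<p>The split between one abstract "assembler API" and two implementations can
be sketched in miniature. The classes below are hypothetical and far smaller
than the real streamers: the textual one prints a directive per call, while
the object-style one records bytes and label offsets instead.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical abstract streamer: one virtual method per directive.
class StreamerSketch {
public:
  virtual ~StreamerSketch() = default;
  virtual void emitLabel(const std::string &Name) = 0;
  virtual void emitByte(unsigned char Value) = 0;
};

// Textual implementation: each call appends one line of assembly.
class AsmStreamerSketch : public StreamerSketch {
  std::string Out;

public:
  void emitLabel(const std::string &Name) override { Out += Name + ":\n"; }
  void emitByte(unsigned char Value) override {
    Out += "\t.byte " + std::to_string(static_cast<unsigned>(Value)) + "\n";
  }
  const std::string &str() const { return Out; }
};

// Object-style implementation: labels become offsets into a byte buffer.
class ObjectStreamerSketch : public StreamerSketch {
  std::vector<unsigned char> Bytes;
  std::map<std::string, std::size_t> LabelOffsets;

public:
  void emitLabel(const std::string &Name) override {
    LabelOffsets[Name] = Bytes.size();
  }
  void emitByte(unsigned char Value) override { Bytes.push_back(Value); }
  std::size_t offsetOf(const std::string &Name) const {
    return LabelOffsets.at(Name);
  }
  std::size_t size() const { return Bytes.size(); }
};

// A client written against the abstract API works with either streamer.
void emitExample(StreamerSketch &S) {
  S.emitLabel("foo");
  S.emitLabel("bar");
  S.emitByte(4);
}
```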
				816
				817	</div>
				818
				819	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	820	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	821	<a name="mccontext">The <tt>MCContext</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	822	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	823
				824	<div class="doc_text">
				825
				826	<p>
				827	The MCContext class is the owner of a variety of uniqued data structures at the
				828	MC layer, including symbols, sections, etc. As such, this is the class that you
				829	interact with to create symbols and sections. This class cannot be subclassed.
				830	</p>
				831
				832	</div>
				833
				834	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	835	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	836	<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	837	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	838
				839	<div class="doc_text">
				840
				841	<p>
				842	The MCSymbol class represents a symbol (aka label) in the assembly file. There
				843	are two interesting kinds of symbols: assembler temporary symbols, and normal
				844	symbols. Assembler temporary symbols are used and processed by the assembler
				845	but are discarded when the object file is produced. The distinction is usually
				846	represented by adding a prefix to the label, for example "L" labels are
				847	assembler temporary labels in MachO.
				848	</p>
				849
				850	<p>MCSymbols are created by MCContext and uniqued there. This means that
				851	MCSymbols can be compared for pointer equivalence to find out if they are the
				852	same symbol. Note that pointer inequality does not guarantee the labels will
				853	end up at different addresses though. It's perfectly legal to output something
				854	like this to the .s file:</p>
				855
				856	<pre>
				857	foo:
				858	bar:
				859	.byte 4
				860	</pre>
				861
				862	<p>In this case, both the foo and bar symbols will have the same address.</p>
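<p>Uniquing by an owning context, and the pointer comparison it enables, can
be sketched like this. <tt>ContextSketch</tt>, <tt>SymbolSketch</tt> and
<tt>getOrCreateSymbol</tt> are hypothetical names for this illustration and
are not claimed to match the MCContext interface.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct SymbolSketch {
  std::string Name;
};

// Hypothetical owning context: at most one symbol object per name.
class ContextSketch {
  std::map<std::string, std::unique_ptr<SymbolSketch>> Symbols;

public:
  // Return the unique symbol for Name, creating it on first request.
  SymbolSketch *getOrCreateSymbol(const std::string &Name) {
    std::unique_ptr<SymbolSketch> &Slot = Symbols[Name];
    if (!Slot)
      Slot.reset(new SymbolSketch{Name});
    return Slot.get();
  }
};
```

<p>Two lookups of the same name yield the same pointer, so identity can be
tested with <tt>==</tt>. Distinct pointers only say the names differ; as
noted above, they say nothing about the final addresses.</p>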
				863
				864	</div>
				865
				866	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	867	<h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	868	<a name="mcsection">The <tt>MCSection</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	869	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	870
				871	<div class="doc_text">
				872
				873	<p>
				874	The MCSection class represents an object-file specific section. It is subclassed
				875	by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
				876	<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
				877	by MCContext. The MCStreamer has a notion of the current section, which can be
				878	changed with the SwitchSection method (which corresponds to a ".section"
				879	directive in a .s file).
				880	</p>
				881
				882	</div>
				883
				884	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	885	<h3>
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	886	<a name="mcinst">The <tt>MCInst</tt> class</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	887	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	888
				889	<div class="doc_text">
				890
				891	<p>
				892	The MCInst class is a target-independent representation of an instruction. It
				893	is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
				894	that holds a target-specific opcode and a vector of MCOperands. MCOperand, in
				895	turn, is a simple discriminated union of three cases: 1) a simple immediate,
				896	2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
				897	MCExpr.
				898	</p>
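<p>A discriminated union of the three operand cases might be modeled as
below. This is a sketch using <tt>std::variant</tt> (so it assumes a C++17
compiler); the type names are hypothetical, and the symbolic expression is
reduced to a plain string rather than a real expression tree.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

struct ImmOperand { std::int64_t Value; };
struct RegOperand { unsigned RegNo; };
struct ExprOperand { std::string Text; }; // stands in for a symbolic expression

// Exactly one of the three cases is active at a time.
using OperandSketch = std::variant<ImmOperand, RegOperand, ExprOperand>;

// An instruction is just an opcode plus a vector of operands.
struct InstSketch {
  unsigned Opcode;
  std::vector<OperandSketch> Operands;
};

// Inspect the active case and render it as text.
std::string describe(const OperandSketch &Op) {
  if (const ImmOperand *I = std::get_if<ImmOperand>(&Op))
    return "imm:" + std::to_string(I->Value);
  if (const RegOperand *R = std::get_if<RegOperand>(&Op))
    return "reg:" + std::to_string(R->RegNo);
  return "expr:" + std::get<ExprOperand>(Op).Text;
}
```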
				899
				900	<p>MCInst is the common currency used to represent machine instructions at the
				901	MC layer. It is the type used by the instruction encoder, the instruction
				902	printer, and the type generated by the assembly parser and disassembler.
				903	</p>
				904
				905	</div>
				906
				907
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	908	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	909	<h2>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	910	<a name="codegenalgs">Target-independent code generation algorithms</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	911	</h2>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	912	<!-- *********************************************************************** -->
				913
				914	<div class="doc_text">
				915
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	916	<p>This section documents the phases described in the
				917	<a href="#high-level-design">high-level design of the code generator</a>.
				918	It explains how they work and some of the rationale behind their design.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	919
				920	</div>
				921
				922	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	923	<h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	924	<a name="instselect">Instruction Selection</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	925	</h3>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	926
				927	<div class="doc_text">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	928
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	929	<p>Instruction Selection is the process of translating LLVM code presented to
				930	the code generator into target-specific machine instructions. There are
				931	several well-known ways to do this in the literature. LLVM uses a
				932	SelectionDAG based instruction selector.</p>
				933
				934	<p>Portions of the DAG instruction selector are generated from the target
				935	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
				936	selector to be generated from these <tt>.td</tt> files, though currently
				937	there are still things that require custom C++ code.</p>
				938
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	939	</div>
				940
				941	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	942	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	943	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	944	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	945
				946	<div class="doc_text">
				947
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	948	<p>The SelectionDAG provides an abstraction for code representation in a way
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	949	that is amenable to instruction selection using automatic techniques
				950	(e.g. dynamic-programming based optimal pattern matching selectors). It is
				951	also well-suited to other phases of code generation; in particular,
				952	instruction scheduling (SelectionDAGs are very close to scheduling DAGs
				953	post-selection). Additionally, the SelectionDAG provides a host
				954	representation where a large variety of very low-level (but
				955	target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
				956	performed, namely those that require extensive information about the instructions
				957	efficiently supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	958
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	959	<p>The SelectionDAG is a directed acyclic graph whose nodes are instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	960	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				961	operation code (Opcode), which indicates what operation the node performs, and
				962	the operands to the operation. The various operation node types are
				963	described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
				964	file.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	965
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	966	<p>Although most operations define a single value, each node in the graph may
				967	define multiple values. For example, a combined div/rem operation will
				968	define both the quotient and the remainder. Many other situations require
				969	multiple values as well. Each node also has some number of operands, which
				970	are edges to the node defining the used value. Because nodes may define
				971	multiple values, edges are represented by instances of the <tt>SDValue</tt>
				972	class, which is a <tt>&lt;SDNode, unsigned&gt;</tt> pair, indicating the node
				973	and result value being used, respectively. Each value produced by
				974	an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
				975	indicating what the type of the value is.</p>
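<p>The node/edge relationship described above can be sketched with a small
Python model. This is only an illustration, not LLVM's C++ API: the class
fields, the <tt>res_no</tt> name, the <tt>leaf</tt> helper and the
<tt>sdivrem</tt> opcode string are assumptions made for the example.</p>

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SDNode:
    """Toy stand-in for SDNode: an opcode, operand edges, and result types."""
    opcode: str
    operands: Tuple["SDValue", ...]
    value_types: Tuple[str, ...]  # one entry per value the node defines


@dataclass(frozen=True)
class SDValue:
    """An edge: which node is used, and which of its results."""
    node: SDNode
    res_no: int = 0

    @property
    def value_type(self) -> str:
        return self.node.value_types[self.res_no]


def leaf(name: str, vt: str) -> SDValue:
    """Hypothetical helper: a node with no operands and a single result."""
    return SDValue(SDNode(name, (), (vt,)))


a, b = leaf("a", "i32"), leaf("b", "i32")
# A combined signed div/rem node defines two i32 results.
divrem = SDNode("sdivrem", (a, b), ("i32", "i32"))
quotient, remainder = SDValue(divrem, 0), SDValue(divrem, 1)
# Both results feed one add; the two edges differ only in the result number.
total = SDNode("add", (quotient, remainder), ("i32",))
print([(v.node.opcode, v.res_no, v.value_type) for v in total.operands])
```

<p>Keeping the result number on the edge rather than on the node is what lets
one node be shared by users of different results.</p>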
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	976
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	977	<p>SelectionDAGs contain two different kinds of values: those that represent
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	978	data flow and those that represent control flow dependencies. Data values
				979	are simple edges with an integer or floating point value type. Control edges
				980	are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
				981	These edges provide an ordering between nodes that have side effects (such as
				982	loads, stores, calls, returns, etc). All nodes that have side effects should
				983	take a token chain as input and produce a new one as output. By convention,
				984	token chain inputs are always operand #0, and chain results are always the
				985	last value produced by an operation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	986
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	987	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	988	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root
				989	node is the final side-effecting node in the token chain. For example, in a
				990	single basic block function it would be the return node.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	991
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	992	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	993	"illegal" DAG. A legal DAG for a target is one that only uses supported
				994	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
				995	a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
				996	uses an SREM or UREM operation. The
				997	<a href="#selectiondag_legalize_types">legalize types</a> and
				998	<a href="#selectiondag_legalize">legalize operations</a> phases are
				999	responsible for turning an illegal DAG into a legal DAG.</p>
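<p>As a rough sketch of what "legal" means, the check below flags nodes whose
type or operation a target does not support. The two sets loosely mirror the
32-bit PowerPC example above but are assumptions for illustration, as is the
flat list-of-pairs DAG representation.</p>

```python
# Hypothetical description of a 32-bit target; illustrative only.
LEGAL_TYPES = {"i32", "f32", "f64"}
UNSUPPORTED_OPS = {"srem", "urem"}


def illegal_parts(dag):
    """Return the (opcode, type) entries that make a DAG illegal.

    `dag` is a flat list of (opcode, value_type) pairs, one per node.
    """
    return [(op, vt) for op, vt in dag
            if vt not in LEGAL_TYPES or op in UNSUPPORTED_OPS]


def is_legal(dag):
    return not illegal_parts(dag)


dag = [("add", "i32"), ("mul", "i64"), ("srem", "i32"), ("fadd", "f32")]
print(illegal_parts(dag))  # the i64 mul and the srem are flagged
```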
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1000
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1001	</div>
				1002
				1003	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1004	<h4>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1005	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1006	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1007
				1008	<div class="doc_text">
				1009
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1010	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1011
				1012	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1013	<li><a href="#selectiondag_build">Build initial DAG</a> — This stage
				1014	performs a simple translation from the input LLVM code to an illegal
				1015	SelectionDAG.</li>
				1016
				1017	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
				1018	stage performs simple optimizations on the SelectionDAG to simplify it,
				1019	and recognize meta instructions (like rotates
				1020	and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
				1021	operations. This makes the resultant code more efficient and
				1022	the <a href="#selectiondag_select">select instructions from DAG</a> phase
				1023	(below) simpler.</li>
				1024
				1025	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
				1026	— This stage transforms SelectionDAG nodes to eliminate any types
				1027	that are unsupported on the target.</li>
				1028
				1029	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1030	SelectionDAG optimizer is run to clean up redundancies exposed by type
				1031	legalization.</li>
				1032
Chris Lattner	7138863	2010-12-12 02:42:57 +0000	[diff] [blame]	1033	<li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> —
Chris Lattner	4c247f6	2010-12-13 00:17:12 +0000	[diff] [blame]	1034	This stage transforms SelectionDAG nodes to eliminate any operations
				1035	that are unsupported on the target.</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1036
				1037	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1038	SelectionDAG optimizer is run to eliminate inefficiencies introduced by
				1039	operation legalization.</li>
				1040
				1041	<li><a href="#selectiondag_select">Select instructions from DAG</a> —
				1042	Finally, the target instruction selector matches the DAG operations to
				1043	target instructions. This process translates the target-independent input
				1044	DAG into another DAG of target instructions.</li>
				1045
				1046	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				1047	— The last phase assigns a linear order to the instructions in the
				1048	target-instruction DAG and emits them into the MachineFunction being
				1049	compiled. This step uses traditional prepass scheduling techniques.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1050	</ol>
				1051
				1052	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1053	rest of the code generation passes are run.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1054
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1055	<p>One great way to visualize what is going on here is to take advantage of a
				1056	few <tt>llc</tt> command-line options. The following options pop up a window
				1057	displaying the SelectionDAG at specific times (if you only get errors printed
				1058	to the console while using this, you probably
				1059	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
				1060	to add support for it).</p>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1061
				1062	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1063	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
				1064	before the first optimization pass.</li>
				1065
				1066	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				1067
				1068	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				1069	optimization pass.</li>
				1070
				1071	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				1072
				1073	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1074	</ul>
				1075
				1076	<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency graph.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1077	This graph is based on the final SelectionDAG, with nodes that must be
				1078	scheduled together bundled into a single scheduling-unit node, and with
				1079	immediate operands and other nodes that aren't relevant for scheduling
				1080	omitted.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1081
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1082	</div>
				1083
				1084	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1085	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1086	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1087	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1088
				1089	<div class="doc_text">
				1090
Bill Wendling	1644877	2006-08-28 03:04:05 +0000	[diff] [blame]	1091	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1092	input by the <tt>SelectionDAGLowering</tt> class in the
				1093	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of
				1094	this pass is to expose as many low-level, target-specific details to the
				1095	SelectionDAG as possible. This pass is mostly hard-coded (e.g. an
				1096	LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
				1097	<tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
				1098	requires target-specific hooks to lower calls, returns, varargs, etc. For
				1099	these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
				1100	interface is used.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1101
				1102	</div>
				1103
				1104	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1105	<h4>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1106	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1107	</h4>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1108
				1109	<div class="doc_text">
				1110
				1111	<p>The LegalizeTypes phase is in charge of converting a DAG to only use the types
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1112	that are natively supported by the target.</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1113
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1114	<p>There are two main ways of converting values of unsupported scalar types to
				1115	values of supported types: converting small types to larger types
				1116	("promoting"), and breaking up large integer types into smaller ones
				1117	("expanding"). For example, a target might require that all f32 values are
				1118	promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same
				1119	target might require that all i64 values be expanded into pairs of i32
				1120	values. These changes can insert sign and zero extensions as needed to make
				1121	sure that the final code has the same behavior as the input.</p>
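<p>The arithmetic behind promotion and expansion can be sketched as follows.
This is plain Python integer arithmetic standing in for DAG rewrites; the
function names are hypothetical and only integer addition is modelled.</p>

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def promoted_add_i8(a, b):
    """Promotion: do an i8 add in i32, then narrow the result to i8 again."""
    wide = (sign_extend(a, 8) + sign_extend(b, 8)) & MASK32
    return wide & 0xFF


def expanded_add_i64(a, b):
    """Expansion: do an i64 add as two i32 adds linked by a carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo = a_lo + b_lo
    carry = lo >> 32
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | (lo & MASK32)


print(hex(promoted_add_i8(0x7F, 0x01)))
print(hex(expanded_add_i64(0x00000000FFFFFFFF, 1)))
```

<p>In both cases the narrowed or recombined result matches what the original
operation would have produced in its own type, which is the property the
inserted extensions are there to preserve.</p>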
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1122
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1123	<p>There are two main ways of converting values of unsupported vector types to
				1124	values of supported types: splitting vector types, multiple times if
				1125	necessary, until a legal type is found, and extending vector types by adding
				1126	elements to the end to round them out to legal types ("widening"). If a
				1127	vector gets split all the way down to single-element parts with no supported
				1128	vector type being found, the elements are converted to scalars
				1129	("scalarizing").</p>
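<p>A minimal sketch of the vector strategies, assuming a hypothetical target
whose only legal vector types are four-element <tt>f32</tt> and <tt>i32</tt>
vectors. The order in which widening and splitting are tried here is a
simplification chosen for the sketch, not a description of the real
legalizer's policy.</p>

```python
# Hypothetical set of legal vector types, as (element type, element count).
LEGAL_VECTORS = {("f32", 4), ("i32", 4)}


def legalize_vector(elt, count):
    """Return the (action, elt, count) steps for a <count x elt> vector."""
    steps = []
    # Widening: round a non-power-of-two length up to the next power of two.
    widened = 1
    while widened < count:
        widened *= 2
    if widened != count:
        steps.append(("widen", elt, widened))
        count = widened
    # Splitting: halve, repeatedly if necessary, until a legal type is found.
    while (elt, count) not in LEGAL_VECTORS and count > 1:
        count //= 2
        steps.append(("split", elt, count))
    # Scalarizing: no legal vector type was found, so use scalar elements.
    if (elt, count) not in LEGAL_VECTORS:
        steps.append(("scalarize", elt, 1))
    return steps


print(legalize_vector("f32", 8))
print(legalize_vector("f32", 3))
print(legalize_vector("i64", 2))
```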
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1130
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1131	<p>A target implementation tells the legalizer which types are supported (and
				1132	which register class to use for them) by calling the
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1133	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				1134
				1135	</div>
				1136
				1137	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1138	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1139	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1140	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1141
				1142	<div class="doc_text">
				1143
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1144	<p>The Legalize phase is in charge of converting a DAG to only use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1145	operations that are natively supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1146
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1147	<p>Targets often have weird constraints, such as not supporting every operation
				1148	on every supported datatype (e.g. X86 does not support byte conditional moves
				1149	and PowerPC does not support sign-extending loads from a 16-bit memory
				1150	location). Legalize takes care of this by open-coding another sequence of
				1151	operations to emulate the operation ("expansion"), by promoting one type to a
				1152	larger type that supports the operation ("promotion"), or by using a
				1153	target-specific hook to implement the legalization ("custom").</p>
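<p>The expand and promote actions can be sketched with a small action table.
Everything here is illustrative: the table contents, the
<tt>(op, type, operands)</tt> tuple representation and the function names are
assumptions, and the "custom" action is left out because it is by definition
target-specific code.</p>

```python
LEGAL, EXPAND, PROMOTE = "legal", "expand", "promote"

# Hypothetical table a target might build up; unlisted pairs are legal.
ACTIONS = {("srem", "i32"): EXPAND, ("mul", "i8"): PROMOTE}


def action_for(op, vt):
    return ACTIONS.get((op, vt), LEGAL)


def legalize_node(op, vt, operands):
    """Rewrite one node, given as (op, vt, operands), according to the table."""
    action = action_for(op, vt)
    if action == EXPAND and op == "srem":
        a, b = operands
        # srem a, b  ==>  a - (a sdiv b) * b
        return ("sub", vt, (a, ("mul", vt, (("sdiv", vt, (a, b)), b))))
    if action == PROMOTE:
        # Perform the operation in i32 and truncate the result back.
        wide = tuple(("any_extend", "i32", (o,)) for o in operands)
        return ("truncate", vt, ((op, "i32", wide),))
    return (op, vt, operands)


print(legalize_node("srem", "i32", ("a", "b")))
print(legalize_node("mul", "i8", ("x", "y")))
```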
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1154
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1155	<p>A target implementation tells the legalizer which operations are not
				1156	supported (and which of the above three actions to take) by calling the
				1157	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				1158	constructor.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1159
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1160	<p>Prior to the existence of the Legalize passes, we required that every target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1161	<a href="#selectiondag_select">selector</a> support and handle every
				1162	operator and type even if they were not natively supported. The introduction
				1163	of the Legalize phases allows all of the canonicalization patterns to be
				1164	shared across targets, and makes it very easy to optimize the canonicalized
				1165	code because it is still in the form of a DAG.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1166
				1167	</div>
				1168
				1169	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1170	<h4>
				1171	<a name="selectiondag_optimize">
				1172	SelectionDAG Optimization Phase: the DAG Combiner
				1173	</a>
				1174	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1175
				1176	<div class="doc_text">
				1177
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1178	<p>The SelectionDAG optimization phase is run multiple times during code
				1179	generation: immediately after the DAG is built and once after each
				1180	legalization. The first run of the pass allows the initial code to be
				1181	cleaned up (e.g. performing optimizations that depend on knowing that the
				1182	operators have restricted type inputs). Subsequent runs of the pass clean up
				1183	the messy code generated by the Legalize passes, which allows Legalize to be
				1184	very simple (it can focus on making code legal instead of focusing on
				1185	generating <em>good</em> and legal code).</p>
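<p>To give a flavor of the cleanup involved, here is a toy bottom-up combiner
with two rewrite rules. It is a sketch under stated assumptions: nodes are
<tt>(op, type, operands)</tt> tuples, leaves have no operands, a leaf named
<tt>"0"</tt> stands for the constant zero, and the two rules are examples
picked for illustration rather than a list of what the real combiner does.</p>

```python
EXTENDS = {"sign_extend", "zero_extend", "any_extend"}


def combine(node):
    """Simplify a node given as (op, vt, operands), bottom-up."""
    op, vt, operands = node
    operands = tuple(combine(o) for o in operands)
    # (truncate (extend x)) ==> x, when x already has the truncated type.
    if op == "truncate" and operands[0][0] in EXTENDS:
        inner = operands[0][2][0]
        if inner[1] == vt:
            return inner
    # (add x, 0) ==> x
    if op == "add" and operands[1] == ("0", vt, ()):
        return operands[0]
    return (op, vt, operands)


x = ("x", "i8", ())
round_trip = ("truncate", "i8", (("sign_extend", "i32", (x,)),))
print(combine(round_trip))
print(combine(("add", "i8", (round_trip, ("0", "i8", ())))))
```

<p>Extend/truncate round trips like the one above are the kind of redundancy
that promotion tends to leave behind.</p>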
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1186
				1187	<p>One important class of optimizations performed is optimizing inserted sign
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1188	and zero extension instructions. We currently use ad-hoc techniques, but
				1189	could move to more rigorous techniques in the future. Here are some good
				1190	papers on the subject:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1191
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1192	<p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1193	integer arithmetic</a>"<br>
				1194	Kevin Redwine and Norman Ramsey<br>
				1195	International Conference on Compiler Construction (CC) 2004</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1196
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1197	<p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1198	sign extension elimination</a>"<br>
				1199	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1200	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1201	and Implementation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1202
				1203	</div>
				1204
				1205	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1206	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1207	<a name="selectiondag_select">SelectionDAG Select Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1208	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1209
				1210	<div class="doc_text">
				1211
				1212	<p>The Select phase is the bulk of the target-specific code for instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1213	selection. This phase takes a legal SelectionDAG as input, pattern matches
				1214	the instructions supported by the target to this DAG, and produces a new DAG
				1215	of target code. For example, consider the following LLVM fragment:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1216
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1217	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1218	<pre>
Dan Gohman	a9445e1	2010-03-02 01:11:08 +0000	[diff] [blame]	1219	%t1 = fadd float %W, %X
				1220	%t2 = fmul float %t1, %Y
				1221	%t3 = fadd float %t2, %Z
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1222	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1223	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1224
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1225	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1226	this:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1227
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1228	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1229	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1230	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1231	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1232	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1233
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1234	<p>If a target supports floating point multiply-and-add (FMA) operations, one of
				1235	the adds can be merged with the multiply. On the PowerPC, for example, the
				1236	output of the instruction selector might look like this DAG:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1237
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1238	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1239	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1240	(FMADDS (FADDS W, X), Y, Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1241	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1242	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1243
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1244	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1245	first two operands and adds the third (as single-precision floating-point
				1246	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1247	add instruction. To perform this pattern match, the PowerPC backend includes
				1248	the following instruction definitions:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1249
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1250	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1251	<pre>
				1252	def FMADDS : AForm_1<59, 29,
				1253	                    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
				1254	                    "fmadds $FRT, $FRA, $FRC, $FRB",
				1255	                    [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
				1256	                                           F4RC:$FRB))</b>]>;
				1257	def FADDS : AForm_2<59, 21,
				1258	                    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
				1259	                    "fadds $FRT, $FRA, $FRB",
				1260	                    [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]>;
				1261	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1262	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1263
				1264	<p>The portion of the instruction definition in bold indicates the pattern used
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1265	to match the instruction. The DAG operators
				1266	(like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
Dan Gohman	6a4824c	2010-03-25 00:03:04 +0000	[diff] [blame]	1267	the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file. "
				1268	<tt>F4RC</tt>" is the register class of the input and result values.</p>
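<p>The effect of these two patterns on the example DAG can be imitated with a
hand-written matcher. This is a sketch, not the code TableGen generates: DAGs
are nested tuples, leaves are register names, the <tt>select</tt> function is
hypothetical, and trying both operand orders of <tt>fadd</tt> is an assumption
based on addition being commutative.</p>

```python
def select(node):
    """Select a tuple DAG such as ("fadd", a, b) into toy PowerPC-like nodes."""
    if isinstance(node, str):  # leaf: a register name
        return node
    op, lhs, rhs = node
    if op == "fadd":
        # fadd is commutative, so look for the fmul on either side.
        for mul, addend in ((lhs, rhs), (rhs, lhs)):
            if not isinstance(mul, str) and mul[0] == "fmul":
                return ("FMADDS", select(mul[1]), select(mul[2]), select(addend))
        return ("FADDS", select(lhs), select(rhs))
    raise ValueError("no pattern for " + op)


dag = ("fadd", ("fmul", ("fadd", "W", "X"), "Y"), "Z")
print(select(dag))
```

<p>The larger <tt>FMADDS</tt> pattern is tried before the plain
<tt>FADDS</tt> one, so the multiply is folded whenever it feeds an add.</p>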
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1269
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1270	<p>The TableGen DAG instruction selector generator reads the instruction
				1271	patterns in the <tt>.td</tt> file and automatically builds parts of the
				1272	pattern matching code for your target. It has the following strengths:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1273
				1274	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1275	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1276	you if your patterns make sense or not.</li>
				1277
				1278	<li>It can handle arbitrary constraints on operands for the pattern match. In
				1279	particular, it is straightforward to say things like "match any immediate
				1280	that is a 13-bit sign-extended value". For examples, see the
				1281	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1282	backend.</li>
				1283
				1284	<li>It knows several important identities for the patterns defined. For
				1285	example, it knows that addition is commutative, so it allows the
				1286	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1287	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1288	to specially handle this case.</li>
				1289
				1290	<li>It has a full-featured type-inferencing system. In particular, you should
				1291	rarely have to explicitly tell the system what type parts of your patterns
				1292	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1293	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
				1294	It was able to infer and propagate this knowledge from the fact that
				1295	<tt>F4RC</tt> has type 'f32'.</li>
				1296
				1297	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1298	Pattern fragments are chunks of reusable patterns that get inlined into
				1299	your patterns during compiler-compiler time. For example, the integer
				1300	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment
				1301	that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
				1302	have a native '<tt>not</tt>' operation. Targets can define their own
				1303	short-hand fragments as they see fit. See the definition of
				1304	'<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
				1305
				1306	<li>In addition to instructions, targets can specify arbitrary patterns that
				1307	map to one or more instructions using the 'Pat' class. For example, the
				1308	PowerPC has no way to load an arbitrary integer immediate into a register
				1309	in one instruction. To tell tblgen how to do this, it defines:
				1310	<br>
				1311	<br>
				1312	<div class="doc_code">
				1313	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1314	// Arbitrary immediate support. Implement in terms of LIS/ORI.
				1315	def : Pat<(i32 imm:$imm),
				1316	          (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))>;
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1317	</pre>
				1318	</div>
				1319	<br>
				1320	If none of the single-instruction patterns for loading an immediate into a
				1321	register match, this will be used. This rule says "match an arbitrary i32
				1322	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
				1323	an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
				1324	the left 16 bits') instruction". To make this work, the
				1325	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
				1326	the input immediate (in this case, take the high or low 16-bits of the
				1327	immediate).</li>
				1328
				1329	<li>While the system does automate a lot, it still allows you to write custom
				1330	C++ code to match special cases if there is something that is hard to
				1331	express.</li>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1332	</ul>
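<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern above can be
checked with a few lines of Python. The functions model only the 32-bit value
each step produces; their names mirror the pattern but are otherwise
hypothetical, and they assume the immediate is given as an unsigned 32-bit
value.</p>

```python
def hi16(imm):
    """High 16 bits of a 32-bit immediate."""
    return (imm >> 16) & 0xFFFF


def lo16(imm):
    """Low 16 bits of a 32-bit immediate."""
    return imm & 0xFFFF


def lis(imm16):
    """Load a 16-bit immediate shifted left by 16 bits."""
    return (imm16 << 16) & 0xFFFFFFFF


def ori(reg, imm16):
    """OR a zero-extended 16-bit immediate into a 32-bit register value."""
    return reg | (imm16 & 0xFFFF)


def materialize(imm):
    """Build an arbitrary 32-bit immediate the way the Pat above does."""
    return ori(lis(hi16(imm)), lo16(imm))


print(hex(materialize(0x12345678)))
```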
				1333
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1334	<p>While it has many strengths, the system currently has some limitations,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1335	primarily because it is a work in progress:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1336
				1337	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1338	<li>Overall, there is no way to define or match SelectionDAG nodes that define
Dan Gohman	e370c80	2009-04-22 15:55:31 +0000	[diff] [blame]	1339	multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1340	etc). This is the biggest reason that you currently still <em>have
				1341	to</em> write custom C++ code for your instruction selector.</li>
				1342
				1343	<li>There is no great way to support matching complex addressing modes yet.
				1344	In the future, we will extend pattern fragments to allow them to define
				1345	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
				1346	addressing mode</a>, which are currently matched with custom C++ code).
				1347	In addition, we'll extend fragments so that a fragment can match multiple
				1348	different patterns.</li>
				1349
				1350	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1351
				1352	<li>We don't automatically generate the set of supported registers and
				1353	operations for the <a href="#selectiondag_legalize">Legalizer</a>
				1354	yet.</li>
				1355
				1356	<li>We don't have a way of tying in custom legalized nodes yet.</li>
Chris Lattner	7d6915c	2005-10-17 04:18:41 +0000	[diff] [blame]	1357	</ul>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1358
				1359	<p>Despite these limitations, the instruction selector generator is still quite
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1360	useful for most of the binary and logical operations in typical instruction
				1361	sets. If you run into any problems or can't figure out how to do something,
				1362	please let Chris know!</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1363
				1364	</div>
				1365
				1366	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1367	<h4>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1368	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1369	</h4>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1370
				1371	<div class="doc_text">
				1372
				1373	<p>The scheduling phase takes the DAG of target instructions from the selection
phase and assigns an order. The scheduler can pick an order depending on
various constraints of the machine (e.g., ordering for minimal register
pressure or to cover instruction latencies). Once an order is established, the
				1377	DAG is converted to a list
				1378	of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
				1379	destroyed.</p>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1380
Jeff Cohen	0b81cda	2005-10-24 16:54:55 +0000	[diff] [blame]	1381	<p>Note that this phase is logically separate from the instruction selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1382	phase, but is tied to it closely in the code because it operates on
				1383	SelectionDAGs.</p>
Chris Lattner	c38959f	2005-10-17 03:09:31 +0000	[diff] [blame]	1384
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1385	</div>
				1386
				1387	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1388	<h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1389	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1390	</h4>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1391
				1392	<div class="doc_text">
				1393
				1394	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1395	<li>Optional function-at-a-time selection.</li>
				1396
				1397	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1398	</ol>
				1399
				1400	</div>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1401
				1402	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1403	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1404	<a name="ssamco">SSA-based Machine Code Optimizations</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1405	</h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1406	<div class="doc_text"><p>To Be Written</p></div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1407
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1408	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1409	<h3>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1410	<a name="liveintervals">Live Intervals</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1411	</h3>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1412
				1413	<div class="doc_text">
				1414
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1415	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1416	They are used by some <a href="#regalloc">register allocator</a> passes to
				1417	determine if two or more virtual registers which require the same physical
				1418	register are live at the same point in the program (i.e., they conflict).
				1419	When this situation occurs, one virtual register must be <i>spilled</i>.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1420
				1421	</div>
				1422
				1423	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1424	<h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1425	<a name="livevariable_analysis">Live Variable Analysis</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1426	</h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1427
				1428	<div class="doc_text">
				1429
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1430	<p>The first step in determining the live intervals of variables is to calculate
				1431	the set of registers that are immediately dead after the instruction (i.e.,
				1432	the instruction calculates the value, but it is never used) and the set of
				1433	registers that are used by the instruction, but are never used after the
				1434	instruction (i.e., they are killed). Live variable information is computed
				1435	for each <i>virtual</i> register and <i>register allocatable</i> physical
				1436	register in the function. This is done in a very efficient manner because it
				1437	uses SSA to sparsely compute lifetime information for virtual registers
				1438	(which are in SSA form) and only has to track physical registers within a
				1439	block. Before register allocation, LLVM can assume that physical registers
				1440	are only live within a single basic block. This allows it to do a single,
				1441	local analysis to resolve physical register lifetimes within each basic
				1442	block. If a physical register is not register allocatable (e.g., a stack
				1443	pointer or condition codes), it is not tracked.</p>
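<p>The kill and dead-definition computation within a single basic block can be
sketched as a backward scan. This is a simplified, self-contained model and not
the LLVM implementation: <tt>SimpleInst</tt>, <tt>InstLiveness</tt> and
<tt>analyzeBlock</tt> are hypothetical names, each instruction is assumed to
define at most one register, and the set of registers live at the end of the
block is assumed to be given.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified instruction: at most one defined register and a
// list of used registers. Register 0 means "no definition".
struct SimpleInst {
  unsigned def;
  std::vector<unsigned> uses;
};

// Per-instruction result of the analysis.
struct InstLiveness {
  bool defIsDead = false;    // the value is computed but never read
  std::set<unsigned> kills;  // registers whose last use is this instruction
};

// Walk the block backwards, tracking which registers are live after each
// instruction. 'live' starts as the set of registers live out of the block.
std::vector<InstLiveness> analyzeBlock(const std::vector<SimpleInst> &block,
                                       std::set<unsigned> live) {
  std::vector<InstLiveness> result(block.size());
  for (std::size_t n = block.size(); n-- > 0;) {
    const SimpleInst &inst = block[n];
    if (inst.def != 0) {
      // A definition that nothing reads afterwards is immediately dead.
      if (live.count(inst.def) == 0)
        result[n].defIsDead = true;
      live.erase(inst.def);
    }
    for (unsigned reg : inst.uses) {
      // If the register was not live after this instruction, this use is
      // its last one, so the instruction kills it.
      if (live.insert(reg).second)
        result[n].kills.insert(reg);
    }
  }
  return result;
}
```

<p>For a block such as <tt>r1 = ...; r2 = op r1; r3 = op r1, r2; r4 = ...</tt>
with only <tt>r3</tt> live out, this sketch marks the definition of
<tt>r4</tt> as dead and records that the third instruction kills both
<tt>r1</tt> and <tt>r2</tt>.</p>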
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1444
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1445	<p>Physical registers may be live in to or out of a function. Live in values are
				1446	typically arguments in registers. Live out values are typically return values
				1447	in registers. Live in values are marked as such, and are given a dummy
				1448	"defining" instruction during live intervals analysis. If the last basic
				1449	block of a function is a <tt>return</tt>, then it's marked as using all live
				1450	out values in the function.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1451
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1452	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
				1453	the live variable information from a depth first traversal of the CFG of the
				1454	function won't guarantee that a virtual register used by the <tt>PHI</tt>
				1455	node is defined before it's used. When a <tt>PHI</tt> node is encountered,
				1456	only the definition is handled, because the uses will be handled in other
				1457	basic blocks.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1458
				1459	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1460	assignment at the end of the current basic block and traverse the successor
				1461	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1462	the <tt>PHI</tt> node's operands is coming from the current basic block, then
				1463	the variable is marked as <i>alive</i> within the current basic block and all
				1464	of its predecessor basic blocks, until the basic block with the defining
				1465	instruction is encountered.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1466
				1467	</div>
				1468
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1469	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1470	<h4>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1471	<a name="liveintervals_analysis">Live Intervals Analysis</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1472	</h4>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1473
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1474	<div class="doc_text">
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1475
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1476	<p>We now have the information available to perform the live intervals analysis
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1477	and build the live intervals themselves. We start off by numbering the basic
				1478	blocks and machine instructions. We then handle the "live-in" values. These
				1479	are in physical registers, so the physical register is assumed to be killed
				1480	by the end of the basic block. Live intervals for virtual registers are
				1481	computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
&lt; N</tt>, for which a variable is live.</p>
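<p>As a minimal sketch of what the half-open notation implies, the following
self-contained model tests liveness at a point and conflict between two
intervals. <tt>Interval</tt>, <tt>liveAt</tt> and <tt>overlaps</tt> are
hypothetical names chosen for illustration; they are not the LLVM
<tt>LiveInterval</tt> interface, and intervals are assumed to be non-empty.</p>

```cpp
#include <cassert>

// Hypothetical half-open interval [start, end) over instruction numbers.
// Assumes start < end (a non-empty interval).
struct Interval {
  unsigned start;
  unsigned end;
};

// True if instruction number 'idx' lies inside the interval.
bool liveAt(const Interval &iv, unsigned idx) {
  return iv.start <= idx && idx < iv.end;
}

// Two non-empty half-open intervals conflict exactly when each one starts
// before the other one ends.
bool overlaps(const Interval &a, const Interval &b) {
  return a.start < b.end && b.start < a.end;
}
```

<p>Because the end point is excluded, <tt>[1, 5)</tt> and <tt>[5, 9)</tt> do
not conflict under this model, so two such values could share a register.</p>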
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1484
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1485	<p><i><b>More to come...</b></i></p>
				1486
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1487	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1488
				1489	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1490	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1491	<a name="regalloc">Register Allocation</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1492	</h3>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1493
				1494	<div class="doc_text">
				1495
<p>The <i>Register Allocation problem</i> consists of mapping a program
<i>P<sub>v</sub></i>, which can use an unbounded number of virtual registers,
to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
				1499	number of physical registers. Each target architecture has a different number
				1500	of physical registers. If the number of physical registers is not enough to
				1501	accommodate all the virtual registers, some of them will have to be mapped
				1502	into memory. These virtuals are called <i>spilled virtuals</i>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1503
				1504	</div>
				1505
				1506	<!-- _______________________________________________________________________ -->
				1507
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1508	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1509	<a name="regAlloc_represent">How registers are represented in LLVM</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1510	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1511
				1512	<div class="doc_text">
				1513
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1514	<p>In LLVM, physical registers are denoted by integer numbers that normally
				1515	range from 1 to 1023. To see how this numbering is defined for a particular
				1516	architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
				1517	architecture. For instance, by
				1518	inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
				1519	32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
				1520	<tt>MM0</tt> is mapped to 48.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1521
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1522	<p>Some architectures contain registers that share the same physical location. A
				1523	notable example is the X86 platform. For instance, in the X86 architecture,
				1524	the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
				1525	bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
				1526	particular architecture, you can check which registers are aliased by
				1527	inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
				1528	<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
				1529	all the physical registers aliased to the register <tt>p_reg</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1530
				1531	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
Elements in the same register class are functionally equivalent, and can be
used interchangeably. Each virtual register can only be mapped to physical
registers of a particular class. For instance, in the X86 architecture, some
virtual registers can only be allocated to 8-bit registers. A register class is
described by a <tt>TargetRegisterClass</tt> object. To discover if a virtual
register is compatible with a given physical register, this code can be used:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1538
				1539	<div class="doc_code">
				1540	<pre>
bool RegMapping_Fer::compatible_class(MachineFunction &mf,
                                      unsigned v_reg,
                                      unsigned p_reg) {
  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &&
         "Target register must be physical");
  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
  return trc->contains(p_reg);
}
				1549	</pre>
				1550	</div>
				1551
<p>Sometimes, mostly for debugging purposes, it is useful to change the number
of physical registers available in the target architecture. This must be done
statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt>
for <tt>RegisterClass</tt>, the last parameter of which is a list of
registers. Commenting some of them out is one simple way to prevent them from
being used. A more polite way is to explicitly exclude some registers from
Dan Gohman	d2cb3d2	2009-07-24 00:30:09 +0000	[diff] [blame]	1558	the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
				1559	class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
				1560	</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1561
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1562	<p>Virtual registers are also denoted by integer numbers. Contrary to physical
Jakob Stoklund Olesen	3ca2102	2011-01-08 23:10:59 +0000	[diff] [blame]	1563	registers, different virtual registers never share the same number. Whereas
				1564	physical registers are statically defined in a <tt>TargetRegisterInfo.td</tt>
				1565	file and cannot be created by the application developer, that is not the case
				1566	with virtual registers. In order to create new virtual registers, use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1567	method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
will return a new virtual register. Use an <tt>IndexedMap&lt;Foo,
VirtReg2IndexFunctor&gt;</tt> to hold information per virtual register. If you
				1570	need to enumerate all virtual registers, use the function
				1571	<tt>TargetRegisterInfo::index2VirtReg()</tt> to find the virtual register
				1572	numbers:</p>
				1573
				1574	<div class="doc_code">
				1575	<pre>
for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
  unsigned VirtReg = TargetRegisterInfo::index2VirtReg(i);
  stuff(VirtReg);
}
				1580	</pre>
				1581	</div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1582
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1583	<p>Before register allocation, the operands of an instruction are mostly virtual
				1584	registers, although physical registers may also be used. In order to check if
				1585	a given machine operand is a register, use the boolean
				1586	function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
				1587	a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
				1588	or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
defines the register 1026, and uses registers 1025 and 1024. Given a
register operand, the method <tt>MachineOperand::isUse()</tt> indicates whether
that register is being used by the instruction. The
method <tt>MachineOperand::isDef()</tt> indicates whether that register is being
defined.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1594
<p>We will call physical registers present in the LLVM machine code before register
				1596	allocation <i>pre-colored registers</i>. Pre-colored registers are used in
many different situations, for instance, to pass parameters of function
calls, and to store results of particular instructions. There are two types
				1599	of pre-colored registers: the ones <i>implicitly</i> defined, and
				1600	those <i>explicitly</i> defined. Explicitly defined registers are normal
				1601	operands, and can be accessed
				1602	with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In order to check
which registers are implicitly defined by an instruction, use
<tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
				1605	where <tt>opcode</tt> is the opcode of the target instruction. One important
				1606	difference between explicit and implicit physical registers is that the
				1607	latter are defined statically for each instruction, whereas the former may
				1608	vary depending on the program being compiled. For example, an instruction
				1609	that represents a function call will always implicitly define or use the same
				1610	set of physical registers. To read the registers implicitly used by an
				1611	instruction,
				1612	use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
				1613	registers impose constraints on any register allocation algorithm. The
register allocator must make sure that none of them are overwritten by
the values of virtual registers while they are still live.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1616
				1617	</div>
				1618
				1619	<!-- _______________________________________________________________________ -->
				1620
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1621	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1622	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1623	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1624
				1625	<div class="doc_text">
				1626
				1627	<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, which we will call <i>direct mapping</i>, is
based on the use of methods of the classes <tt>TargetRegisterInfo</tt>
and <tt>MachineOperand</tt>. The second way, which we will call <i>indirect
mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
and stores that move values to and from memory.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1633
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1634	<p>The direct mapping provides more flexibility to the developer of the register
				1635	allocator; however, it is more error prone, and demands more implementation
				1636	work. Basically, the programmer will have to specify where load and store
				1637	instructions should be inserted in the target function being compiled in
				1638	order to get and store values in memory. To assign a physical register to a
				1639	virtual register present in a given operand,
				1640	use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
Jakob Stoklund Olesen	297907f	2010-08-31 22:01:07 +0000	[diff] [blame]	1641	use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot(...)</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1643
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1644	<p>The indirect mapping shields the application developer from the complexities
				1645	of inserting load and store instructions. In order to map a virtual register
				1646	to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
				1647	order to map a certain virtual register to memory,
				1648	use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
				1649	the stack slot where <tt>vreg</tt>'s value will be located. If it is
				1650	necessary to map another virtual register to the same stack slot,
				1651	use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
important point to consider when using the indirect mapping is that even if
				1653	a virtual register is mapped to memory, it still needs to be mapped to a
				1654	physical register. This physical register is the location where the virtual
				1655	register is supposed to be found before being stored or after being
				1656	reloaded.</p>
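<p>The bookkeeping behind the indirect mapping can be illustrated with a toy
model. The class below is <em>not</em> the LLVM <tt>VirtRegMap</tt>; it is a
hypothetical stand-in (<tt>ToyVirtRegMap</tt>) that only borrows the method
names mentioned above, and its query methods and the register numbers used
with it are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <map>

// Toy stand-in for the bookkeeping described above; not the LLVM class.
class ToyVirtRegMap {
  std::map<unsigned, unsigned> virt2Phys;  // virtual register -> physical
  std::map<unsigned, int> virt2Slot;       // virtual register -> stack slot
  int nextSlot = 0;

public:
  // Record that 'vreg' lives in physical register 'preg'.
  void assignVirt2Phys(unsigned vreg, unsigned preg) { virt2Phys[vreg] = preg; }

  // Allocate a fresh stack slot for 'vreg' and return it.
  int assignVirt2StackSlot(unsigned vreg) {
    int slot = nextSlot++;
    virt2Slot[vreg] = slot;
    return slot;
  }

  // Make 'vreg' share an already allocated stack slot.
  void assignVirt2StackSlot(unsigned vreg, int slot) { virt2Slot[vreg] = slot; }

  // Hypothetical query helpers.
  bool hasPhys(unsigned vreg) const { return virt2Phys.count(vreg) != 0; }
  bool isSpilled(unsigned vreg) const { return virt2Slot.count(vreg) != 0; }
  unsigned getPhys(unsigned vreg) const { return virt2Phys.at(vreg); }
  int getStackSlot(unsigned vreg) const { return virt2Slot.at(vreg); }
};
```

<p>In this model a spilled virtual register ends up with two entries: a stack
slot, and the physical register that holds its value just before a store or
just after a reload.</p>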
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1657
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1658	<p>If the indirect strategy is used, after all the virtual registers have been
				1659	mapped to physical registers or stack slots, it is necessary to use a spiller
object to place load and store instructions in the code. Every virtual register
that has been mapped to a stack slot will be stored to memory after being defined
				1662	and will be loaded before being used. The implementation of the spiller tries
				1663	to recycle load/store instructions, avoiding unnecessary instructions. For an
				1664	example of how to invoke the spiller,
				1665	see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
				1666	in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1667
				1668	</div>
				1669
				1670	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1671	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1672	<a name="regAlloc_twoAddr">Handling two address instructions</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1673	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1674
				1675	<div class="doc_text">
				1676
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1677	<p>With very rare exceptions (e.g., function calls), the LLVM machine code
				1678	instructions are three address instructions. That is, each instruction is
				1679	expected to define at most one register, and to use at most two registers.
				1680	However, some architectures use two address instructions. In this case, the
defined register is also one of the used registers. For instance, an
instruction such as <tt>ADD %EAX, %EBX</tt> in X86 is actually equivalent
				1683	to <tt>%EAX = %EAX + %EBX</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1684
				1685	<p>In order to produce correct code, LLVM must convert three address
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1686	instructions that represent two address instructions into true two address
				1687	instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
				1688	this specific purpose. It must be run before register allocation takes
				1689	place. After its execution, the resulting code may no longer be in SSA
				1690	form. This happens, for instance, in situations where an instruction such
				1691	as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1692
				1693	<div class="doc_code">
				1694	<pre>
				1695	%a = MOVE %b
Dan Gohman	03e5857	2008-06-13 17:55:57 +0000	[diff] [blame]	1696	%a = ADD %a %c
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1697	</pre>
				1698	</div>
				1699
				1700	<p>Notice that, internally, the second instruction is represented as
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1701	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
				1702	used and defined by the instruction.</p>
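<p>The rewrite shown above can be sketched on a toy instruction
representation. This is only an illustration of the idea, not the logic of
<tt>TwoAddressInstructionPass</tt>: <tt>ThreeAddrInst</tt> and
<tt>toTwoAddress</tt> are hypothetical names, and the input is assumed to be in
SSA form, so the destination differs from both sources.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-address instruction: dst = op src1 src2.
// A MOVE reads only src1 and leaves src2 empty.
struct ThreeAddrInst {
  std::string op;
  std::string dst;
  std::string src1;
  std::string src2;
};

// Rewrite "dst = op src1 src2" so that the destination is also the first
// source, inserting a MOVE in front when the two registers differ.
std::vector<ThreeAddrInst> toTwoAddress(const ThreeAddrInst &inst) {
  // The MOVE would clobber src2 if it were the same register as dst.
  assert(inst.dst != inst.src2 && "sketch assumes SSA input");
  std::vector<ThreeAddrInst> out;
  if (inst.dst != inst.src1)
    out.push_back({"MOVE", inst.dst, inst.src1, ""});
  out.push_back({inst.op, inst.dst, inst.dst, inst.src2});
  return out;
}
```

<p>Applied to <tt>%a = ADD %b %c</tt>, the sketch yields the two instructions
listed above. The register <tt>%a</tt> is then defined twice, which is why the
result is no longer in SSA form.</p>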
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1703
				1704	</div>
				1705
				1706	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1707	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1708	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1709	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1710
				1711	<div class="doc_text">
				1712
				1713	<p>An important transformation that happens during register allocation is called
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1714	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
				1715	that are performed on the control flow graph of programs. However,
				1716	traditional instruction sets do not implement PHI instructions. Thus, in
				1717	order to generate executable code, compilers must replace PHI instructions
				1718	with other instructions that preserve their semantics.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1719
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1720	<p>There are many ways in which PHI instructions can safely be removed from the
				1721	target code. The most traditional PHI deconstruction algorithm replaces PHI
				1722	instructions with copy instructions. That is the strategy adopted by
				1723	LLVM. The SSA deconstruction algorithm is implemented
				1724	in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
				1725	identifier <tt>PHIEliminationID</tt> must be marked as required in the code
				1726	of the register allocator.</p>
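<p>The copy-based strategy can be sketched as follows. This is a deliberately
small model and not the code in <tt>PHIElimination.cpp</tt>:
<tt>PhiNode</tt>, <tt>BlockCode</tt> and <tt>lowerPhi</tt> are hypothetical
names, instructions are plain strings, a single PHI is lowered at a time, and
the last instruction of every non-empty predecessor block is assumed to be its
terminator.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical PHI: 'dst' receives incoming[pred] when control arrives from
// the predecessor block named 'pred'.
struct PhiNode {
  std::string dst;
  std::map<std::string, std::string> incoming;  // predecessor -> source reg
};

// Each block is modeled as a list of textual instructions.
using BlockCode = std::vector<std::string>;

// Replace one PHI by a copy in every predecessor block. The copy is placed
// just before the last instruction, which is assumed to be the terminator.
void lowerPhi(const PhiNode &phi, std::map<std::string, BlockCode> &blocks) {
  for (const auto &entry : phi.incoming) {
    BlockCode &pred = blocks[entry.first];
    std::string copy = phi.dst + " = COPY " + entry.second;
    auto pos = pred.empty() ? pred.end() : pred.end() - 1;
    pred.insert(pos, copy);
  }
}
```

<p>Lowering several PHIs of the same block one after another in this naive way
can be incorrect when they read each other's results, so a real implementation
needs more care than this sketch shows.</p>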
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1727
				1728	</div>
				1729
				1730	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1731	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1732	<a name="regAlloc_fold">Instruction folding</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1733	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1734
				1735	<div class="doc_text">
				1736
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1737	<p><i>Instruction folding</i> is an optimization performed during register
				1738	allocation that removes unnecessary copy instructions. For instance, a
				1739	sequence of instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1740
				1741	<div class="doc_code">
				1742	<pre>
				1743	%EBX = LOAD %mem_address
				1744	%EAX = COPY %EBX
				1745	</pre>
				1746	</div>
				1747
Dan Gohman	a7ab2bf	2008-11-24 16:35:31 +0000	[diff] [blame]	1748	<p>can be safely substituted by the single instruction:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1749
				1750	<div class="doc_code">
				1751	<pre>
				1752	%EAX = LOAD %mem_address
				1753	</pre>
				1754	</div>
				1755
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1756	<p>Instructions can be folded with
the <tt>TargetInstrInfo::foldMemoryOperand(...)</tt> method. Care must be
				1758	taken when folding instructions; a folded instruction can be quite different
				1759	from the original
				1760	instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
				1761	in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
				1762	use.</p>
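<p>The <tt>LOAD</tt>/<tt>COPY</tt> example above can be sketched on a toy
instruction list. The code below does not model <tt>foldMemoryOperand</tt>;
<tt>FoldInst</tt>, <tt>usedFrom</tt> and <tt>foldLoadCopy</tt> are hypothetical
names, every instruction is assumed to read a single operand, and the
temporary register is assumed not to be live out of the list.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction with one destination and one source operand.
struct FoldInst {
  std::string op;
  std::string dst;
  std::string src;
};

// True if 'reg' is read by any instruction at index 'from' or later.
bool usedFrom(const std::vector<FoldInst> &code, std::size_t from,
              const std::string &reg) {
  for (std::size_t i = from; i < code.size(); ++i)
    if (code[i].src == reg)
      return true;
  return false;
}

// Fold "tmp = LOAD addr; dst = COPY tmp" into "dst = LOAD addr" whenever
// 'tmp' has no later reader. All other instructions are kept unchanged.
std::vector<FoldInst> foldLoadCopy(const std::vector<FoldInst> &code) {
  std::vector<FoldInst> out;
  for (std::size_t i = 0; i < code.size(); ++i) {
    bool canFold = i + 1 < code.size() &&
                   code[i].op == "LOAD" &&
                   code[i + 1].op == "COPY" &&
                   code[i + 1].src == code[i].dst &&
                   !usedFrom(code, i + 2, code[i].dst);
    if (canFold) {
      out.push_back({"LOAD", code[i + 1].dst, code[i].src});
      ++i;  // skip the COPY that was folded away
    } else {
      out.push_back(code[i]);
    }
  }
  return out;
}
```

<p>The check for later readers is what makes the substitution safe in this
model: if <tt>%EBX</tt> were read again after the copy, removing its
definition would change the program.</p>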
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1763
				1764	</div>
				1765
				1766	<!-- _______________________________________________________________________ -->
				1767
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1768	<h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1769	<a name="regAlloc_builtIn">Built in register allocators</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1770	</h4>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1771
				1772	<div class="doc_text">
				1773
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1774	<p>The LLVM infrastructure provides the application developer with three
				1775	different register allocators:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1776
				1777	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1778	<li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
well-known linear scan register allocator. Whereas the
<i>Simple</i> and <i>Local</i> algorithms use a direct mapping
implementation technique, the <i>Linear Scan</i> implementation
uses a spiller in order to place loads and stores.</li>
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1783
				1784	<li><i>Fast</i> — This register allocator is the default for debug
				1785	builds. It allocates registers on a basic block level, attempting to keep
				1786	values in registers and reusing registers as appropriate.</li>
				1787
				1788	<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
				1789	based register allocator. This allocator works by constructing a PBQP
				1790	problem representing the register allocation problem under consideration,
				1791	solving this using a PBQP solver, and mapping the solution back to a
				1792	register assignment.</li>
				1793
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1794	</ul>
				1795
				1796	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1797	command line option <tt>-regalloc=...</tt>:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1798
				1799	<div class="doc_code">
				1800	<pre>
$ llc -regalloc=linearscan file.bc -o ln.s
$ llc -regalloc=fast file.bc -o fa.s
$ llc -regalloc=pbqp file.bc -o pbqp.s
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1804	</pre>
				1805	</div>
				1806
				1807	</div>
				1808
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1809	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1810	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1811	<a name="proepicode">Prolog/Epilog Code Insertion</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1812	</h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1813	<div class="doc_text"><p>To Be Written</p></div>
				1814	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1815	<h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1816	<a name="latemco">Late Machine Code Optimizations</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1817	</h3>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1818	<div class="doc_text"><p>To Be Written</p></div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1819
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1820	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1821	<h3>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1822	<a name="codeemit">Code Emission</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1823	</h3>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1824
				1825	<div class="doc_text">
				1826
				1827	<p>The code emission step of code generation is responsible for lowering from
				1828	the code generator abstractions (like <a
				1829	href="#machinefunction">MachineFunction</a>, <a
href="#machineinstr">MachineInstr</a>, etc.) down
to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
<a href="#mcstreamer">MCStreamer</a>, etc.). This is
				1833	done with a combination of several different classes: the (misnamed)
				1834	target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
				1835	(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
				1836
<p>Since the MC layer works at the level of abstraction of object files, it
doesn't have a notion of functions, global variables, etc. Instead, it thinks
about labels, directives, and instructions. A key class used at this stage is
the MCStreamer class. This is an abstract API, effectively an "assembler API",
that is implemented in different ways (e.g. to output a .s file, to output an
ELF .o file, etc.). MCStreamer has one method per directive, such as EmitLabel,
EmitSymbolAttribute, SwitchSection, etc., which directly correspond to
assembly-level directives.
</p>
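<p>To make the "one abstract API, several implementations" idea concrete, the
following Python sketch models a streamer with two backends. It is only a toy:
<tt>ToyStreamer</tt>, <tt>ToyAsmStreamer</tt>, <tt>ToyRecordingStreamer</tt>
and <tt>emit_function</tt> are hypothetical names invented for this example
and do not mirror the real MCStreamer signatures.</p>

```python
from abc import ABC, abstractmethod


class ToyStreamer(ABC):
    """Hypothetical stand-in for an MCStreamer-like 'assembler API'."""

    @abstractmethod
    def switch_section(self, name): ...

    @abstractmethod
    def emit_label(self, symbol): ...

    @abstractmethod
    def emit_instruction(self, text): ...


class ToyAsmStreamer(ToyStreamer):
    """Backend that renders each call as a line of assembly text."""

    def __init__(self):
        self.lines = []

    def switch_section(self, name):
        self.lines.append(f"\t.section\t{name}")

    def emit_label(self, symbol):
        self.lines.append(f"{symbol}:")

    def emit_instruction(self, text):
        self.lines.append(f"\t{text}")


class ToyRecordingStreamer(ToyStreamer):
    """Second backend: records structured events, as an object writer might."""

    def __init__(self):
        self.events = []

    def switch_section(self, name):
        self.events.append(("section", name))

    def emit_label(self, symbol):
        self.events.append(("label", symbol))

    def emit_instruction(self, text):
        self.events.append(("inst", text))


def emit_function(streamer, name, instructions):
    """Client code talks only to the abstract interface."""
    streamer.switch_section(".text")
    streamer.emit_label(name)
    for inst in instructions:
        streamer.emit_instruction(inst)


asm = ToyAsmStreamer()
emit_function(asm, "main", ["ret"])
rec = ToyRecordingStreamer()
emit_function(rec, "main", ["ret"])
```

<p>The caller, <tt>emit_function</tt>, is identical for both backends; only
the streamer object decides whether the result is text or structured data.</p>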
				1846
				1847	<p>If you are interested in implementing a code generator for a target, there
				1848	are three important things that you have to implement for your target:</p>
				1849
				1850	<ol>
				1851	<li>First, you need a subclass of AsmPrinter for your target. This class
implements the general lowering process converting MachineFunctions into MC
				1853	label constructs. The AsmPrinter base class provides a number of useful methods
				1854	and routines, and also allows you to override the lowering process in some
				1855	important ways. You should get much of the lowering for free if you are
				1856	implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
				1857	class implements much of the common logic.</li>
				1858
				1859	<li>Second, you need to implement an instruction printer for your target. The
				1860	instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
				1861	raw_ostream as text. Most of this is automatically generated from the .td file
				1862	(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
				1863	instructions), but you need to implement routines to print operands.</li>
				1864
				1865	<li>Third, you need to implement code that lowers a <a
				1866	href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
"<target>MCInstLower.cpp". This lowering process is often
target-specific, and is responsible for turning jump table entries, constant pool
indices, global variable addresses, etc. into MCLabels as appropriate. This
				1870	translation layer is also responsible for expanding pseudo ops used by the code
				1871	generator into the actual machine instructions they correspond to. The MCInsts
				1872	that are generated by this are fed into the instruction printer or the encoder.
				1873	</li>
				1874
				1875	</ol>
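<p>The second and third items above can be pictured with a small Python model:
a lowering step that resolves symbolic operands and expands a pseudo op,
followed by an instruction printer. Everything here is an assumption made for
illustration, including the class names <tt>ToyMachineInstr</tt> and
<tt>ToyMCInst</tt>, the <tt>PSEUDO_CLEAR</tt> opcode, and the operand
encoding; real targets do this in C++ against the LLVM classes.</p>

```python
from dataclasses import dataclass


@dataclass
class ToyMachineInstr:
    opcode: str
    operands: tuple   # register names, or ("global", name) references


@dataclass
class ToyMCInst:
    opcode: str
    operands: tuple   # register names and symbol names only


def lower(mi):
    """Lower one toy MachineInstr to a list of toy MCInsts."""
    # Symbolic references (here only globals) become plain symbol names.
    ops = tuple(op[1] if isinstance(op, tuple) else op for op in mi.operands)
    # A made-up pseudo op expands into the real instruction it stands for.
    if mi.opcode == "PSEUDO_CLEAR":
        return [ToyMCInst("xor", (ops[0], ops[0]))]
    return [ToyMCInst(mi.opcode, ops)]


def print_inst(inst):
    """Toy instruction printer: render an MCInst as text."""
    return f"{inst.opcode} " + ", ".join(inst.operands)


cleared = lower(ToyMachineInstr("PSEUDO_CLEAR", ("r1",)))
called = lower(ToyMachineInstr("call", (("global", "foo"),)))
text = [print_inst(i) for i in cleared + called]
```

<p>In this sketch the printer never sees the pseudo op or the global
reference; by the time an instruction reaches it, lowering has already
replaced both.</p>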
				1876
<p>Finally, if you choose, you can also implement a subclass of
MCCodeEmitter which lowers MCInsts into machine code bytes and relocations.
				1879	This is important if you want to support direct .o file emission, or would like
				1880	to implement an assembler for your target.</p>
				1881
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1882	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1883
				1884
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1885	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1886	<h2>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1887	<a name="nativeassembler">Implementing a Native Assembler</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1888	</h2>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1889	<!-- *********************************************************************** -->
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1890
				1891	<div class="doc_text">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1892
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1893	<p>Though you're probably reading this because you want to write or maintain a
compiler backend, LLVM also fully supports building native assemblers.
				1895	We've tried hard to automate the generation of the assembler from the .td files
				1896	(in particular the instruction syntax and encodings), which means that a large
				1897	part of the manual and repetitive data entry can be factored and shared with the
				1898	compiler.</p>
				1899
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1900	</div>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1901
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1902	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1903	<h3 id="na_instparsing">Instruction Parsing</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1904
				1905	<div class="doc_text"><p>To Be Written</p></div>
				1906
				1907
				1908	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1909	<h3 id="na_instaliases">
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1910	Instruction Alias Processing
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1911	</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1912
				1913	<div class="doc_text">
				1914	<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
				1915	The MatchInstructionImpl function performs alias processing and then does
				1916	actual matching.</p>
				1917
Chris Lattner	693173f	2010-10-30 19:23:13 +0000	[diff] [blame]	1918	<p>Alias processing is the phase that canonicalizes different lexical forms of
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1919	the same instructions down to one representation. There are several different
				1920	kinds of alias that are possible to implement and they are listed below in the
				1921	order that they are processed (which is in order from simplest/weakest to most
				1922	complex/powerful). Generally you want to use the first alias mechanism that
				1923	meets the needs of your instruction, because it will allow a more concise
				1924	description.</p>
				1925
Chris Lattner	50e5972	2010-10-30 20:21:00 +0000	[diff] [blame]	1926	</div>
				1927
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1928	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1929	<h4>Mnemonic Aliases</h4>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1930
				1931	<div class="doc_text">
				1932
Chris Lattner	8cf8bcc	2010-10-30 19:47:49 +0000	[diff] [blame]	1933	<p>The first phase of alias processing is simple instruction mnemonic
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1934	remapping for classes of instructions which are allowed with two different
mnemonics. This phase is a simple and unconditional remapping from one input
mnemonic to one output mnemonic. It isn't possible for this form of alias to
look at the operands at all, so the remapping must apply for all forms of a
given mnemonic. Mnemonic aliases are defined simply; for example, X86 has:
				1939	</p>
				1940
				1941	<div class="doc_code">
				1942	<pre>
				1943	def : MnemonicAlias<"cbw", "cbtw">;
				1944	def : MnemonicAlias<"smovq", "movsq">;
				1945	def : MnemonicAlias<"fldcww", "fldcw">;
				1946	def : MnemonicAlias<"fucompi", "fucomip">;
				1947	def : MnemonicAlias<"ud2a", "ud2">;
				1948	</pre>
				1949	</div>
				1950
				1951	<p>... and many others. With a MnemonicAlias definition, the mnemonic is
remapped simply and directly. Though MnemonicAliases can't look at any aspect
of the instruction (such as the operands), they can depend on global modes (the
same ones supported by the matcher) through a Requires clause:</p>
				1955
				1956	<div class="doc_code">
				1957	<pre>
				1958	def : MnemonicAlias<"pushf", "pushfq">, Requires<[In64BitMode]>;
				1959	def : MnemonicAlias<"pushf", "pushfl">, Requires<[In32BitMode]>;
				1960	</pre>
				1961	</div>
				1962
<p>In this example, the mnemonic gets mapped to a different one depending
on the current instruction set.</p>
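<p>The behaviour described in this section (an unconditional,
operand-blind remapping that may be gated on a global mode) can be modelled in
a few lines of Python. This is a sketch only: the table layout, the mode
strings <tt>"32bit"</tt> and <tt>"64bit"</tt>, and the function
<tt>remap_mnemonic</tt> are assumptions for the example, not what TableGen
generates.</p>

```python
# (input mnemonic, output mnemonic, required mode or None for "always").
# The entries are taken from the X86 examples above.
MNEMONIC_ALIASES = [
    ("cbw", "cbtw", None),
    ("ud2a", "ud2", None),
    ("pushf", "pushfq", "64bit"),
    ("pushf", "pushfl", "32bit"),
]


def remap_mnemonic(mnemonic, mode):
    """Return the canonical mnemonic; operands are never consulted."""
    for alias, target, required in MNEMONIC_ALIASES:
        if alias == mnemonic and required in (None, mode):
            return target
    return mnemonic   # no alias applies, keep the mnemonic as written
```

<p>Because the function takes only the mnemonic and the mode, it cannot
distinguish between different operand forms, which is exactly the limitation
noted above.</p>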
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1965
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1966	</div>
				1967
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	1968	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	1969	<h4>Instruction Aliases</h4>
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	1970
				1971	<div class="doc_text">
				1972
				1973	<p>The most general phase of alias processing occurs while matching is
				1974	happening: it provides new forms for the matcher to match along with a specific
				1975	instruction to generate. An instruction alias has two parts: the string to
				1976	match and the instruction to generate. For example:
				1977	</p>
				1978
				1979	<div class="doc_code">
				1980	<pre>
				1981	def : InstAlias<"movsx $src, $dst", (MOVSX16rr8W GR16:$dst, GR8 :$src)>;
				1982	def : InstAlias<"movsx $src, $dst", (MOVSX16rm8W GR16:$dst, i8mem:$src)>;
				1983	def : InstAlias<"movsx $src, $dst", (MOVSX32rr8 GR32:$dst, GR8 :$src)>;
				1984	def : InstAlias<"movsx $src, $dst", (MOVSX32rr16 GR32:$dst, GR16 :$src)>;
				1985	def : InstAlias<"movsx $src, $dst", (MOVSX64rr8 GR64:$dst, GR8 :$src)>;
				1986	def : InstAlias<"movsx $src, $dst", (MOVSX64rr16 GR64:$dst, GR16 :$src)>;
				1987	def : InstAlias<"movsx $src, $dst", (MOVSX64rr32 GR64:$dst, GR32 :$src)>;
				1988	</pre>
				1989	</div>
				1990
				1991	<p>This shows a powerful example of the instruction aliases, matching the
				1992	same mnemonic in multiple different ways depending on what operands are present
				1993	in the assembly. The result of instruction aliases can include operands in a
				1994	different order than the destination instruction, and can use an input
				1995	multiple times, for example:</p>
				1996
				1997	<div class="doc_code">
				1998	<pre>
				1999	def : InstAlias<"clrb $reg", (XOR8rr GR8 :$reg, GR8 :$reg)>;
				2000	def : InstAlias<"clrw $reg", (XOR16rr GR16:$reg, GR16:$reg)>;
				2001	def : InstAlias<"clrl $reg", (XOR32rr GR32:$reg, GR32:$reg)>;
				2002	def : InstAlias<"clrq $reg", (XOR64rr GR64:$reg, GR64:$reg)>;
				2003	</pre>
				2004	</div>
				2005
				2006	<p>This example also shows that tied operands are only listed once. In the X86
				2007	backend, XOR8rr has two input GR8's and one output GR8 (where an input is tied
				2008	to the output). InstAliases take a flattened operand list without duplicates
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2009	for tied operands. The result of an instruction alias can also use immediates
				2010	and fixed physical registers which are added as simple immediate operands in the
				2011	result, for example:</p>
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2012
				2013	<div class="doc_code">
				2014	<pre>
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2015	// Fixed Immediate operand.
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2016	def : InstAlias<"aad", (AAD8i8 10)>;
Chris Lattner	90fd797	2010-11-06 19:57:21 +0000	[diff] [blame]	2017
				2018	// Fixed register operand.
				2019	def : InstAlias<"fcomi", (COM_FIr ST1)>;
				2020
				2021	// Simple alias.
				2022	def : InstAlias<"fcomi $reg", (COM_FIr RST:$reg)>;
Chris Lattner	98c870f	2010-11-06 19:25:43 +0000	[diff] [blame]	2023	</pre>
				2024	</div>
				2025
Chris Lattner	c7a03fb	2010-11-06 08:30:26 +0000	[diff] [blame]	2026
				2027	<p>Instruction aliases can also have a Requires clause to make them
				2028	subtarget specific.</p>
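<p>As a rough mental model of instruction aliases, the Python sketch below
selects a result instruction from the mnemonic plus the classes of the parsed
operands, and builds the result operand list by index so that operands can be
reordered or repeated. The register-class table, the AT&amp;T-style register
names, and the function <tt>match_alias</tt> are assumptions made for this
example; the real matcher is generated from the .td files.</p>

```python
# Hypothetical mapping from a parsed register to its register class.
REG_CLASS = {"%al": "GR8", "%ax": "GR16", "%eax": "GR32", "%rax": "GR64"}

# (mnemonic, operand classes in assembly order,
#  result opcode, indices of the assembly operands in result order)
INST_ALIASES = [
    ("movsx", ("GR8", "GR16"), "MOVSX16rr8W", (1, 0)),
    ("movsx", ("GR8", "GR32"), "MOVSX32rr8", (1, 0)),
    ("movsx", ("GR16", "GR32"), "MOVSX32rr16", (1, 0)),
    ("clrl", ("GR32",), "XOR32rr", (0, 0)),   # one input used twice
]


def match_alias(mnemonic, operands):
    """Return (opcode, result operands) for the first matching alias."""
    classes = tuple(REG_CLASS.get(op) for op in operands)
    for name, wanted, opcode, order in INST_ALIASES:
        if name == mnemonic and wanted == classes:
            return (opcode, tuple(operands[i] for i in order))
    return None   # no alias matched these operands
```

<p>Here <tt>match_alias("movsx", ["%al", "%eax"])</tt> picks the 8-to-32-bit
form and swaps the operands into destination-first order, while
<tt>match_alias("clrl", ["%eax"])</tt> repeats its single operand.</p>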
				2029
				2030	</div>
				2031
				2032
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2033
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2034	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2035	<h3 id="na_matching">Instruction Matching</h3>
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	2036
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	2037	<div class="doc_text"><p>To Be Written</p></div>
				2038
				2039
				2040
				2041
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	2042	<!-- *********************************************************************** -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2043	<h2>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	2044	<a name="targetimpls">Target-specific Implementation Notes</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2045	</h2>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2046	<!-- *********************************************************************** -->
				2047
				2048	<div class="doc_text">
				2049
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2050	<p>This section of the document explains features or design decisions that are
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2051	specific to the code generator for a particular target. First we start
				2052	with a table that summarizes what features are supported by each target.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2053
				2054	</div>
				2055
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2056	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2057	<h3>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2058	<a name="targetfeatures">Target Feature Matrix</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2059	</h3>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2060
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2061	<div class="doc_text">
				2062
<p>Note that this table does not include the C or Cpp backends, since
they do not use the target-independent code generator infrastructure. It also
doesn't list features that are not supported fully by any target yet. It
considers a feature to be supported if at least one subtarget supports it. A
feature being supported means that it is useful and works for most cases; it
does not indicate that there are zero known bugs in the implementation. Here
is the key:</p>
				2070
				2071
				2072	<table border="1" cellspacing="0">
				2073	<tr>
				2074	<th>Unknown</th>
				2075	<th>No support</th>
				2076	<th>Partial Support</th>
				2077	<th>Complete Support</th>
				2078	</tr>
				2079	<tr>
				2080	<td class="unknown"></td>
				2081	<td class="no"></td>
				2082	<td class="partial"></td>
				2083	<td class="yes"></td>
				2084	</tr>
				2085	</table>
				2086
				2087	<p>Here is the table:</p>
				2088
				2089	<table width="689" border="1" cellspacing="0">
				2090	<tr><td></td>
Benjamin Kramer	943beeb	2010-10-30 21:07:28 +0000	[diff] [blame]	2091	<td colspan="13" align="center" style="background-color:#ffc">Target</td>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2092	</tr>
				2093	<tr>
				2094	<th>Feature</th>
				2095	<th>ARM</th>
				2096	<th>Alpha</th>
				2097	<th>Blackfin</th>
				2098	<th>CellSPU</th>
				2099	<th>MBlaze</th>
				2100	<th>MSP430</th>
				2101	<th>Mips</th>
				2102	<th>PTX</th>
				2103	<th>PowerPC</th>
				2104	<th>Sparc</th>
				2105	<th>SystemZ</th>
				2106	<th>X86</th>
				2107	<th>XCore</th>
				2108	</tr>
				2109
				2110	<tr>
				2111	<td><a href="#feat_reliable">is generally reliable</a></td>
				2112	<td class="yes"></td> <!-- ARM -->
				2113	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2114	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2115	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2116	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2117	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2118	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2119	<td class="no"></td> <!-- PTX -->
				2120	<td class="yes"></td> <!-- PowerPC -->
				2121	<td class="yes"></td> <!-- Sparc -->
				2122	<td class="unknown"></td> <!-- SystemZ -->
				2123	<td class="yes"></td> <!-- X86 -->
				2124	<td class="unknown"></td> <!-- XCore -->
				2125	</tr>
				2126
				2127	<tr>
				2128	<td><a href="#feat_asmparser">assembly parser</a></td>
				2129	<td class="no"></td> <!-- ARM -->
				2130	<td class="no"></td> <!-- Alpha -->
				2131	<td class="no"></td> <!-- Blackfin -->
				2132	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2133	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2134	<td class="no"></td> <!-- MSP430 -->
				2135	<td class="no"></td> <!-- Mips -->
				2136	<td class="no"></td> <!-- PTX -->
				2137	<td class="no"></td> <!-- PowerPC -->
				2138	<td class="no"></td> <!-- Sparc -->
				2139	<td class="no"></td> <!-- SystemZ -->
				2140	<td class="yes"></td> <!-- X86 -->
				2141	<td class="no"></td> <!-- XCore -->
				2142	</tr>
				2143
				2144	<tr>
				2145	<td><a href="#feat_disassembler">disassembler</a></td>
				2146	<td class="yes"></td> <!-- ARM -->
				2147	<td class="no"></td> <!-- Alpha -->
				2148	<td class="no"></td> <!-- Blackfin -->
				2149	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2150	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2151	<td class="no"></td> <!-- MSP430 -->
				2152	<td class="no"></td> <!-- Mips -->
				2153	<td class="no"></td> <!-- PTX -->
				2154	<td class="no"></td> <!-- PowerPC -->
				2155	<td class="no"></td> <!-- Sparc -->
				2156	<td class="no"></td> <!-- SystemZ -->
				2157	<td class="yes"></td> <!-- X86 -->
				2158	<td class="no"></td> <!-- XCore -->
				2159	</tr>
				2160
				2161	<tr>
				2162	<td><a href="#feat_inlineasm">inline asm</a></td>
				2163	<td class="yes"></td> <!-- ARM -->
				2164	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2165	<td class="yes"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2166	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2167	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2168	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2169	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2170	<td class="unknown"></td> <!-- PTX -->
				2171	<td class="yes"></td> <!-- PowerPC -->
				2172	<td class="unknown"></td> <!-- Sparc -->
				2173	<td class="unknown"></td> <!-- SystemZ -->
				2174	<td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 -->
				2175	<td class="unknown"></td> <!-- XCore -->
				2176	</tr>
				2177
				2178	<tr>
				2179	<td><a href="#feat_jit">jit</a></td>
				2180	<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
Chris Lattner	ac3031a	2010-11-14 18:25:50 +0000	[diff] [blame]	2181	<td class="no"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2182	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2183	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2184	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2185	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2186	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2187	<td class="unknown"></td> <!-- PTX -->
				2188	<td class="yes"></td> <!-- PowerPC -->
				2189	<td class="unknown"></td> <!-- Sparc -->
				2190	<td class="unknown"></td> <!-- SystemZ -->
				2191	<td class="yes"></td> <!-- X86 -->
				2192	<td class="unknown"></td> <!-- XCore -->
				2193	</tr>
				2194
				2195	<tr>
				2196	<td><a href="#feat_objectwrite">.o file writing</a></td>
				2197	<td class="no"></td> <!-- ARM -->
				2198	<td class="no"></td> <!-- Alpha -->
				2199	<td class="no"></td> <!-- Blackfin -->
				2200	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	d5fe3ef	2010-12-20 21:54:50 +0000	[diff] [blame]	2201	<td class="yes"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2202	<td class="no"></td> <!-- MSP430 -->
				2203	<td class="no"></td> <!-- Mips -->
				2204	<td class="no"></td> <!-- PTX -->
				2205	<td class="no"></td> <!-- PowerPC -->
				2206	<td class="no"></td> <!-- Sparc -->
				2207	<td class="no"></td> <!-- SystemZ -->
				2208	<td class="yes"></td> <!-- X86 -->
				2209	<td class="no"></td> <!-- XCore -->
				2210	</tr>
				2211
				2212	<tr>
				2213	<td><a href="#feat_tailcall">tail calls</a></td>
				2214	<td class="yes"></td> <!-- ARM -->
				2215	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2216	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2217	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2218	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2219	<td class="unknown"></td> <!-- MSP430 -->
Bruno Cardoso Lopes	48461f6	2010-12-19 22:41:43 +0000	[diff] [blame]	2220	<td class="no"></td> <!-- Mips -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2221	<td class="unknown"></td> <!-- PTX -->
				2222	<td class="yes"></td> <!-- PowerPC -->
				2223	<td class="unknown"></td> <!-- Sparc -->
				2224	<td class="unknown"></td> <!-- SystemZ -->
				2225	<td class="yes"></td> <!-- X86 -->
				2226	<td class="unknown"></td> <!-- XCore -->
				2227	</tr>
				2228
				2229
				2230	</table>
				2231
				2232	</div>
				2233
				2234	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2235	<h4 id="feat_reliable">Is Generally Reliable</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2236
				2237	<div class="doc_text">
				2238	<p>This box indicates whether the target is considered to be production quality.
				2239	This indicates that the target has been used as a static compiler to
				2240	compile large amounts of code by a variety of different people and is in
				2241	continuous use.</p>
				2242	</div>
				2243
				2244	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2245	<h4 id="feat_asmparser">Assembly Parser</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2246
				2247	<div class="doc_text">
<p>This box indicates whether the target supports parsing target-specific .s
				2249	files by implementing the MCAsmParser interface. This is required for llvm-mc
				2250	to be able to act as a native assembler and is required for inline assembly
				2251	support in the native .o file writer.</p>
				2252
				2253	</div>
				2254
				2255
				2256	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2257	<h4 id="feat_disassembler">Disassembler</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2258
				2259	<div class="doc_text">
				2260	<p>This box indicates whether the target supports the MCDisassembler API for
disassembling machine opcode bytes into MCInsts.</p>
				2262
				2263	</div>
				2264
				2265	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2266	<h4 id="feat_inlineasm">Inline Asm</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2267
				2268	<div class="doc_text">
				2269	<p>This box indicates whether the target supports most popular inline assembly
				2270	constraints and modifiers.</p>
				2271
				2272	<p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly
				2273	constraints relating to the X86 floating point stack.</p>
				2274
				2275	</div>
				2276
				2277	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2278	<h4 id="feat_jit">JIT Support</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2279
				2280	<div class="doc_text">
				2281	<p>This box indicates whether the target supports the JIT compiler through
				2282	the ExecutionEngine interface.</p>
				2283
Chris Lattner	6fb9955	2010-10-24 16:24:22 +0000	[diff] [blame]	2284	<p id="feat_jit_arm">The ARM backend has basic support for integer code
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2285	in ARM codegen mode, but lacks NEON and full Thumb support.</p>
				2286
				2287	</div>
				2288
				2289	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2290	<h4 id="feat_objectwrite">.o File Writing</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2291
				2292	<div class="doc_text">
				2293
<p>This box indicates whether the target supports writing .o files (e.g. MachO,
ELF, and/or COFF) directly from the target. Note that the target also
must include an assembly parser and general inline assembly support for full
inline assembly support in the .o writer.</p>
				2298
<p>Targets that don't support this feature can still write out .o
files; they just rely on an external assembler to translate from a .s
file to a .o file (as is the case for many C compilers).</p>
				2302
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2303	</div>
				2304
				2305	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2306	<h4 id="feat_tailcall">Tail Calls</h4>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2307
				2308	<div class="doc_text">
				2309
<p>This box indicates whether the target supports guaranteed tail calls. These
are calls marked "<a href="LangRef.html#i_call">tail</a>" that use the fastcc
calling convention. Please see the <a href="#tailcallopt">tail call section
for more details</a>.</p>
				2314
				2315	</div>
				2316
				2317
				2318
				2319
				2320	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2321	<h3>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2322	<a name="tailcallopt">Tail call optimization</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2323	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2324
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2325	<div class="doc_text">
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2326
<p>Tail call optimization, in which the callee reuses the stack of the caller,
is currently supported on x86/x86-64 and PowerPC. It is performed if:</p>
				2329
				2330	<ul>
Chris Lattner	2968943	2010-03-11 00:22:57 +0000	[diff] [blame]	2331	<li>Caller and callee have the calling convention <tt>fastcc</tt> or
				2332	<tt>cc 10</tt> (GHC call convention).</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2333
<li>The call is a tail call, i.e. it is in tail position (<tt>ret</tt>
immediately follows the call, and <tt>ret</tt> uses the value of the call or is void).</li>
				2336
				2337	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				2338
				2339	<li>Platform specific constraints are met.</li>
				2340	</ul>
				2341
				2342	<p>x86/x86-64 constraints:</p>
				2343
				2344	<ul>
				2345	<li>No variable argument lists are used.</li>
				2346
				2347	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
				2348	= hidden or protected) are supported.</li>
				2349	</ul>
				2350
				2351	<p>PowerPC constraints:</p>
				2352
				2353	<ul>
				2354	<li>No variable argument lists are used.</li>
				2355
				2356	<li>No byval parameters are used.</li>
				2357
				2358	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				2359	</ul>
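
<p>The general conditions and the per-target constraints listed above can be
read as a single predicate. The following C++ sketch is illustrative only:
<tt>CallSite</tt>, <tt>Target</tt> and <tt>isEligibleForTailCallOpt</tt> are
hypothetical names, not LLVM APIs, and the sketch assumes that caller and
callee may each use either <tt>fastcc</tt> or <tt>cc 10</tt>.</p>

```cpp
#include <cassert>

// Hypothetical model of the checks described in the text; not LLVM code.
enum class CallConv { C, Fast, GHC };
enum class Target { X86, PowerPC };

struct CallSite {
  CallConv callerCC;
  CallConv calleeCC;
  bool inTailPosition;      // ret immediately follows the call
  bool isVarArg;            // variable argument list is used
  bool hasByValParams;      // byval parameters are used
  bool isPIC;               // generating GOT/PIC code
  bool calleeIsModuleLocal; // visibility is hidden or protected
  bool is64Bit;             // x86-64 rather than 32-bit x86
};

// fastcc and cc 10 (GHC) are the conventions named in the text.
inline bool isTailCallCC(CallConv cc) {
  return cc == CallConv::Fast || cc == CallConv::GHC;
}

inline bool isEligibleForTailCallOpt(const CallSite &cs, Target target,
                                     bool tailCallOptEnabled) {
  if (!tailCallOptEnabled)
    return false;
  if (!isTailCallCC(cs.callerCC) || !isTailCallCC(cs.calleeCC))
    return false;
  if (!cs.inTailPosition)
    return false;
  if (cs.isVarArg) // rejected on both x86 and PowerPC
    return false;

  switch (target) {
  case Target::X86:
    // Only x86-64 GOT/PIC code is restricted to module-local callees.
    return !(cs.is64Bit && cs.isPIC && !cs.calleeIsModuleLocal);
  case Target::PowerPC:
    if (cs.hasByValParams)
      return false;
    // ppc32/64 GOT/PIC code is restricted to module-local callees.
    return !(cs.isPIC && !cs.calleeIsModuleLocal);
  }
  return false;
}
```

<p>A call site that fails any one check falls back to an ordinary call, which
is the case the "Implications" paragraph below is concerned with.</p>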
				2360
				2361	<p>Example:</p>
				2362
				2363	<p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
				2364
				2365	<div class="doc_code">
				2366	<pre>
declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
  %l1 = add i32 %in1, %in2
  %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
  ret i32 %tmp
}
				2374	</pre>
				2375	</div>
				2376
				2377	<p>Implications of <tt>-tailcallopt</tt>:</p>
				2378
<p>To support tail call optimization in situations where the callee has more
arguments than the caller, a 'callee pops arguments' convention is used. This
currently causes each <tt>fastcc</tt> call that is not tail call optimized
(because one or more of the above constraints are not met) to be followed by a
readjustment of the stack, so performance might be worse in such cases.</p>
				2384
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2385	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2386	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2387	<h3>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2388	<a name="sibcallopt">Sibling call optimization</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2389	</h3>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2390
				2391	<div class="doc_text">
				2392
<p>Sibling call optimization is a restricted form of tail call optimization.
Unlike the tail call optimization described in the previous section, it can be
performed automatically on any tail call when the <tt>-tailcallopt</tt> option
is not specified.</p>
				2397
				2398	<p>Sibling call optimization is currently performed on x86/x86-64 when the
				2399	following constraints are met:</p>
				2400
				2401	<ul>
<li>Caller and callee have the same calling convention. It can be either
<tt>c</tt> or <tt>fastcc</tt>.</li>

<li>The call is a tail call - in tail position (ret immediately follows call
and ret uses value of call or is void).</li>

<li>Caller and callee have matching return types or the callee result is not
used.</li>

<li>If any of the callee arguments are passed on the stack, they must be
available in the caller's own incoming argument stack and the frame offsets
must be the same.</li>
				2414	</ul>
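
<p>The four constraints above can also be written as a predicate. This is a
minimal C++ sketch under stated assumptions: <tt>SibCallSite</tt>,
<tt>StackArg</tt> and <tt>isEligibleForSibCallOpt</tt> are hypothetical names
chosen for illustration and do not correspond to LLVM data structures.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the sibling call constraints; not LLVM code.
enum class SibCallConv { C, Fast, Other };

// One callee argument that is passed on the stack.
struct StackArg {
  bool isCallerIncomingArg; // value lives in the caller's incoming arg area
  int callerOffset;         // frame offset of that incoming argument
  int calleeOffset;         // frame offset the callee expects
};

struct SibCallSite {
  SibCallConv callerCC;
  SibCallConv calleeCC;
  bool inTailPosition;   // ret immediately follows the call
  bool returnTypesMatch; // caller and callee return types agree
  bool calleeResultUsed; // the ret uses the value of the call
  std::vector<StackArg> stackArgs;
};

inline bool isEligibleForSibCallOpt(const SibCallSite &cs) {
  // Same convention on both sides, and it must be c or fastcc.
  if (cs.callerCC != cs.calleeCC)
    return false;
  if (cs.callerCC != SibCallConv::C && cs.callerCC != SibCallConv::Fast)
    return false;
  if (!cs.inTailPosition)
    return false;
  // A return type mismatch is tolerated only if the result is unused.
  if (!cs.returnTypesMatch && cs.calleeResultUsed)
    return false;
  // Stack-passed arguments must already sit at the same frame offsets.
  for (const StackArg &arg : cs.stackArgs) {
    if (!arg.isCallerIncomingArg || arg.callerOffset != arg.calleeOffset)
      return false;
  }
  return true;
}
```

<p>Because no stack arguments need to move, this form needs no special
calling convention and no <tt>-tailcallopt</tt> flag.</p>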
				2415
				2416	<p>Example:</p>
				2417	<div class="doc_code">
				2418	<pre>
				2419	declare i32 @bar(i32, i32)
				2420
				2421	define i32 @foo(i32 %a, i32 %b, i32 %c) {
				2422	entry:
				2423	%0 = tail call i32 @bar(i32 %a, i32 %b)
				2424	ret i32 %0
				2425	}
				2426	</pre>
				2427	</div>
				2428
				2429	</div>
				2430	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2431	<h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2432	<a name="x86">The X86 backend</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2433	</h3>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2434
				2435	<div class="doc_text">
				2436
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2437	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2438	code generator is capable of targeting a variety of x86-32 and x86-64
				2439	processors, and includes support for ISA extensions such as MMX and SSE.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2440
				2441	</div>
				2442
				2443	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2444	<h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2445	<a name="x86_tt">X86 Target Triples supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2446	</h4>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2447
				2448	<div class="doc_text">
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2449
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2450	<p>The following are the known target triples that are supported by the X86
				2451	backend. This is not an exhaustive list, and it would be useful to add those
				2452	that people test.</p>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2453
				2454	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2455	<li><b>i686-pc-linux-gnu</b> — Linux</li>
				2456
				2457	<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>
				2458
				2459	<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>
				2460
<li><b>i686-pc-mingw32</b> &mdash; MinGW on Win32</li>

<li><b>i386-pc-mingw32msvc</b> &mdash; MinGW cross-compiler on Linux</li>
				2464
				2465	<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>
Torok Edwin	c457b65	2009-06-15 12:17:44 +0000	[diff] [blame]	2466
				2467	<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2468	</ul>
				2469
				2470	</div>
				2471
				2472	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2473	<h4>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2474	<a name="x86_cc">X86 Calling Conventions supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2475	</h4>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2476
				2477
				2478	<div class="doc_text">
				2479
<p>The following target-specific calling conventions are known to the backend:</p>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2481
				2482	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2483	<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
				2484	Windows platform (CC ID = 64).</li>
				2485
				2486	<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
				2487	Windows platform (CC ID = 65).</li>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2488	</ul>
				2489
				2490	</div>
				2491
				2492	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2493	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2494	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2495	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2496
				2497	<div class="doc_text">
				2498
<p>The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following form directly in integer
instructions (which use ModR/M addressing):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2502
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2503	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2504	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2505	SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2506	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2507	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2508
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2509	<p>In order to represent this, LLVM tracks no less than 5 operands for each
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2510	memory operand of this form. This means that the "load" form of
				2511	'<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2512
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2513	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2514	<pre>
Index:        0     |    1        2       3           4          5
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2518	</pre>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2519	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2520
<p>Stores, and all other instructions, treat the five memory operands in the
same way and in the same order. If the segment register is unspecified
(regno = 0), then no segment override is generated. "Lea" operations do not
have a segment register specified, so they only have 4 operands for their
memory reference.</p>
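
<p>As a rough illustration of the address form described above, the following
C++ sketch evaluates <tt>Base + Scale * IndexReg + Disp32</tt> from five
values laid out in the same order as the memory operands. The struct
<tt>X86MemOperands</tt> and both functions are hypothetical and exist only for
this example; the sketch works on register <i>values</i> rather than on
<tt>MachineOperand</tt>s and does not model the segment base.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the five memory operands, in operand order.
struct X86MemOperands {
  std::uint64_t base;  // value held in BaseReg
  unsigned scale;      // 1, 2, 4 or 8
  std::uint64_t index; // value held in IndexReg
  std::int32_t disp;   // displacement, sign-extended when added
  unsigned segment;    // segment register number, 0 = unspecified
};

// Computes Base + Scale * Index + Disp; the segment base is not modeled.
inline std::uint64_t effectiveAddress(const X86MemOperands &m) {
  assert((m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8) &&
         "scale must be 1, 2, 4 or 8");
  const std::uint64_t disp =
      static_cast<std::uint64_t>(static_cast<std::int64_t>(m.disp));
  return m.base + m.scale * m.index + disp;
}

// A segment override is only needed when a segment register is specified.
inline bool needsSegmentOverride(const X86MemOperands &m) {
  return m.segment != 0;
}
```

<p>For example, a base of <tt>0x1000</tt>, a scale of 4, an index of 3 and a
displacement of -8 give the address <tt>0x1004</tt>.</p>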
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2526
				2527	</div>
				2528
				2529	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2530	<h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2531	<a name="x86_memory">X86 address spaces supported</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2532	</h4>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2533
				2534	<div class="doc_text">
				2535
Jay Foad	cb88ec3	2011-04-06 07:55:30 +0000	[diff] [blame]	2536	<p>x86 has a feature which provides
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2537	the ability to perform loads and stores to different address spaces
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2538	via the x86 segment registers. A segment override prefix byte on an
				2539	instruction causes the instruction's memory access to go to the specified
				2540	segment. LLVM address space 0 is the default address space, which includes
				2541	the stack, and any unqualified memory accesses in a program. Address spaces
				2542	1-255 are currently reserved for user-defined code. The GS-segment is
Chris Lattner	1777d0c	2009-05-05 18:52:19 +0000	[diff] [blame]	2543	represented by address space 256, while the FS-segment is represented by
				2544	address space 257. Other x86 segments have yet to be allocated address space
				2545	numbers.</p>
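
<p>The numbering described above amounts to a small lookup. The C++ sketch
below is illustrative only; <tt>Segment</tt> and
<tt>segmentForAddressSpace</tt> are hypothetical names and are not part of the
X86 backend.</p>

```cpp
#include <cassert>

// Hypothetical classification of LLVM address space numbers on x86.
enum class Segment { Default, UserDefined, GS, FS, Unassigned };

inline Segment segmentForAddressSpace(unsigned addrSpace) {
  if (addrSpace == 0)
    return Segment::Default;     // stack and unqualified accesses
  if (addrSpace <= 255)
    return Segment::UserDefined; // 1-255 are reserved for user-defined code
  if (addrSpace == 256)
    return Segment::GS;
  if (addrSpace == 257)
    return Segment::FS;
  return Segment::Unassigned;    // no address space number allocated yet
}
```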
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2546
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2547	<p>While these address spaces may seem similar to TLS via the
				2548	<tt>thread_local</tt> keyword, and often use the same underlying hardware,
				2549	there are some fundamental differences.</p>
				2550
<p>The <tt>thread_local</tt> keyword applies to global variables and
specifies that they are to be allocated in thread-local memory. There are
no type qualifiers involved, and these variables can be pointed to with
normal pointers and accessed with normal loads and stores.
The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
level (though LLVM doesn't yet have implementations of it for some
configurations).</p>
				2558
				2559	<p>Special address spaces, in contrast, apply to static types. Every
				2560	load and store has a particular address space in its address operand type,
				2561	and this is what determines which address space is accessed.
				2562	LLVM ignores these special address space qualifiers on global variables,
				2563	and does not provide a way to directly allocate storage in them.
				2564	At the LLVM IR level, the behavior of these special address spaces depends
				2565	in part on the underlying OS or runtime environment, and they are specific
				2566	to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
				2567
				2568	<p>Some operating systems and runtime environments use (or may in the future
				2569	use) the FS/GS-segment registers for various low-level purposes, so care
				2570	should be taken when considering them.</p>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2571
				2572	</div>
				2573
				2574	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2575	<h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2576	<a name="x86_names">Instruction naming</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2577	</h4>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2578
				2579	<div class="doc_text">
				2580
<p>An instruction name consists of the base name, a default operand size, and
a character per operand with an optional special size. For example:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2583
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2584	<div class="doc_code">
				2585	<pre>
				2586	ADD8rr -> add, 8-bit register, 8-bit register
				2587	IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
				2588	IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
				2589	MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
				2590	</pre>
				2591	</div>
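
<p>The naming scheme can be decoded mechanically for names that follow this
pattern. The C++ sketch below is a hypothetical helper written for this
document, not an LLVM utility. It assumes the base name is the leading run of
uppercase letters and that operand kinds are limited to <tt>r</tt>,
<tt>m</tt> and <tt>i</tt>, as in the examples above; real instruction names
with other suffixes are outside its scope.</p>

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical decoded form of a name such as "IMUL16rmi8".
struct Operand {
  char kind;     // 'r' = register, 'm' = memory, 'i' = immediate
  unsigned bits; // operand size in bits
};

struct DecodedName {
  std::string base;     // lower-cased mnemonic, e.g. "imul"
  unsigned defaultBits; // default operand size
  std::vector<Operand> operands;
};

inline bool isDigitAt(const std::string &s, std::size_t i) {
  return i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

inline DecodedName decodeInstrName(const std::string &name) {
  DecodedName d{};
  std::size_t i = 0;

  // Base name: the leading run of uppercase letters.
  while (i < name.size() &&
         std::isupper(static_cast<unsigned char>(name[i])) != 0) {
    d.base += static_cast<char>(
        std::tolower(static_cast<unsigned char>(name[i])));
    ++i;
  }

  // Default operand size: the digits that follow the base name.
  while (isDigitAt(name, i)) {
    d.defaultBits = d.defaultBits * 10 + static_cast<unsigned>(name[i] - '0');
    ++i;
  }

  // One character per operand, each with an optional special size.
  while (i < name.size()) {
    Operand op{name[i++], 0};
    assert((op.kind == 'r' || op.kind == 'm' || op.kind == 'i') &&
           "operand kind not covered by this sketch");
    while (isDigitAt(name, i)) {
      op.bits = op.bits * 10 + static_cast<unsigned>(name[i] - '0');
      ++i;
    }
    if (op.bits == 0)
      op.bits = d.defaultBits; // no special size: use the default
    d.operands.push_back(op);
  }
  return d;
}
```

<p>With this helper, <tt>MOVSX32rm16</tt> decodes to the base name
<tt>movsx</tt> with a 32-bit register operand and a 16-bit memory
operand.</p>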
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2592
				2593	</div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2594
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2595	<!-- ======================================================================= -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2596	<h3>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2597	<a name="ppc">The PowerPC backend</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2598	</h3>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2599
				2600	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2601
<p>The PowerPC code generator lives in the <tt>lib/Target/PowerPC</tt>
directory. The code generation is retargetable to several variations or
<i>subtargets</i> of the PowerPC ISA, including ppc32, ppc64 and AltiVec.</p>
				2605
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2606	</div>
				2607
				2608	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2609	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2610	<a name="ppc_abi">LLVM PowerPC ABI</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2611	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2612
				2613	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2614
<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
PC-relative (PIC) or static addressing for accessing global values, so no TOC
(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
of a stack frame. LLVM takes advantage of having no TOC to provide space to
save the frame pointer in the PowerPC linkage area of the caller frame.
Other details of the PowerPC ABI can be found at <a href=
"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
>PowerPC ABI</a>. Note: This link describes the 32-bit ABI. The 64-bit ABI
is similar except that GPRs are 8 bytes wide (not 4) and r13 is reserved
for system use.</p>
				2625
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2626	</div>
				2627
				2628	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2629	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2630	<a name="ppc_frame">Frame Layout</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2631	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2632
				2633	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2634
<p>The size of a PowerPC frame is usually fixed for the duration of a
function's invocation. Since the frame is fixed size, all references
into the frame can be made via fixed offsets from the stack pointer. The
exception is when dynamic alloca or variable sized arrays are present; in
that case a base pointer (r31) is used as a proxy for the stack pointer
and the stack pointer is free to grow or shrink. A base pointer is also used if
llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
always aligned to 16 bytes, so that space allocated for AltiVec vectors will
be properly aligned.</p>
				2644
<p>An invocation frame is laid out as follows (low memory at top):</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2646
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2647	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2648	<tr>
				2649	<td>Linkage<br><br></td>
				2650	</tr>
				2651	<tr>
				2652	<td>Parameter area<br><br></td>
				2653	</tr>
				2654	<tr>
				2655	<td>Dynamic area<br><br></td>
				2656	</tr>
				2657	<tr>
				2658	<td>Locals area<br><br></td>
				2659	</tr>
				2660	<tr>
				2661	<td>Saved registers area<br><br></td>
				2662	</tr>
				2663	<tr style="border-style: none hidden none hidden;">
				2664	<td><br></td>
				2665	</tr>
				2666	<tr>
				2667	<td>Previous Frame<br><br></td>
				2668	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2669	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2670
<p>The <i>linkage</i> area is used by a callee to save special registers prior
to allocating its own frame. Only three entries are relevant to LLVM. The
first entry is the previous stack pointer (sp), aka link. This allows
probing tools like gdb or exception handlers to quickly scan the frames in
the stack. A function epilog can also use the link to pop the frame from the
stack. The third entry in the linkage area is used to save the return
address from the lr register. Finally, as mentioned above, the last entry is
used to save the previous frame pointer (r31). The entries in the linkage
area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
mode and 48 bytes in 64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2681
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2682	<p>32 bit linkage area</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2683
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2684	<table class="layout">
				2685	<tr>
				2686	<td>0</td>
				2687	<td>Saved SP (r1)</td>
				2688	</tr>
				2689	<tr>
				2690	<td>4</td>
				2691	<td>Saved CR</td>
				2692	</tr>
				2693	<tr>
				2694	<td>8</td>
				2695	<td>Saved LR</td>
				2696	</tr>
				2697	<tr>
				2698	<td>12</td>
				2699	<td>Reserved</td>
				2700	</tr>
				2701	<tr>
				2702	<td>16</td>
				2703	<td>Reserved</td>
				2704	</tr>
				2705	<tr>
				2706	<td>20</td>
				2707	<td>Saved FP (r31)</td>
				2708	</tr>
				2709	</table>
				2710
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2711	<p>64 bit linkage area</p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2712
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2713	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2714	<tr>
				2715	<td>0</td>
				2716	<td>Saved SP (r1)</td>
				2717	</tr>
				2718	<tr>
				2719	<td>8</td>
				2720	<td>Saved CR</td>
				2721	</tr>
				2722	<tr>
				2723	<td>16</td>
				2724	<td>Saved LR</td>
				2725	</tr>
				2726	<tr>
				2727	<td>24</td>
				2728	<td>Reserved</td>
				2729	</tr>
				2730	<tr>
				2731	<td>32</td>
				2732	<td>Reserved</td>
				2733	</tr>
				2734	<tr>
				2735	<td>40</td>
				2736	<td>Saved FP (r31)</td>
				2737	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2738	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2739
<p>The <i>parameter area</i> is used to store arguments being passed to a callee
function. Following the PowerPC ABI, the first few arguments are actually
passed in registers, with the space in the parameter area unused. However,
if there are not enough registers or the callee is a thunk or vararg
function, these register arguments can be spilled into the parameter area.
Thus, the parameter area must be large enough to store all the parameters for
the largest call sequence made by the caller. The size must also be
minimally large enough to spill registers r3-r10. This allows callees blind
to the call signature, such as thunks and vararg functions, enough space to
cache the argument registers. Therefore, the parameter area is minimally 32
bytes (64 bytes in 64 bit mode). Also note that since the parameter area is
at a fixed offset from the top of the frame, a callee can access its spilled
arguments using fixed offsets from the stack pointer (or base pointer).</p>
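
<p>The sizes quoted for the linkage and parameter areas follow directly from
the GPR width. The C++ sketch below reproduces that arithmetic; the names
<tt>LinkageLayout</tt>, <tt>linkageLayout</tt> and
<tt>minParameterAreaBytes</tt> are hypothetical and chosen only for this
illustration.</p>

```cpp
#include <cassert>

// Hypothetical summary of the linkage area tables; offsets are in bytes.
struct LinkageLayout {
  unsigned savedSP; // first entry: previous stack pointer (r1)
  unsigned savedCR; // second entry
  unsigned savedLR; // third entry: return address
  unsigned savedFP; // last entry: previous frame pointer (r31)
  unsigned size;    // six GPR-sized entries in total
};

// A GPR is 4 bytes wide in 32 bit mode and 8 bytes wide in 64 bit mode.
constexpr unsigned gprBytes(bool is64Bit) { return is64Bit ? 8u : 4u; }

inline LinkageLayout linkageLayout(bool is64Bit) {
  const unsigned w = gprBytes(is64Bit);
  return LinkageLayout{0 * w, 1 * w, 2 * w, 5 * w, 6 * w};
}

// The parameter area must at least hold r3-r10, which is eight GPRs.
constexpr unsigned minParameterAreaBytes(bool is64Bit) {
  return 8u * gprBytes(is64Bit);
}
```

<p>In 32 bit mode this gives a 24 byte linkage area with the saved frame
pointer at offset 20 and a 32 byte minimum parameter area, matching the
tables and text above.</p>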
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2753
<p>Combining the information about the linkage area, the parameter area and
alignment, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in
64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2757
<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
alloca then space is added to the stack, the linkage and parameter areas are
shifted to the top of the stack, and the new space is available immediately
below the linkage and parameter areas. The cost of shifting the linkage and
parameter areas is minor since only the link value needs to be copied. The
link value can be easily fetched by adding the original frame size to the base
pointer. Note that allocations in the dynamic space need to observe 16 byte
alignment.</p>
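
<p>One common way to observe the 16 byte alignment is to round each requested
size up to the next multiple of 16 before adjusting the stack pointer. The
C++ sketch below shows that rounding only; <tt>roundUpToAlignment</tt> and
<tt>dynamicAllocaBytes</tt> are hypothetical helpers, and nothing here is
taken from the PowerPC backend itself.</p>

```cpp
#include <cassert>
#include <cstdint>

// Stack alignment stated in the text.
constexpr std::uint64_t kStackAlignBytes = 16;

// Rounds size up to the next multiple of align (a power of two).
inline std::uint64_t roundUpToAlignment(std::uint64_t size,
                                        std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "alignment must be a power of two");
  return (size + align - 1) & ~(align - 1);
}

// Bytes to reserve in the dynamic area for a request of the given size.
inline std::uint64_t dynamicAllocaBytes(std::uint64_t requested) {
  return roundUpToAlignment(requested, kStackAlignBytes);
}
```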
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2766
<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
variables.</p>

<p>The <i>saved registers area</i> is where the LLVM compiler spills callee
saved registers on entry to the callee.</p>
				2772
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2773	</div>
				2774
				2775	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2776	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2777	<a name="ppc_prolog">Prolog/Epilog</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2778	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2779
				2780	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2781
<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
the following exceptions. Callee saved registers are spilled after the frame
is created. This allows the LLVM epilog/prolog support to be common with
other targets. The base pointer callee saved register r31 is saved in the
TOC slot of the linkage area. This simplifies allocation of space for the base
pointer and makes it convenient to locate programmatically and during
debugging.</p>
				2789
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2790	</div>
				2791
				2792	<!-- _______________________________________________________________________ -->
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2793	<h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2794	<a name="ppc_dynamic">Dynamic Allocation</a>
NAKAMURA Takumi	05d0265	2011-04-18 23:59:50 +0000	[diff] [blame]	2795	</h4>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2796
				2797	<div class="doc_text">
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2798
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2799	<p><i>TODO - More to come.</i></p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2800
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2801	</div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2802
				2803
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2804	<!-- *********************************************************************** -->
				2805	<hr>
				2806	<address>
				2807	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	2808	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2809	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	f00ddb0	2008-12-11 18:23:24 +0000	[diff] [blame]	2810	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2811
				2812	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
NAKAMURA Takumi	b9a3363	2011-04-09 02:13:37 +0000	[diff] [blame]	2813	<a href="http://llvm.org/">The LLVM Compiler Infrastructure</a><br>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2814	Last modified: $Date$
				2815	</address>
				2816
				2817	</body>
				2818	</html>