Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	6	<title>The LLVM Target-Independent Code Generator</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
				8	</head>
				9	<body>
				10
				11	<div class="doc_title">
				12	The LLVM Target-Independent Code Generator
				13	</div>
				14
				15	<ol>
				16	<li><a href="#introduction">Introduction</a>
				17	<ul>
				18	<li><a href="#required">Required components in the code generator</a></li>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	19	<li><a href="#high-level-design">The high-level design of the code
				20	generator</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	21	<li><a href="#tablegen">Using TableGen for target description</a></li>
				22	</ul>
				23	</li>
				24	<li><a href="#targetdesc">Target description classes</a>
				25	<ul>
				26	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				27	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	28	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	29	<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	30	<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				31	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	32	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	33	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				34	</ul>
				35	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	36	<li><a href="#codegendesc">The "Machine" Code Generator classes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	37	<ul>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	38	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	39	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				40	class</a></li>
				41	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	42	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	43	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	44	<li><a href="#mc">The "MC" Layer</a>
				45	<ul>
				46	<li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
<li><a href="#mccontext">The <tt>MCContext</tt> class</a></li>
				48	<li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
				49	<li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
				50	<li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
				51	</ul>
				52	</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	53	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	54	<ul>
				55	<li><a href="#instselect">Instruction Selection</a>
				56	<ul>
				57	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				58	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				59	Process</a></li>
				60	<li><a href="#selectiondag_build">Initial SelectionDAG
				61	Construction</a></li>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	62	<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	63	<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				64	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	65	Phase: the DAG Combiner</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	66	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	67	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	68	Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	69	<li><a href="#selectiondag_future">Future directions for the
				70	SelectionDAG</a></li>
				71	</ul></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	72	<li><a href="#liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	73	<ul>
				74	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	75	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	76	</ul></li>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	77	<li><a href="#regalloc">Register Allocation</a>
				78	<ul>
				79	<li><a href="#regAlloc_represent">How registers are represented in
				80	LLVM</a></li>
				81	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				82	registers</a></li>
				83	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				84	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				85	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				86	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				87	</ul></li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	88	<li><a href="#codeemit">Code Emission</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	89	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	90	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	91	<li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
				92
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	93	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	94	<ul>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	95	<li><a href="#tailcallopt">Tail call optimization</a></li>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	96	<li><a href="#sibcallopt">Sibling call optimization</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	97	<li><a href="#x86">The X86 backend</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	98	<li><a href="#ppc">The PowerPC backend</a>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	99	<ul>
				100	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				101	<li><a href="#ppc_frame">Frame Layout</a></li>
				102	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				103	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	104	</ul></li>
				105	</ul></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	106
				107	</ol>
				108
				109	<div class="doc_author">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	110	<p>Written by the LLVM Team.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	111	</div>
				112
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	113	<div class="doc_warning">
				114	<p>Warning: This is a work in progress.</p>
				115	</div>
				116
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	117	<!-- *********************************************************************** -->
				118	<div class="doc_section">
				119	<a name="introduction">Introduction</a>
				120	</div>
				121	<!-- *********************************************************************** -->
				122
				123	<div class="doc_text">
				124
				125	<p>The LLVM target-independent code generator is a framework that provides a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	126	suite of reusable components for translating the LLVM internal representation
				127	to the machine code for a specified target—either in assembly form
				128	(suitable for a static compiler) or in binary machine code format (usable for
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	129	a JIT compiler). The LLVM target-independent code generator consists of six
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	130	main components:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	131
				132	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	133	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				134	capture important properties about various aspects of the machine,
				135	independently of how they will be used. These interfaces are defined in
				136	<tt>include/llvm/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	137
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	138	<li>Classes used to represent the <a href="#codegendesc">code being
				139	generated</a> for a target. These classes are intended to be abstract
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	140	enough to represent the machine code for <i>any</i> target machine. These
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	141	classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
				142	concepts like "constant pool entries" and "jump tables" are explicitly
				143	exposed.</li>
				144
<li>Classes and algorithms used to represent code at the object file level,
				146	the <a href="#mc">MC Layer</a>. These classes represent assembly level
				147	constructs like labels, sections, and instructions. At this level,
				148	concepts like "constant pool entries" and "jump tables" don't exist.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	149
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	150	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				151	various phases of native code generation (register allocation, scheduling,
stack frame representation, etc.). This code lives
				153	in <tt>lib/CodeGen/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	154
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	155	<li><a href="#targetimpls">Implementations of the abstract target description
				156	interfaces</a> for particular targets. These machine descriptions make
				157	use of the components provided by LLVM, and can optionally provide custom
				158	target-specific passes, to build complete code generators for a specific
				159	target. Target descriptions live in <tt>lib/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	160
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	161	<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
completely target independent (it uses the <tt>TargetJITInfo</tt>
structure to handle target-specific issues). The code for the
				164	target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	165	</ol>
				166
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	167	<p>Depending on which part of the code generator you are interested in working
				168	on, different pieces of this will be useful to you. In any case, you should
				169	be familiar with the <a href="#targetdesc">target description</a>
				170	and <a href="#codegendesc">machine code representation</a> classes. If you
				171	want to add a backend for a new target, you will need
				172	to <a href="#targetimpls">implement the target description</a> classes for
				173	your new target and understand the <a href="LangRef.html">LLVM code
				174	representation</a>. If you are interested in implementing a
				175	new <a href="#codegenalgs">code generation algorithm</a>, it should only
				176	depend on the target-description and machine code representation classes,
				177	ensuring that it is portable.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	178
				179	</div>
				180
				181	<!-- ======================================================================= -->
				182	<div class="doc_subsection">
				183	<a name="required">Required components in the code generator</a>
				184	</div>
				185
				186	<div class="doc_text">
				187
				188	<p>The two pieces of the LLVM code generator are the high-level interface to the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	189	code generator and the set of reusable components that can be used to build
				190	target-specific backends. The two most important interfaces
				191	(<a href="#targetmachine"><tt>TargetMachine</tt></a>
				192	and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				193	required to be defined for a backend to fit into the LLVM system, but the
				194	others must be defined if the reusable code generator components are going to
				195	be used.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	196
				197	<p>This design has two important implications. The first is that LLVM can
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	198	support completely non-traditional code generation targets. For example, the
				199	C backend does not require register allocation, instruction selection, or any
				200	of the other standard components provided by the system. As such, it only
				201	implements these two interfaces, and does its own thing. Another example of
				202	a code generator like this is a (purely hypothetical) backend that converts
				203	LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	204
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	205	<p>This design also implies that it is possible to design and implement
				206	radically different code generators in the LLVM system that do not make use
				207	of any of the built-in components. Doing so is not recommended at all, but
				208	could be required for radically different targets that do not fit into the
				209	LLVM machine description model: FPGAs for example.</p>
Chris Lattner	900bf8c	2004-06-02 07:06:06 +0000	[diff] [blame]	210
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	211	</div>
				212
				213	<!-- ======================================================================= -->
				214	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	215	<a name="high-level-design">The high-level design of the code generator</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	216	</div>
				217
				218	<div class="doc_text">
				219
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	220	<p>The LLVM target-independent code generator is designed to support efficient
				221	and quality code generation for standard register-based microprocessors.
				222	Code generation in this model is divided into the following stages:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	223
				224	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	225	<li><b><a href="#instselect">Instruction Selection</a></b> — This phase
				226	determines an efficient way to express the input LLVM code in the target
				227	instruction set. This stage produces the initial code for the program in
the target instruction set, using virtual registers in SSA
				229	form and physical registers that represent any required register
				230	assignments due to target constraints or calling conventions. This step
				231	turns the LLVM code into a DAG of target instructions.</li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	232
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	233	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> —
				234	This phase takes the DAG of target instructions produced by the
				235	instruction selection phase, determines an ordering of the instructions,
				236	then emits the instructions
				237	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
				238	Note that we describe this in the <a href="#instselect">instruction
				239	selection section</a> because it operates on
				240	a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	241
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	242	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> —
				243	This optional stage consists of a series of machine-code optimizations
				244	that operate on the SSA-form produced by the instruction selector.
				245	Optimizations like modulo-scheduling or peephole optimization work
				246	here.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	247
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	248	<li><b><a href="#regalloc">Register Allocation</a></b> — The target code
				249	is transformed from an infinite virtual register file in SSA form to the
				250	concrete register file used by the target. This phase introduces spill
				251	code and eliminates all virtual register references from the program.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	252
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	253	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once
				254	the machine code has been generated for the function and the amount of
				255	stack space required is known (used for LLVM alloca's and spill slots),
				256	the prolog and epilog code for the function can be inserted and "abstract
				257	stack location references" can be eliminated. This stage is responsible
				258	for implementing optimizations like frame-pointer elimination and stack
				259	packing.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	260
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	261	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> —
				262	Optimizations that operate on "final" machine code can go here, such as
				263	spill code scheduling and peephole optimizations.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	264
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	265	<li><b><a href="#codeemit">Code Emission</a></b> — The final stage
				266	actually puts out the code for the current function, either in the target
				267	assembler format or in machine code.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	268	</ol>
				269
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	270	<p>The code generator is based on the assumption that the instruction selector
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	271	will use an optimal pattern matching selector to create high-quality
				272	sequences of native instructions. Alternative code generator designs based
				273	on pattern expansion and aggressive iterative peephole optimization are much
				274	slower. This design permits efficient compilation (important for JIT
				275	environments) and aggressive optimization (used when generating code offline)
				276	by allowing components of varying levels of sophistication to be used for any
				277	step of compilation.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	278
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	279	<p>In addition to these stages, target implementations can insert arbitrary
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	280	target-specific passes into the flow. For example, the X86 target uses a
				281	special pass to handle the 80x87 floating point stack architecture. Other
				282	targets with unusual requirements can be supported with custom passes as
				283	needed.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	284
				285	</div>
				286
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	287	<!-- ======================================================================= -->
				288	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	289	<a name="tablegen">Using TableGen for target description</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	290	</div>
				291
				292	<div class="doc_text">
				293
Chris Lattner	5489e93	2004-06-01 18:35:00 +0000	[diff] [blame]	294	<p>The target description classes require a detailed description of the target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	295	architecture. These target descriptions often have a large amount of common
				296	information (e.g., an <tt>add</tt> instruction is almost identical to a
				297	<tt>sub</tt> instruction). In order to allow the maximum amount of
				298	commonality to be factored out, the LLVM code generator uses
				299	the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
				300	chunks of the target machine, which allows the use of domain-specific and
				301	target-specific abstractions to reduce the amount of repetition.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	302
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	303	<p>As LLVM continues to be developed and refined, we plan to move more and more
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	304	of the target description to the <tt>.td</tt> form. Doing so gives us a
				305	number of advantages. The most important is that it makes it easier to port
				306	LLVM because it reduces the amount of C++ code that has to be written, and
				307	the surface area of the code generator that needs to be understood before
				308	someone can get something working. Second, it makes it easier to change
				309	things. In particular, if tables and other things are all emitted
				310	by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
				311	update all of the targets to a new interface.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	312
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	313	</div>
				314
				315	<!-- *********************************************************************** -->
				316	<div class="doc_section">
				317	<a name="targetdesc">Target description classes</a>
				318	</div>
				319	<!-- *********************************************************************** -->
				320
				321	<div class="doc_text">
				322
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	323	<p>The LLVM target description classes (located in the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	324	<tt>include/llvm/Target</tt> directory) provide an abstract description of
				325	the target machine independent of any particular client. These classes are
				326	designed to capture the <i>abstract</i> properties of the target (such as the
				327	instructions and registers it has), and do not incorporate any particular
				328	pieces of code generation algorithms.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	329
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	330	<p>All of the target description classes (except the
				331	<tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
				332	subclassed by the concrete target implementation, and have virtual methods
				333	implemented. To get to these implementations, the
				334	<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
				335	that should be implemented by the target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	336
				337	</div>
				338
				339	<!-- ======================================================================= -->
				340	<div class="doc_subsection">
				341	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
				342	</div>
				343
				344	<div class="doc_text">
				345
				346	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	347	access the target-specific implementations of the various target description
				348	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				349	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				350	designed to be specialized by a concrete target implementation
				351	(e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
				352	methods. The only required target description class is
				353	the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
				354	generator components are to be used, the other interfaces should be
				355	implemented as well.</p>
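<p>The accessor pattern described above can be sketched in a few lines of
standalone C++. Every type here (<tt>SketchTargetMachine</tt>,
<tt>MinimalTargetMachine</tt>, <tt>FullTargetMachine</tt> and the empty
description structs) is a hypothetical stand-in invented for illustration and
is not the LLVM API. The sketch only shows the shape of the idea: one accessor
is mandatory, and the rest default to "not provided".</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for the target description classes; the real
// LLVM classes carry far more state than these shells.
struct SketchDataLayout { bool LittleEndian; unsigned PointerSizeInBytes; };
struct SketchInstrInfo {};
struct SketchRegisterInfo {};

// Abstract machine: the data layout accessor is mandatory (pure virtual),
// the other accessors default to "not provided".
class SketchTargetMachine {
public:
  virtual ~SketchTargetMachine() = default;
  virtual const SketchDataLayout *getDataLayout() const = 0;
  virtual const SketchInstrInfo *getInstrInfo() const { return nullptr; }
  virtual const SketchRegisterInfo *getRegisterInfo() const { return nullptr; }
};

// A non-traditional target that supplies only the required description.
class MinimalTargetMachine : public SketchTargetMachine {
  SketchDataLayout DL{true, 4};
public:
  const SketchDataLayout *getDataLayout() const override { return &DL; }
};

// A conventional target that opts into the reusable components.
class FullTargetMachine : public SketchTargetMachine {
  SketchDataLayout DL{true, 8};
  SketchInstrInfo II;
  SketchRegisterInfo RI;
public:
  const SketchDataLayout *getDataLayout() const override { return &DL; }
  const SketchInstrInfo *getInstrInfo() const override { return &II; }
  const SketchRegisterInfo *getRegisterInfo() const override { return &RI; }
};

// True when the target provides everything the shared passes would need.
inline bool canUseSharedCodeGen(const SketchTargetMachine &TM) {
  return TM.getInstrInfo() != nullptr && TM.getRegisterInfo() != nullptr;
}
```

<p>A client holding only a <tt>SketchTargetMachine</tt> reference can call
<tt>canUseSharedCodeGen</tt> to decide whether the reusable components apply,
without knowing which concrete target it was handed.</p>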
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	356
				357	</div>
				358
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	359	<!-- ======================================================================= -->
				360	<div class="doc_subsection">
				361	<a name="targetdata">The <tt>TargetData</tt> class</a>
				362	</div>
				363
				364	<div class="doc_text">
				365
				366	<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derive a new
				368	class from it). <tt>TargetData</tt> specifies information about how the
				369	target lays out memory for structures, the alignment requirements for various
				370	data types, the size of pointers in the target, and whether the target is
				371	little-endian or big-endian.</p>
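<p>To make "how the target lays out memory for structures" concrete, here is
a minimal standalone sketch of a common layout rule: place each field at the
next offset that satisfies its alignment, then pad the total size to the
largest alignment. The names <tt>FieldInfo</tt>, <tt>LayoutResult</tt>,
<tt>alignTo</tt> and <tt>layoutStruct</tt> are hypothetical, and the rule is
an assumption chosen for illustration. It is not taken from the
<tt>TargetData</tt> implementation.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one field: its size and required alignment.
struct FieldInfo { uint64_t Size; uint64_t Align; };

// Result of laying out a struct: per-field offsets plus overall size/alignment.
struct LayoutResult {
  std::vector<uint64_t> Offsets;
  uint64_t Size;
  uint64_t Align;
};

// Round Value up to the next multiple of Align (Align must be a power of two).
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Assign each field the lowest offset that respects its alignment, then pad
// the struct so that its size is a multiple of its strictest alignment.
inline LayoutResult layoutStruct(const std::vector<FieldInfo> &Fields) {
  LayoutResult R{{}, 0, 1};
  for (const FieldInfo &F : Fields) {
    R.Size = alignTo(R.Size, F.Align);
    R.Offsets.push_back(R.Size);
    R.Size += F.Size;
    if (F.Align > R.Align)
      R.Align = F.Align;
  }
  R.Size = alignTo(R.Size, R.Align);
  return R;
}
```

<p>Under these assumed rules, a 1-byte field followed by a 4-byte field
(alignment 4) and a 2-byte field (alignment 2) lands at offsets 0, 4 and 8,
and the struct occupies 12 bytes.</p>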
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	372
				373	</div>
				374
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	375	<!-- ======================================================================= -->
				376	<div class="doc_subsection">
				377	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
				378	</div>
				379
				380	<div class="doc_text">
				381
				382	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	383	selectors primarily to describe how LLVM code should be lowered to
				384	SelectionDAG operations. Among other things, this class indicates:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	385
				386	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	387	<li>an initial register class to use for various <tt>ValueType</tt>s,</li>
				388
				389	<li>which operations are natively supported by the target machine,</li>
				390
				391	<li>the return type of <tt>setcc</tt> operations,</li>
				392
				393	<li>the type to use for shift amounts, and</li>
				394
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	395	<li>various high-level characteristics, like whether it is profitable to turn
				396	division by a constant into a multiplication sequence</li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	397	</ul>
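<p>The "which operations are natively supported" item in the list above can
be pictured as a lookup table keyed by operation and value type. The sketch
below is standalone C++ with hypothetical enums and a hypothetical
<tt>SketchLowering</tt> class. The method names echo the style of the real
interface, but their signatures and the three-way <tt>Action</tt> enum are
simplifying assumptions.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical miniature versions of the opcode and value-type enums.
enum class Op { Add, SDiv, Rotl };
enum class VT { i8, i32, i64 };
enum class Action { Legal, Promote, Expand };

class SketchLowering {
  std::map<std::pair<Op, VT>, Action> Actions;

public:
  // Record how the target wants a given operation on a given type handled.
  void setOperationAction(Op O, VT T, Action A) { Actions[{O, T}] = A; }

  // Operations default to Legal unless the target says otherwise.
  Action getOperationAction(Op O, VT T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }

  bool isOperationLegal(Op O, VT T) const {
    return getOperationAction(O, T) == Action::Legal;
  }
};
```

<p>A target constructor would fill the table once. Later phases then query it
to decide whether a node can be selected directly or must first be rewritten
into something the machine supports.</p>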
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	398
				399	</div>
				400
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	401	<!-- ======================================================================= -->
				402	<div class="doc_subsection">
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	403	<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	404	</div>
				405
				406	<div class="doc_text">
				407
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	408	<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
				409	of the target and any interactions between the registers.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	410
				411	<p>Registers in the code generator are represented in the code generator by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	412	unsigned integers. Physical registers (those that actually exist in the
				413	target description) are unique small numbers, and virtual registers are
				414	generally large. Note that register #0 is reserved as a flag value.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	415
				416	<p>Each register in the processor description has an associated
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	417	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				418	register (used for assembly output and debugging dumps) and a set of aliases
				419	(used to indicate whether one register overlaps with another).</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	420
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	421	<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	422	class exposes a set of processor specific register classes (instances of the
				423	<tt>TargetRegisterClass</tt> class). Each register class contains sets of
				424	registers that have the same properties (for example, they are all 32-bit
				425	integer registers). Each SSA virtual register created by the instruction
				426	selector has an associated register class. When the register allocator runs,
				427	it replaces virtual registers with a physical register in the set.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	428
<p>The target-specific implementations of these classes are auto-generated from
				430	a <a href="TableGenFundamentals.html">TableGen</a> description of the
				431	register file.</p>
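<p>The register numbering scheme and the role of register classes can be
sketched as follows. This is standalone C++ rather than LLVM code. The
threshold <tt>FirstVirtualRegister</tt> is given an arbitrary illustrative
value, and <tt>SketchRegisterClass</tt> and <tt>pickFree</tt> are
hypothetical helpers.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed encoding for this sketch: 0 is the reserved flag value, numbers
// below FirstVirtualRegister are physical, everything from there up is
// virtual. The threshold value is illustrative only.
constexpr unsigned NoRegister = 0;
constexpr unsigned FirstVirtualRegister = 1024;

inline bool isPhysicalRegister(unsigned Reg) {
  return Reg != NoRegister && Reg < FirstVirtualRegister;
}

inline bool isVirtualRegister(unsigned Reg) {
  return Reg >= FirstVirtualRegister;
}

// A register class is a set of physical registers with the same properties.
struct SketchRegisterClass {
  std::vector<unsigned> Regs;
  bool contains(unsigned Reg) const {
    return std::find(Regs.begin(), Regs.end(), Reg) != Regs.end();
  }
};

// Pick the first register of the class that is not already taken. Returns
// NoRegister when the class is exhausted (a real allocator would spill).
inline unsigned pickFree(const SketchRegisterClass &RC,
                         const std::vector<unsigned> &InUse) {
  for (unsigned Reg : RC.Regs)
    if (std::find(InUse.begin(), InUse.end(), Reg) == InUse.end())
      return Reg;
  return NoRegister;
}
```

<p>Because every virtual register carries a register class, an allocator in
this style only ever has to choose among the physical registers of that one
class.</p>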
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	432
				433	</div>
				434
				435	<!-- ======================================================================= -->
				436	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	437	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	438	</div>
				439
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	440	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	441
				442	<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
				443	instructions supported by the target. It is essentially an array of
				444	<tt>TargetInstrDescriptor</tt> objects, each of which describes one
				445	instruction the target supports. Descriptors define things like the mnemonic
				446	for the opcode, the number of operands, the list of implicit register uses
				447	and defs, whether the instruction has certain target-independent properties
(accesses memory, is commutable, etc.), and hold any target-specific
flags.</p>
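<p>The "array of descriptors" idea can be sketched as a constant table
indexed by opcode. The struct <tt>SketchInstrDesc</tt>, the flag names and
the four-instruction table are all hypothetical. A real target's table is
much larger and is normally generated rather than written by hand.</p>

```cpp
#include <cassert>

// Hypothetical property flags; a real descriptor carries many more.
enum : unsigned {
  MayLoad = 1u << 0,
  MayStore = 1u << 1,
  Commutable = 1u << 2
};

// One entry per instruction: mnemonic, operand count, and property flags.
struct SketchInstrDesc {
  const char *Mnemonic;
  unsigned NumOperands;
  unsigned Flags;

  bool isCommutable() const { return (Flags & Commutable) != 0; }
  bool accessesMemory() const { return (Flags & (MayLoad | MayStore)) != 0; }
};

// Opcodes double as indices into the descriptor table.
enum Opcode : unsigned { ADD = 0, SUB, LOAD, STORE, NumOpcodes };

static const SketchInstrDesc InstrTable[NumOpcodes] = {
    {"add", 3, Commutable},
    {"sub", 3, 0},
    {"load", 2, MayLoad},
    {"store", 2, MayStore},
};

// Look up the descriptor for an opcode.
inline const SketchInstrDesc &getDesc(Opcode Op) { return InstrTable[Op]; }
```

<p>Target-independent passes can then ask questions such as "is this
instruction commutable?" through the descriptor, without knowing anything
else about the opcode.</p>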
				450
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	451	</div>
				452
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	453	<!-- ======================================================================= -->
				454	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	455	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	456	</div>
				457
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	458	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	459
				460	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				461	stack frame layout of the target. It holds the direction of stack growth, the
				462	known stack alignment on entry to each function, and the offset to the local
				463	area. The offset to the local area is the offset from the stack pointer on
				464	function entry to the first location where function data (local variables,
				465	spill locations) can be stored.</p>
				466
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	467	</div>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	468
				469	<!-- ======================================================================= -->
				470	<div class="doc_subsection">
				471	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
				472	</div>
				473
				474	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	475
				476	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				477	specific chip set being targeted. A sub-target informs code generation of
				478	which instructions are supported, instruction latencies and instruction
				479	execution itinerary; i.e., which processing units are used, in what order,
				480	and for how long.</p>
				481
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	482	</div>
				483
				484
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	485	<!-- ======================================================================= -->
				486	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	487	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	488	</div>
				489
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	490	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	491
				492	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				493	Just-In-Time code generator to perform target-specific activities, such as
				494	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				495	should provide one of these objects through the <tt>getJITInfo</tt>
				496	method.</p>
				497
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	498	</div>
				499
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	500	<!-- *********************************************************************** -->
				501	<div class="doc_section">
				502	<a name="codegendesc">Machine code description classes</a>
				503	</div>
				504	<!-- *********************************************************************** -->
				505
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	506	<div class="doc_text">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	507
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	508	<p>At a high level, LLVM code is translated to a machine-specific
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	509	representation formed out of
				510	<a href="#machinefunction"><tt>MachineFunction</tt></a>,
				511	<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
				512	and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
				513	in <tt>include/llvm/CodeGen</tt>). This representation is completely target
				514	agnostic, representing instructions in their most abstract form: an opcode
				515	and a series of operands. This representation is designed to support both an
				516	SSA representation for machine code, as well as a register allocated, non-SSA
				517	form.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	518
				519	</div>
				520
				521	<!-- ======================================================================= -->
				522	<div class="doc_subsection">
				523	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
				524	</div>
				525
				526	<div class="doc_text">
				527
				528	<p>Target machine instructions are represented as instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	529	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				530	representing machine instructions. In particular, it only keeps track of an
				531	opcode number and a set of operands.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	532
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	533	<p>The opcode number is a simple unsigned integer that only has meaning to a
				534	specific backend. All of the instructions for a target should be defined in
				535	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
				536	auto-generated from this description. The <tt>MachineInstr</tt> class does
				537	not have any information about how to interpret the instruction (i.e., what
				538	the semantics of the instruction are); for that you must refer to the
				539	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	540
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	541	<p>The operands of a machine instruction can be of several different types: a
				542	register reference, a constant integer, a basic block reference, etc. In
				543	addition, a machine operand should be marked as a def or a use of the value
				544	(though only registers are allowed to be defs).</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	545
				546	<p>By convention, the LLVM code generator orders instruction operands so that
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	547	all register definitions come before the register uses, even on architectures
				548	that are normally printed in other orders. For example, the SPARC add
				549	instruction: "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
				550	and stores the result into the "%i3" register. In the LLVM code generator,
				551	the operands should be stored as "<tt>%i3, %i1, %i2</tt>": with the
				552	destination first.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	553
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	554	<p>Keeping destination (definition) operands at the beginning of the operand
				555	list has several advantages. In particular, the debugging printer will print
				556	the instruction like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	557
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	558	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	559	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	560	%i3 = add %i1, %i2
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	561	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	562	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	563
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	564	<p>Also, if the first operand is a def, it is easier to <a href="#buildmi">create
				565	instructions</a> whose only def is the first operand.</p>
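<p>The defs-first convention can be sketched with a toy instruction type. The
names <tt>ToyOperand</tt>, <tt>ToyInstr</tt>, and <tt>printInstr</tt> are
hypothetical stand-ins, not the real <tt>MachineInstr</tt> API; the sketch only
shows why a printer can emit the "dest = opcode sources" form when all
definitions precede all uses.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical register operand; not LLVM's MachineOperand.
struct ToyOperand {
  std::string Reg;  // register name, e.g. "%i3"
  bool IsDef;       // true if the instruction writes this register
};

// Hypothetical instruction: an opcode plus a list of operands.
struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Operands;
};

// Print as "defs = opcode uses". Relies on the convention that every def
// comes before every use in the operand list.
std::string printInstr(const ToyInstr &MI) {
  std::string Defs, Uses;
  bool SeenUse = false;
  for (const ToyOperand &Op : MI.Operands) {
    if (Op.IsDef) {
      assert(!SeenUse && "defs must come before uses");
      if (!Defs.empty()) Defs += ", ";
      Defs += Op.Reg;
    } else {
      SeenUse = true;
      if (!Uses.empty()) Uses += ", ";
      Uses += Op.Reg;
    }
  }
  std::string Out = Defs.empty() ? MI.Opcode : Defs + " = " + MI.Opcode;
  if (!Uses.empty()) Out += " " + Uses;
  return Out;
}
```

<p>For the SPARC example, storing the operands as <tt>%i3</tt> (def),
<tt>%i1</tt>, <tt>%i2</tt> makes this sketch return
"<tt>%i3 = add %i1, %i2</tt>".</p>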
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	566
				567	</div>
				568
				569	<!-- _______________________________________________________________________ -->
				570	<div class="doc_subsubsection">
				571	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
				572	</div>
				573
				574	<div class="doc_text">
				575
				576	<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	577	located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
				578	<tt>BuildMI</tt> functions make it easy to build arbitrary machine
				579	instructions. Usage of the <tt>BuildMI</tt> functions looks like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	580
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	581	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	582	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	583	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				584	// instruction. The '1' specifies how many operands will be added.
				585	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	586
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	587	// Create the same instr, but insert it at the end of a basic block.
				588	MachineBasicBlock &MBB = ...
				589	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	590
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	591	// Create the same instr, but insert it before a specified iterator point.
				592	MachineBasicBlock::iterator MBBI = ...
				593	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	594
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	595	// Create a 'cmp Reg, 0' instruction, no destination reg.
				596	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				597	// Create an 'sahf' instruction which takes no operands and stores nothing.
				598	MI = BuildMI(X86::SAHF, 0);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	599
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	600	// Create a self looping branch instruction.
				601	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	602	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	603	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	604
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	605	<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	606	have to specify the number of operands that the machine instruction will
				607	take. This allows for efficient memory allocation. Note also that operands
				608	default to being uses of values, not definitions. If you need to
				609	add a definition operand (other than the optional destination register), you
				610	must explicitly mark it as such:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	611
				612	<div class="doc_code">
				613	<pre>
Bill Wendling	587daed	2009-05-13 21:33:08 +0000	[diff] [blame]	614	MI.addReg(Reg, RegState::Define);
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	615	</pre>
				616	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	617
				618	</div>
				619
				620	<!-- _______________________________________________________________________ -->
				621	<div class="doc_subsubsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	622	<a name="fixedregs">Fixed (preassigned) registers</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	623	</div>
				624
				625	<div class="doc_text">
				626
				627	<p>One important issue that the code generator needs to be aware of is the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	628	presence of fixed registers. In particular, there are often places in the
				629	instruction stream where the register allocator <em>must</em> arrange for a
				630	particular value to be in a particular register. This can occur due to
				631	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				632	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
				633	calling conventions. In any case, the instruction selector should emit code
				634	that copies a virtual register into or out of a physical register when
				635	needed.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	636
				637	<p>For example, consider this simple LLVM example:</p>
				638
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	639	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	640	<pre>
Matthijs Kooijman	61399af	2008-06-04 15:46:35 +0000	[diff] [blame]	641	define i32 @test(i32 %X, i32 %Y) {
				642	%Z = sdiv i32 %X, %Y
				643	ret i32 %Z
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	644	}
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	645	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	646	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	647
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	648	<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	649	and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
				650	get this):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	651
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	652	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	653	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	654	;; Start of div
				655	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				656	%reg1027 = sar %reg1024, 31
				657	%EDX = mov %reg1027 ;; Sign extend X into EDX
				658	idiv %reg1025 ;; Divide by Y (in reg1025)
				659	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	660
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	661	;; Start of ret
				662	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				663	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	664	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	665	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	666
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	667	<p>By the end of code generation, the register allocator has coalesced the
				668	registers and deleted the resultant identity moves, producing the following
				669	code:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	670
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	671	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	672	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	673	;; X is in EAX, Y is in ECX
				674	mov %EAX, %EDX
				675	sar %EDX, 31
				676	idiv %ECX
				677	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	678	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	679	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	680
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	681	<p>This approach is extremely general (if it can handle the X86 architecture, it
				682	can handle anything!) and allows all of the target specific knowledge about
				683	the instruction stream to be isolated in the instruction selector. Note that
				684	physical registers should have a short lifetime for good code generation, and
				685	all physical registers are assumed dead on entry to and exit from basic
				686	blocks (before register allocation). Thus, if you need a value to be live
				687	across basic block boundaries, it <em>must</em> live in a virtual
				688	register.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	689
				690	</div>
				691
				692	<!-- _______________________________________________________________________ -->
				693	<div class="doc_subsubsection">
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	694	<a name="ssa">Machine code in SSA form</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	695	</div>
				696
				697	<div class="doc_text">
				698
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	699	<p><tt>MachineInstr</tt>'s are initially selected in SSA-form, and are
				700	maintained in SSA-form until register allocation happens. For the most part,
				701	this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
				702	become machine code PHI nodes, and virtual registers are only allowed to have
				703	a single definition.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	704
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	705	<p>After register allocation, machine code is no longer in SSA-form because
				706	there are no virtual registers left in the code.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	707
				708	</div>
				709
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	710	<!-- ======================================================================= -->
				711	<div class="doc_subsection">
				712	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
				713	</div>
				714
				715	<div class="doc_text">
				716
				717	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	718	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				719	corresponds to the LLVM code input to the instruction selector, but there can
				720	be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
				721	machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
				722	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				723	comes from.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	724
				725	</div>
				726
				727	<!-- ======================================================================= -->
				728	<div class="doc_subsection">
				729	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
				730	</div>
				731
				732	<div class="doc_text">
				733
				734	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	735	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				736	corresponds one-to-one with the LLVM function input to the instruction
				737	selector. In addition to a list of basic blocks,
				738	the <tt>MachineFunction</tt> contains a <tt>MachineConstantPool</tt>,
				739	a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
				740	<tt>MachineRegisterInfo</tt>. See
				741	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	742
				743	</div>
				744
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	745
				746	<!-- *********************************************************************** -->
				747	<div class="doc_section">
				748	<a name="mc">The "MC" Layer</a>
				749	</div>
				750	<!-- *********************************************************************** -->
				751
				752	<div class="doc_text">
				753
				754	<p>
				755	The MC Layer is used to represent and process code at the raw machine code
				756	level, devoid of "high level" information like "constant pools", "jump tables",
				757	"global variables" or anything like that. At this level, LLVM handles things
				758	like label names, machine instructions, and sections in the object file. The
				759	code in this layer is used for a number of important purposes: the tail end of
				760	the code generator uses it to write a .s or .o file, and it is also used by the
				761	llvm-mc tool to implement standalone machine code assemblers and disassemblers.
				762	</p>
				763
				764	<p>
				765	This section describes some of the important classes. There are also a number
				766	of important subsystems that interact at this layer; they are described later
				767	in this manual.
				768	</p>
				769
				770	</div>
				771
				772
				773	<!-- ======================================================================= -->
				774	<div class="doc_subsection">
				775	<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
				776	</div>
				777
				778	<div class="doc_text">
				779
				780	<p>
				781	MCStreamer is best thought of as an assembler API. It is an abstract API which
				782	is <em>implemented</em> in different ways (e.g. to output a .s file, output an
				783	ELF .o file, etc) but whose API corresponds directly to what you see in a .s
				784	file. MCStreamer has one method per directive, such as EmitLabel,
				785	EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
				786	directly correspond to assembly level directives. It also has an
				787	EmitInstruction method, which is used to output an MCInst to the streamer.
				788	</p>
				789
				790	<p>
				791	This API is most important for two clients: the llvm-mc stand-alone assembler is
				792	effectively a parser that parses a line, then invokes a method on MCStreamer. In
				793	the code generator, the <a href="#codeemit">Code Emission</a> phase of the code
				794	generator lowers higher level LLVM IR and Machine* constructs down to the MC
				795	layer, emitting directives through MCStreamer.</p>
				796
				797	<p>
				798	On the implementation side of MCStreamer, there are two major implementations:
				799	one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
				800	file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation
				801	that prints out a directive for each method (e.g. EmitValue -> .byte), but
				802	MCObjectStreamer implements a full assembler.
				803	</p>
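<p>The split between an abstract directive-level API and its implementations
can be sketched as follows. <tt>ToyStreamer</tt>, <tt>ToyAsmStreamer</tt>, and
their method names are hypothetical and much smaller than the real
<tt>MCStreamer</tt> interface; only the textual implementation is shown, since
an object-file implementation would amount to a full assembler.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical miniature streamer: one virtual method per directive.
class ToyStreamer {
public:
  virtual ~ToyStreamer() = default;
  virtual void switchSection(const std::string &Name) = 0;
  virtual void emitLabel(const std::string &Name) = 0;
  virtual void emitByte(std::uint8_t Value) = 0;
};

// Textual implementation: each call prints the matching directive, in the
// spirit of a streamer that writes a .s file.
class ToyAsmStreamer : public ToyStreamer {
  std::ostringstream OS;

public:
  void switchSection(const std::string &Name) override {
    OS << "\t.section " << Name << "\n";
  }
  void emitLabel(const std::string &Name) override {
    assert(!Name.empty() && "a label needs a name");
    OS << Name << ":\n";
  }
  void emitByte(std::uint8_t Value) override {
    OS << "\t.byte " << static_cast<unsigned>(Value) << "\n";
  }
  std::string str() const { return OS.str(); }
};

// A client written against the abstract interface works with any backend.
void emitFooBar(ToyStreamer &S) {
  S.switchSection(".data");
  S.emitLabel("foo");
  S.emitLabel("bar");
  S.emitByte(4);
}
```

<p>Because <tt>emitFooBar</tt> only sees the abstract class, the same client
code could drive a different implementation without change, which is the
property the two clients described above rely on.</p>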
				804
				805	</div>
				806
				807	<!-- ======================================================================= -->
				808	<div class="doc_subsection">
				809	<a name="mccontext">The <tt>MCContext</tt> class</a>
				810	</div>
				811
				812	<div class="doc_text">
				813
				814	<p>
				815	The MCContext class is the owner of a variety of uniqued data structures at the
				816	MC layer, including symbols, sections, etc. As such, this is the class that you
				817	interact with to create symbols and sections. This class cannot be subclassed.
				818	</p>
				819
				820	</div>
				821
				822	<!-- ======================================================================= -->
				823	<div class="doc_subsection">
				824	<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
				825	</div>
				826
				827	<div class="doc_text">
				828
				829	<p>
				830	The MCSymbol class represents a symbol (aka label) in the assembly file. There
				831	are two interesting kinds of symbols: assembler temporary symbols, and normal
				832	symbols. Assembler temporary symbols are used and processed by the assembler
				833	but are discarded when the object file is produced. The distinction is usually
				834	represented by adding a prefix to the label, for example "L" labels are
				835	assembler temporary labels in MachO.
				836	</p>
				837
				838	<p>MCSymbols are created by MCContext and uniqued there. This means that
				839	MCSymbols can be compared for pointer equivalence to find out if they are the
				840	same symbol. Note that pointer inequality does not guarantee the labels will
				841	end up at different addresses though. It's perfectly legal to output something
				842	like this to the .s file:</p>
				843
				844	<pre>
				845	foo:
				846	bar:
				847	.byte 4
				848	</pre>
				849
				850	<p>In this case, both the foo and bar symbols will have the same address.</p>
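<p>Uniquing by an owning context is what makes pointer comparison meaningful.
The sketch below assumes a hypothetical <tt>ToyContext</tt> with a
<tt>getOrCreateSymbol</tt> method; these are illustrative names and not the
actual <tt>MCContext</tt> interface.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a symbol; the real MCSymbol carries more state.
struct ToySymbol {
  std::string Name;
};

// Hypothetical owning context: at most one ToySymbol object per name.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToySymbol>> Symbols;

public:
  // Return the existing symbol for Name, creating it on first request.
  ToySymbol *getOrCreateSymbol(const std::string &Name) {
    std::unique_ptr<ToySymbol> &Slot = Symbols[Name];
    if (!Slot)
      Slot = std::make_unique<ToySymbol>(ToySymbol{Name});
    return Slot.get();
  }
};

// With uniquing, "same symbol" is just pointer equality.
bool sameSymbol(const ToySymbol *A, const ToySymbol *B) {
  assert(A && B && "expected valid symbols");
  return A == B;
}
```

<p>Requesting "foo" twice from the same context yields the same pointer, while
"foo" and "bar" yield different pointers even though, as in the listing above,
both labels may end up at the same address.</p>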
				851
				852	</div>
				853
				854	<!-- ======================================================================= -->
				855	<div class="doc_subsection">
				856	<a name="mcsection">The <tt>MCSection</tt> class</a>
				857	</div>
				858
				859	<div class="doc_text">
				860
				861	<p>
				862	The MCSection class represents an object-file specific section. It is subclassed
				863	by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
				864	<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
				865	by MCContext. The MCStreamer has a notion of the current section, which can be
				866	changed with the SwitchSection method (which corresponds to a ".section"
				867	directive in a .s file).
				868	</p>
				869
				870	</div>
				871
				872	<!-- ======================================================================= -->
				873	<div class="doc_subsection">
				874	<a name="mcinst">The <tt>MCInst</tt> class</a>
				875	</div>
				876
				877	<div class="doc_text">
				878
				879	<p>
				880	The MCInst class is a target-independent representation of an instruction. It
				881	is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
				882	that holds a target-specific opcode and a vector of MCOperands. MCOperand, in
				883	turn, is a simple discriminated union of three cases: 1) a simple immediate,
				884	2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
				885	MCExpr.
				886	</p>
				887
				888	<p>MCInst is the common currency used to represent machine instructions at the
				889	MC layer. It is the type used by the instruction encoder, the instruction
				890	printer, and the type generated by the assembly parser and disassembler.
				891	</p>
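<p>The "opcode plus vector of three-way operands" shape can be sketched with
<tt>std::variant</tt> (C++17). The types <tt>ToyReg</tt>, <tt>ToyExpr</tt>,
<tt>ToyMCOperand</tt>, and <tt>ToyInst</tt> are hypothetical; the real
<tt>MCOperand</tt> is a hand-written union and <tt>MCExpr</tt> is a class
hierarchy rather than a string.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

struct ToyReg { unsigned Id; };        // target register number
struct ToyExpr { std::string Text; };  // symbolic expression, kept as text

// Discriminated union of the three operand cases:
// immediate, register, or symbolic expression.
using ToyMCOperand = std::variant<std::int64_t, ToyReg, ToyExpr>;

// Hypothetical instruction: target-specific opcode plus operands.
struct ToyInst {
  unsigned Opcode = 0;
  std::vector<ToyMCOperand> Operands;
};

// Count the operands that hold a plain immediate.
std::size_t countImmediates(const ToyInst &Inst) {
  std::size_t N = 0;
  for (const ToyMCOperand &Op : Inst.Operands)
    if (std::holds_alternative<std::int64_t>(Op))
      ++N;
  return N;
}

// Read the register number; the caller must know the operand is a register.
unsigned getReg(const ToyMCOperand &Op) {
  assert(std::holds_alternative<ToyReg>(Op) && "not a register operand");
  return std::get<ToyReg>(Op).Id;
}
```

<p>An encoder or printer written against such a type only needs to switch on
which of the three cases each operand holds.</p>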
				892
				893	</div>
				894
				895
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	896	<!-- *********************************************************************** -->
				897	<div class="doc_section">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	898	<a name="codegenalgs">Target-independent code generation algorithms</a>
				899	</div>
				900	<!-- *********************************************************************** -->
				901
				902	<div class="doc_text">
				903
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	904	<p>This section documents the phases described in the
				905	<a href="#high-level-design">high-level design of the code generator</a>.
				906	It explains how they work and some of the rationale behind their design.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	907
				908	</div>
				909
				910	<!-- ======================================================================= -->
				911	<div class="doc_subsection">
				912	<a name="instselect">Instruction Selection</a>
				913	</div>
				914
				915	<div class="doc_text">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	916
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	917	<p>Instruction Selection is the process of translating LLVM code presented to
				918	the code generator into target-specific machine instructions. There are
				919	several well-known ways to do this in the literature. LLVM uses a
				920	SelectionDAG based instruction selector.</p>
				921
				922	<p>Portions of the DAG instruction selector are generated from the target
				923	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
				924	selector to be generated from these <tt>.td</tt> files, though currently
				925	there are still things that require custom C++ code.</p>
				926
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	927	</div>
				928
				929	<!-- _______________________________________________________________________ -->
				930	<div class="doc_subsubsection">
				931	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
				932	</div>
				933
				934	<div class="doc_text">
				935
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	936	<p>The SelectionDAG provides an abstraction for code representation in a way
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	937	that is amenable to instruction selection using automatic techniques
				938	(e.g. dynamic-programming based optimal pattern matching selectors). It is
				939	also well-suited to other phases of code generation; in particular,
				940	instruction scheduling (SelectionDAG's are very close to scheduling DAGs
				941	post-selection). Additionally, the SelectionDAG provides a host
				942	representation where a large variety of very-low-level (but
				943	target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
				944	performed; ones which require extensive information about the instructions
				945	efficiently supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	946
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	947	<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	948	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				949	operation code (Opcode) that indicates what operation the node performs and
				950	the operands to the operation. The various operation node types are
				951	described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
				952	file.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	953
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	954	<p>Although most operations define a single value, each node in the graph may
				955	define multiple values. For example, a combined div/rem operation will
				956	define both the quotient and the remainder. Many other situations require
				957	multiple values as well. Each node also has some number of operands, which
				958	are edges to the node defining the used value. Because nodes may define
				959	multiple values, edges are represented by instances of the <tt>SDValue</tt>
				960	class, which is a <tt><SDNode, unsigned></tt> pair, indicating the node
				961	and result value being used, respectively. Each value produced by
				962	an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
				963	indicating what the type of the value is.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	964
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	965	<p>SelectionDAGs contain two different kinds of values: those that represent
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	966	data flow and those that represent control flow dependencies. Data values
				967	are simple edges with an integer or floating point value type. Control edges
				968	are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
				969	These edges provide an ordering between nodes that have side effects (such as
				970	loads, stores, calls, returns, etc). All nodes that have side effects should
				971	take a token chain as input and produce a new one as output. By convention,
				972	token chain inputs are always operand #0, and chain results are always the
				973	last value produced by an operation.</p>
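<p>The node/edge structure described above can be sketched with toy types.
<tt>ToyNode</tt>, <tt>ToyValue</tt>, and <tt>ToyType</tt> are hypothetical and
far simpler than <tt>SDNode</tt>, <tt>SDValue</tt>, and <tt>MVT</tt>; the
sketch only shows that an edge names both a node and a result number, and that
a chain is just another result with a distinguished type.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyNode;

// An edge: which node produced the value, and which of its results is used.
struct ToyValue {
  const ToyNode *Node;
  unsigned ResNo;
};

// Value types; Other plays the role of the chain (ordering) type.
enum class ToyType { I32, Other };

struct ToyNode {
  std::string Opcode;
  std::vector<ToyType> ResultTypes;  // one entry per value the node defines
  std::vector<ToyValue> Operands;    // edges to the values this node uses
};

// Type of the value an edge refers to.
ToyType typeOf(const ToyValue &V) {
  assert(V.ResNo < V.Node->ResultTypes.size() && "no such result");
  return V.Node->ResultTypes[V.ResNo];
}

// By convention the chain, if any, is the last result of a node.
bool producesChain(const ToyNode &N) {
  return !N.ResultTypes.empty() && N.ResultTypes.back() == ToyType::Other;
}
```

<p>In this sketch a combined div/rem node has two <tt>I32</tt> results, so its
quotient and remainder are two <tt>ToyValue</tt>s that share a node but differ
in result number, while a load-like node would list its data result first and
its chain result last.</p>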
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	974
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	975	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	976	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root
				977	node is the final side-effecting node in the token chain. For example, in a
				978	single basic block function it would be the return node.</p>
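<p>The chain convention can be sketched with a small model, assuming (for simplicity) that every side-effecting node has exactly one chain input and that it is operand #0. <tt>ChainNode</tt> and <tt>chainOrder</tt> are hypothetical names for this example. Walking chain inputs from the Root back to the Entry node recovers the order in which the side effects must occur.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical miniature node: only what is needed to follow token chains.
struct ChainNode {
  std::string Opcode;
  std::vector<const ChainNode *> Ops;  // by convention Ops[0] is the chain input
};

// Follow chain inputs from the Root back to the entry token and return the
// side-effecting opcodes in execution order.
inline std::vector<std::string> chainOrder(const ChainNode *Root) {
  std::vector<std::string> Reversed;
  for (const ChainNode *N = Root; N && N->Opcode != "EntryToken";
       N = N->Ops.empty() ? nullptr : N->Ops[0])
    Reversed.push_back(N->Opcode);
  return std::vector<std::string>(Reversed.rbegin(), Reversed.rend());
}
```

<p>For a block that loads, stores, and returns, the walk starts at the return node (the Root) and stops at the entry token.</p>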
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	979
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	980	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	981	"illegal" DAG. A legal DAG for a target is one that only uses supported
				982	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
				983	a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
				984	uses a SREM or UREM operation. The
				985	<a href="#selectiondag_legalize_types">legalize types</a> and
				986	<a href="#selectiondag_legalize">legalize operations</a> phases are
				987	responsible for turning an illegal DAG into a legal DAG.</p>
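<p>A minimal sketch of the legality test, assuming a target is described by nothing more than a set of supported type names and a set of supported operation names. <tt>ToyTarget</tt>, <tt>ToyOp</tt>, and <tt>isLegalDAG</tt> are invented for this example, and the sets a caller supplies are assumptions rather than a real target description.</p>

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of what a target supports natively.
struct ToyTarget {
  std::set<std::string> LegalTypes;
  std::set<std::string> LegalOps;
};

struct ToyOp {
  std::string Opcode;
  std::string Type;
};

// A DAG is legal only if every node uses a supported operation and type.
inline bool isLegalDAG(const ToyTarget &T, const std::vector<ToyOp> &Nodes) {
  for (const ToyOp &N : Nodes)
    if (!T.LegalTypes.count(N.Type) || !T.LegalOps.count(N.Opcode))
      return false;
  return true;
}
```

<p>With a target whose only legal integer type is <tt>i32</tt> and whose operation set lacks <tt>srem</tt>, a DAG containing an <tt>i64</tt> value or an <tt>srem</tt> node is reported as illegal.</p>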
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	988
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	989	</div>
				990
				991	<!-- _______________________________________________________________________ -->
				992	<div class="doc_subsubsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	993	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	994	</div>
				995
				996	<div class="doc_text">
				997
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	998	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	999
				1000	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1001	<li><a href="#selectiondag_build">Build initial DAG</a> — This stage
				1002	performs a simple translation from the input LLVM code to an illegal
				1003	SelectionDAG.</li>
				1004
				1005	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
				1006	stage performs simple optimizations on the SelectionDAG to simplify it,
				1007	and recognize meta instructions (like rotates
				1008	and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
				1009	operations. This makes the resultant code more efficient and
				1010	the <a href="#selectiondag_select">select instructions from DAG</a> phase
				1011	(below) simpler.</li>
				1012
				1013	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
				1014	— This stage transforms SelectionDAG nodes to eliminate any types
				1015	that are unsupported on the target.</li>
				1016
				1017	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1018	SelectionDAG optimizer is run to clean up redundancies exposed by type
				1019	legalization.</li>
				1020
				1021	<li><a href="#selectiondag_legalize">Legalize SelectionDAG Ops</a> &mdash;
				1022	This stage transforms SelectionDAG nodes to eliminate any operations that are
				1023	unsupported on the target.</li>
				1024
				1025	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1026	SelectionDAG optimizer is run to eliminate inefficiencies introduced by
				1027	operation legalization.</li>
				1028
				1029	<li><a href="#selectiondag_select">Select instructions from DAG</a> —
				1030	Finally, the target instruction selector matches the DAG operations to
				1031	target instructions. This process translates the target-independent input
				1032	DAG into another DAG of target instructions.</li>
				1033
				1034	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				1035	— The last phase assigns a linear order to the instructions in the
				1036	target-instruction DAG and emits them into the MachineFunction being
				1037	compiled. This step uses traditional prepass scheduling techniques.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1038	</ol>
				1039
				1040	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1041	rest of the code generation passes are run.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1042
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1043	<p>One great way to visualize what is going on here is to take advantage of a
				1044	few LLC command line options. The following options pop up a window
				1045	displaying the SelectionDAG at specific times (if you only get errors printed
				1046	to the console while using this, you probably
				1047	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
				1048	to add support for it).</p>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1049
				1050	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1051	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
				1052	before the first optimization pass.</li>
				1053
				1054	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				1055
				1056	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				1057	optimization pass.</li>
				1058
				1059	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				1060
				1061	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1062	</ul>
				1063
				1064	<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency graph.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1065	This graph is based on the final SelectionDAG, with nodes that must be
				1066	scheduled together bundled into a single scheduling-unit node, and with
				1067	immediate operands and other nodes that aren't relevant for scheduling
				1068	omitted.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1069
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1070	</div>
				1071
				1072	<!-- _______________________________________________________________________ -->
				1073	<div class="doc_subsubsection">
				1074	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
				1075	</div>
				1076
				1077	<div class="doc_text">
				1078
Bill Wendling	1644877	2006-08-28 03:04:05 +0000	[diff] [blame]	1079	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1080	input by the <tt>SelectionDAGLowering</tt> class in the
				1081	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of
				1082	this pass is to expose as many low-level, target-specific details to the
				1083	SelectionDAG as possible. This pass is mostly hard-coded (e.g. an
				1084	LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
				1085	<tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
				1086	requires target-specific hooks to lower calls, returns, varargs, etc. For
				1087	these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
				1088	interface is used.</p>
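<p>The "obvious arithmetic" for an address computation can be sketched as follows. This toy evaluates the same sum that the emitted add and multiply nodes would compute; it does not build DAG nodes. <tt>GepStep</tt> and <tt>lowerGep</tt> are hypothetical names, and element sizes and field offsets are supplied by the caller rather than derived from a data layout.</p>

```cpp
#include <cstdint>
#include <vector>

// One step of a hypothetical, simplified getelementptr: either scale a
// run-time index by an element size, or add a constant struct field offset.
struct GepStep {
  bool IsStructField;
  std::int64_t IndexOrOffset;  // array index, or byte offset of the field
  std::int64_t ElementSize;    // bytes per element (ignored for fields)
};

// The "obvious arithmetic": base + sum of (index * size) and field offsets.
inline std::int64_t lowerGep(std::int64_t Base,
                             const std::vector<GepStep> &Steps) {
  std::int64_t Addr = Base;
  for (const GepStep &S : Steps)
    Addr += S.IsStructField ? S.IndexOrOffset : S.IndexOrOffset * S.ElementSize;
  return Addr;
}
```

<p>For example, selecting element 3 of an array of 8-byte structs and then a field at byte offset 4 adds 3 * 8 + 4 to the base address.</p>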
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1089
				1090	</div>
				1091
				1092	<!-- _______________________________________________________________________ -->
				1093	<div class="doc_subsubsection">
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1094	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
				1095	</div>
				1096
				1097	<div class="doc_text">
				1098
				1099	<p>The Legalize Types phase is in charge of converting a DAG to only use the types
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1100	that are natively supported by the target.</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1101
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1102	<p>There are two main ways of converting values of unsupported scalar types to
				1103	values of supported types: converting small types to larger types
				1104	("promoting"), and breaking up large integer types into smaller ones
				1105	("expanding"). For example, a target might require that all f32 values are
				1106	promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same
				1107	target might require that all i64 values be expanded into pairs of i32
				1108	values. These changes can insert sign and zero extensions as needed to make
				1109	sure that the final code has the same behavior as the input.</p>
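<p>The two scalar strategies can be illustrated with plain integer arithmetic. The sketch below assumes a target whose widest legal integer is 32 bits. All function names are invented for this example; a real legalizer rewrites DAG nodes rather than computing values.</p>

```cpp
#include <cstdint>
#include <utility>

// "Expanding": an i64 value is carried as a (lo, hi) pair of i32 values.
using I32Pair = std::pair<std::uint32_t, std::uint32_t>;

inline I32Pair split64(std::uint64_t V) {
  return {static_cast<std::uint32_t>(V), static_cast<std::uint32_t>(V >> 32)};
}

inline std::uint64_t join64(I32Pair P) {
  return (static_cast<std::uint64_t>(P.second) << 32) | P.first;
}

// An i64 add becomes two i32 adds, with the carry out of the low half
// fed into the high half.
inline I32Pair expandedAdd64(I32Pair A, I32Pair B) {
  std::uint32_t Lo = A.first + B.first;
  std::uint32_t Carry = Lo < A.first ? 1u : 0u;
  std::uint32_t Hi = A.second + B.second + Carry;
  return {Lo, Hi};
}

// "Promoting": an i8 value is held in an i32 register; a sign extension
// keeps signed operations behaving as they did on the narrow type.
inline std::int32_t promoteSigned8(std::int8_t V) {
  return static_cast<std::int32_t>(V);
}
```

<p>The carry handling is the part that keeps the expanded form equivalent to the original 64-bit addition.</p>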
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1110
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1111	<p>There are two main ways of converting values of unsupported vector types to
				1112	values of supported types: splitting vector types, multiple times if
				1113	necessary, until a legal type is found, and extending vector types by adding
				1114	elements to the end to round them out to legal types ("widening"). If a
				1115	vector gets split all the way down to single-element parts with no supported
				1116	vector type being found, the elements are converted to scalars
				1117	("scalarizing").</p>
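<p>A small sketch of the vector strategies, assuming legality depends only on the element count. <tt>splitVector</tt> and <tt>widenVector</tt> are hypothetical helpers, and how a real target chooses between splitting and widening is not modeled here.</p>

```cpp
#include <set>
#include <vector>

// Split a vector of NumElts elements in half, repeatedly, until every part
// has a legal element count. A part that reaches one element without finding
// a legal vector type is "scalarized"; this toy records it as a part of size 1.
inline void splitVector(unsigned NumElts, const std::set<unsigned> &LegalCounts,
                        std::vector<unsigned> &Parts) {
  if (LegalCounts.count(NumElts) || NumElts <= 1) {
    Parts.push_back(NumElts);
    return;
  }
  unsigned Half = NumElts / 2;
  splitVector(Half, LegalCounts, Parts);
  splitVector(NumElts - Half, LegalCounts, Parts);
}

// Widen by padding the end: pick the smallest legal count that fits, or 0
// if no legal vector type is wide enough.
inline unsigned widenVector(unsigned NumElts,
                            const std::set<unsigned> &LegalCounts) {
  auto It = LegalCounts.lower_bound(NumElts);
  return It == LegalCounts.end() ? 0 : *It;
}
```

<p>If only 4-element vectors are legal, a 16-element vector splits into four legal parts, a 3-element vector can be widened to 4, and a 2-element vector ends up scalarized when split.</p>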
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1118
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1119	<p>A target implementation tells the legalizer which types are supported (and
				1120	which register class to use for them) by calling the
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1121	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				1122
				1123	</div>
				1124
				1125	<!-- _______________________________________________________________________ -->
				1126	<div class="doc_subsubsection">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1127	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
				1128	</div>
				1129
				1130	<div class="doc_text">
				1131
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1132	<p>The Legalize phase is in charge of converting a DAG to only use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1133	operations that are natively supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1134
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1135	<p>Targets often have weird constraints, such as not supporting every operation
				1136	on every supported datatype (e.g. X86 does not support byte conditional moves
				1137	and PowerPC does not support sign-extending loads from an 8-bit memory
				1138	location). Legalize takes care of this by open-coding another sequence of
				1139	operations to emulate the operation ("expansion"), by promoting one type to a
				1140	larger type that supports the operation ("promotion"), or by using a
				1141	target-specific hook to implement the legalization ("custom").</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1142
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1143	<p>A target implementation tells the legalizer which operations are not
				1144	supported (and which of the above three actions to take) by calling the
				1145	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				1146	constructor.</p>
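<p>The idea of a per-operation, per-type action table can be sketched as below. <tt>ToyLowering</tt>, <tt>setAction</tt>, and <tt>getAction</tt> are hypothetical and deliberately do not reproduce the signature of the real <tt>setOperationAction</tt> method. The remainder expansion shown is one possible open-coded sequence, chosen for illustration.</p>

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature of a per-(operation, type) action table. The names
// mirror the three strategies in the text but are not LLVM's declarations.
enum class Action { Legal, Expand, Promote, Custom };

class ToyLowering {
  std::map<std::pair<std::string, std::string>, Action> Table;

public:
  void setAction(const std::string &Op, const std::string &VT, Action A) {
    Table[{Op, VT}] = A;
  }
  // Anything the target did not mention is assumed to be natively supported.
  Action getAction(const std::string &Op, const std::string &VT) const {
    auto It = Table.find({Op, VT});
    return It == Table.end() ? Action::Legal : It->second;
  }
};

// One possible expansion of a signed remainder for a target without a
// remainder instruction: a - (a / b) * b.
inline int expandedSRem(int A, int B) { return A - (A / B) * B; }
```

<p>A toy target constructor would call <tt>setAction</tt> once for each unsupported combination, and the legalizer would consult <tt>getAction</tt> for every node it visits.</p>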
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1147
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1148	<p>Prior to the existence of the Legalize passes, we required that every target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1149	<a href="#selectiondag_select">selector</a> support and handle every
				1150	operator and type, even those that are not natively supported. The introduction
				1151	of the Legalize phases allows all of the canonicalization patterns to be
				1152	shared across targets, and makes it very easy to optimize the canonicalized
				1153	code because it is still in the form of a DAG.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1154
				1155	</div>
				1156
				1157	<!-- _______________________________________________________________________ -->
				1158	<div class="doc_subsubsection">
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1159	<a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
				1160	Combiner</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1161	</div>
				1162
				1163	<div class="doc_text">
				1164
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1165	<p>The SelectionDAG optimization phase is run multiple times for code
				1166	generation, immediately after the DAG is built and once after each
				1167	legalization. The first run of the pass allows the initial code to be
				1168	cleaned up (e.g. performing optimizations that depend on knowing that the
				1169	operators have restricted type inputs). Subsequent runs of the pass clean up
				1170	the messy code generated by the Legalize passes, which allows Legalize to be
				1171	very simple (it can focus on making code legal instead of focusing on
				1172	generating <em>good</em> and legal code).</p>
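<p>To give a flavor of the cleanups involved, the sketch below removes a redundant sign extension of the kind that legalization can introduce. It is a toy: <tt>Expr</tt>, <tt>sextInReg</tt>, and <tt>combine</tt> are hypothetical, values are fixed at 32 bits, and the single rule shown is an example rather than a description of what the DAG combiner implements.</p>

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression node: either a leaf value or "sign-extend the low Bits bits
// of the operand" (a simplified stand-in for an in-register sign extension).
struct Expr {
  bool IsLeaf;
  std::int32_t Value;        // used when IsLeaf
  unsigned Bits;             // used when !IsLeaf, in the range 1..32
  std::unique_ptr<Expr> Op;  // used when !IsLeaf
};

inline std::unique_ptr<Expr> leaf(std::int32_t V) {
  return std::unique_ptr<Expr>(new Expr{true, V, 0, nullptr});
}
inline std::unique_ptr<Expr> sextInReg(std::unique_ptr<Expr> Op, unsigned Bits) {
  return std::unique_ptr<Expr>(new Expr{false, 0, Bits, std::move(Op)});
}

inline std::int32_t eval(const Expr &E) {
  if (E.IsLeaf) return E.Value;
  std::uint32_t Sign = 1u << (E.Bits - 1);
  std::uint32_t Mask = (Sign << 1) - 1;  // low Bits bits; all ones for 32
  std::uint32_t Low = static_cast<std::uint32_t>(eval(*E.Op)) & Mask;
  return static_cast<std::int32_t>((Low ^ Sign) - Sign);
}

// sext_inreg(sext_inreg(x, A), B) with A <= B is just sext_inreg(x, A):
// the inner node already made every bit from A-1 upward a copy of the sign.
inline std::unique_ptr<Expr> combine(std::unique_ptr<Expr> E) {
  if (E->IsLeaf) return E;
  E->Op = combine(std::move(E->Op));
  if (!E->Op->IsLeaf && E->Op->Bits <= E->Bits)
    return std::move(E->Op);
  return E;
}

inline unsigned countNodes(const Expr &E) {
  return E.IsLeaf ? 1 : 1 + countNodes(*E.Op);
}
```

<p>The rewrite leaves the computed value unchanged while shrinking the tree, which is the property any such combine has to preserve.</p>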
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1173
				1174	<p>One important class of optimizations performed is optimizing inserted sign
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1175	and zero extension instructions. We currently use ad-hoc techniques, but
				1176	could move to more rigorous techniques in the future. Here are some good
				1177	papers on the subject:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1178
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1179	<p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1180	integer arithmetic</a>"<br>
				1181	Kevin Redwine and Norman Ramsey<br>
				1182	International Conference on Compiler Construction (CC) 2004</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1183
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1184	<p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1185	sign extension elimination</a>"<br>
				1186	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1187	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1188	and Implementation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1189
				1190	</div>
				1191
				1192	<!-- _______________________________________________________________________ -->
				1193	<div class="doc_subsubsection">
				1194	<a name="selectiondag_select">SelectionDAG Select Phase</a>
				1195	</div>
				1196
				1197	<div class="doc_text">
				1198
				1199	<p>The Select phase is the bulk of the target-specific code for instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1200	selection. This phase takes a legal SelectionDAG as input, pattern matches
				1201	the instructions supported by the target to this DAG, and produces a new DAG
				1202	of target code. For example, consider the following LLVM fragment:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1203
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1204	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1205	<pre>
Dan Gohman	a9445e1	2010-03-02 01:11:08 +0000	[diff] [blame]	1206	%t1 = fadd float %W, %X
				1207	%t2 = fmul float %t1, %Y
				1208	%t3 = fadd float %t2, %Z
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1209	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1210	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1211
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1212	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1213	this:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1214
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1215	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1216	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1217	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1218	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1219	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1220
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1221	<p>If a target supports floating point multiply-and-add (FMA) operations, one of
				1222	the adds can be merged with the multiply. On the PowerPC, for example, the
				1223	output of the instruction selector might look like this DAG:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1224
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1225	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1226	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1227	(FMADDS (FADDS W, X), Y, Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1228	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1229	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1230
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1231	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1232	first two operands and adds the third (as single-precision floating-point
				1233	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1234	add instruction. To perform this pattern match, the PowerPC backend includes
				1235	the following instruction definitions:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1236
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1237	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1238	<pre>
				1239	def FMADDS : AForm_1&lt;59, 29,
				1240	                    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
				1241	                    "fmadds $FRT, $FRA, $FRC, $FRB",
				1242	                    [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
				1243	                                           F4RC:$FRB))</b>]&gt;;
				1244	def FADDS : AForm_2&lt;59, 21,
				1245	                   (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
				1246	                   "fadds $FRT, $FRA, $FRB",
				1247	                   [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
				1248	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1249	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1250
				1251	<p>The portion of the instruction definition in bold indicates the pattern used
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1252	to match the instruction. The DAG operators
				1253	(like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
Dan Gohman	6a4824c	2010-03-25 00:03:04 +0000	[diff] [blame]	1254	the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file. "
				1255	<tt>F4RC</tt>" is the register class of the input and result values.</p>
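<p>What the generated matcher does for these two patterns can be approximated by hand. The sketch below is a toy tree matcher, not the code TableGen emits: <tt>FNode</tt>, <tt>bin</tt>, and <tt>selectInsts</tt> are hypothetical, <tt>FMULS</tt> is assumed as the name of a plain multiply, and the result is printed as a string instead of being built as a DAG of target instructions.</p>

```cpp
#include <memory>
#include <string>

// Toy DAG node: a named leaf, or a binary fadd/fmul.
struct FNode {
  std::string Op;               // "fadd", "fmul", or a leaf name such as "W"
  std::shared_ptr<FNode> L, R;  // null for leaves
};
using FRef = std::shared_ptr<FNode>;

inline FRef leaf(const std::string &Name) {
  return std::make_shared<FNode>(FNode{Name, nullptr, nullptr});
}
inline FRef bin(const std::string &Op, FRef L, FRef R) {
  return std::make_shared<FNode>(FNode{Op, L, R});
}

// Select target instructions top-down, trying the largest pattern first.
inline std::string selectInsts(const FRef &N) {
  if (!N->L) return N->Op;  // leaf: a register operand
  if (N->Op == "fadd") {
    // (fadd (fmul a, b), c) and its commuted form both become FMADDS.
    if (N->L->Op == "fmul")
      return "(FMADDS " + selectInsts(N->L->L) + ", " + selectInsts(N->L->R) +
             ", " + selectInsts(N->R) + ")";
    if (N->R->Op == "fmul")
      return "(FMADDS " + selectInsts(N->R->L) + ", " + selectInsts(N->R->R) +
             ", " + selectInsts(N->L) + ")";
    return "(FADDS " + selectInsts(N->L) + ", " + selectInsts(N->R) + ")";
  }
  return "(FMULS " + selectInsts(N->L) + ", " + selectInsts(N->R) + ")";
}
```

<p>Applied to the example DAG above, the outer add and the multiply collapse into one <tt>FMADDS</tt> while the inner add remains an <tt>FADDS</tt>.</p>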
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1256
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1257	<p>The TableGen DAG instruction selector generator reads the instruction
				1258	patterns in the <tt>.td</tt> file and automatically builds parts of the
				1259	pattern matching code for your target. It has the following strengths:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1260
				1261	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1262	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1263	you if your patterns make sense or not.</li>
				1264
				1265	<li>It can handle arbitrary constraints on operands for the pattern match. In
				1266	particular, it is straightforward to say things like "match any immediate
				1267	that is a 13-bit sign-extended value". For examples, see the
				1268	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1269	backend.</li>
				1270
				1271	<li>It knows several important identities for the patterns defined. For
				1272	example, it knows that addition is commutative, so it allows the
				1273	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1274	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1275	to specially handle this case.</li>
				1276
				1277	<li>It has a full-featured type-inferencing system. In particular, you should
				1278	rarely have to explicitly tell the system what type parts of your patterns
				1279	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1280	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
				1281	It was able to infer and propagate this knowledge from the fact that
				1282	<tt>F4RC</tt> has type 'f32'.</li>
				1283
				1284	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1285	Pattern fragments are chunks of reusable patterns that get inlined into
				1286	your patterns during compiler-compiler time. For example, the integer
				1287	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment
				1288	that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
				1289	have a native '<tt>not</tt>' operation. Targets can define their own
				1290	short-hand fragments as they see fit. See the definition of
				1291	'<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
				1292
				1293	<li>In addition to instructions, targets can specify arbitrary patterns that
				1294	map to one or more instructions using the 'Pat' class. For example, the
				1295	PowerPC has no way to load an arbitrary integer immediate into a register
				1296	in one instruction. To tell tblgen how to do this, it defines:
				1297	<br>
				1298	<br>
				1299	<div class="doc_code">
				1300	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1301	// Arbitrary immediate support. Implement in terms of LIS/ORI.
				1302	def : Pat&lt;(i32 imm:$imm),
				1303	          (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1304	</pre>
				1305	</div>
				1306	<br>
				1307	If none of the single-instruction patterns for loading an immediate into a
				1308	register match, this will be used. This rule says "match an arbitrary i32
				1309	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
				1310	an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
				1311	the left 16 bits') instruction". To make this work, the
				1312	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
				1313	the input immediate (in this case, take the high or low 16-bits of the
				1314	immediate).</li>
				1315
				1316	<li>While the system does automate a lot, it still allows you to write custom
				1317	C++ code to match special cases if there is something that is hard to
				1318	express.</li>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1319	</ul>
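<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern in the list above can be checked with a few lines of code. The functions below are simplified 32-bit models that follow the descriptions given in the text; they are written for this example and are not taken from the PowerPC backend.</p>

```cpp
#include <cstdint>

// Node transformations from the pattern: take the high or low 16 bits.
inline std::uint32_t HI16(std::uint32_t Imm) { return Imm >> 16; }
inline std::uint32_t LO16(std::uint32_t Imm) { return Imm & 0xFFFFu; }

// Simplified models of the two instructions, following the descriptions in
// the text: LIS loads a 16-bit immediate shifted left 16 bits, and ORI ORs a
// 16-bit immediate into a register.
inline std::uint32_t LIS(std::uint32_t Imm16) { return Imm16 << 16; }
inline std::uint32_t ORI(std::uint32_t Reg, std::uint32_t Imm16) {
  return Reg | Imm16;
}

// (ORI (LIS (HI16 imm)), (LO16 imm))
inline std::uint32_t materialize(std::uint32_t Imm) {
  return ORI(LIS(HI16(Imm)), LO16(Imm));
}
```

<p>Because the two halves occupy disjoint bit ranges, the OR reassembles the original immediate exactly.</p>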
				1320
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1321	<p>While it has many strengths, the system currently has some limitations,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1322	primarily because it is still a work in progress:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1323
				1324	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1325	<li>Overall, there is no way to define or match SelectionDAG nodes that define
Dan Gohman	e370c80	2009-04-22 15:55:31 +0000	[diff] [blame]	1326	multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1327	etc). This is the biggest reason that you currently still <em>have
				1328	to</em> write custom C++ code for your instruction selector.</li>
				1329
				1330	<li>There is no great way to support matching complex addressing modes yet.
				1331	In the future, we will extend pattern fragments to allow them to define
				1332	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
				1333	addressing mode</a>, which are currently matched with custom C++ code).
				1334	In addition, we'll extend fragments so that a fragment can match multiple
				1335	different patterns.</li>
				1336
				1337	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1338
				1339	<li>We don't automatically generate the set of supported registers and
				1340	operations for the <a href="#selectiondag_legalize">Legalizer</a>
				1341	yet.</li>
				1342
				1343	<li>We don't have a way of tying in custom legalized nodes yet.</li>
Chris Lattner	7d6915c	2005-10-17 04:18:41 +0000	[diff] [blame]	1344	</ul>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1345
				1346	<p>Despite these limitations, the instruction selector generator is still quite
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1347	useful for most of the binary and logical operations in typical instruction
				1348	sets. If you run into any problems or can't figure out how to do something,
				1349	please let Chris know!</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1350
				1351	</div>
				1352
				1353	<!-- _______________________________________________________________________ -->
				1354	<div class="doc_subsubsection">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1355	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1356	</div>
				1357
				1358	<div class="doc_text">
				1359
				1360	<p>The scheduling phase takes the DAG of target instructions from the selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1361	phase and assigns an order. The scheduler can pick an order depending on
				1362	various constraints of the machines (e.g. order for minimal register pressure
				1363	or try to cover instruction latencies). Once an order is established, the
				1364	DAG is converted to a list
				1365	of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
				1366	destroyed.</p>
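<p>At its simplest, assigning an order means emitting each node only after the nodes it depends on. The sketch below shows that baseline with a plain topological sort. <tt>scheduleDAG</tt> is a hypothetical helper that always takes the lowest-numbered ready node; the register pressure and latency heuristics mentioned above are not modeled.</p>

```cpp
#include <cstddef>
#include <vector>

// Toy scheduler: Deps[i] lists the nodes whose results node i uses. Emit a
// node once all of its operands have been emitted (Kahn's algorithm), always
// picking the lowest-numbered ready node. A real scheduler would choose among
// ready nodes with register pressure or latency heuristics instead.
inline std::vector<std::size_t>
scheduleDAG(const std::vector<std::vector<std::size_t>> &Deps) {
  std::size_t N = Deps.size();
  std::vector<std::size_t> Pending(N), Order;
  std::vector<bool> Done(N, false);
  for (std::size_t I = 0; I < N; ++I) Pending[I] = Deps[I].size();
  while (Order.size() < N) {
    bool Progress = false;
    for (std::size_t I = 0; I < N; ++I) {
      if (Done[I] || Pending[I] != 0) continue;
      Done[I] = true;
      Order.push_back(I);
      // Every user of node I now has one fewer outstanding operand.
      for (std::size_t U = 0; U < N; ++U)
        for (std::size_t D : Deps[U])
          if (D == I) --Pending[U];
      Progress = true;
      break;  // restart the scan so the lowest-numbered ready node wins
    }
    if (!Progress) break;  // a cycle: the input was not a DAG
  }
  return Order;
}
```

<p>Any order produced this way is a valid linearization; choosing a <em>good</em> one among the valid orders is where the scheduling heuristics come in.</p>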
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1367
Jeff Cohen	0b81cda	2005-10-24 16:54:55 +0000	[diff] [blame]	1368	<p>Note that this phase is logically separate from the instruction selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1369	phase, but is tied to it closely in the code because it operates on
				1370	SelectionDAGs.</p>
Chris Lattner	c38959f	2005-10-17 03:09:31 +0000	[diff] [blame]	1371
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1372	</div>
				1373
				1374	<!-- _______________________________________________________________________ -->
				1375	<div class="doc_subsubsection">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1376	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
				1377	</div>
				1378
				1379	<div class="doc_text">
				1380
				1381	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1382	<li>Optional function-at-a-time selection.</li>
				1383
				1384	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1385	</ol>
				1386
				1387	</div>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1388
				1389	<!-- ======================================================================= -->
				1390	<div class="doc_subsection">
				1391	<a name="ssamco">SSA-based Machine Code Optimizations</a>
				1392	</div>
				1393	<div class="doc_text"><p>To Be Written</p></div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1394
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1395	<!-- ======================================================================= -->
				1396	<div class="doc_subsection">
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1397	<a name="liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1398	</div>
				1399
				1400	<div class="doc_text">
				1401
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1402	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1403	They are used by some <a href="#regalloc">register allocator</a> passes to
				1404	determine if two or more virtual registers which require the same physical
				1405	register are live at the same point in the program (i.e., they conflict).
				1406	When this situation occurs, one virtual register must be <i>spilled</i>.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1407
				1408	</div>
				1409
				1410	<!-- _______________________________________________________________________ -->
				1411	<div class="doc_subsubsection">
				1412	<a name="livevariable_analysis">Live Variable Analysis</a>
				1413	</div>
				1414
				1415	<div class="doc_text">
				1416
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1417	<p>The first step in determining the live intervals of variables is to calculate
				1418	the set of registers that are immediately dead after the instruction (i.e.,
				1419	the instruction calculates the value, but it is never used) and the set of
				1420	registers that are used by the instruction, but are never used after the
				1421	instruction (i.e., they are killed). Live variable information is computed
				1422	for each <i>virtual</i> register and <i>register allocatable</i> physical
				1423	register in the function. This is done in a very efficient manner because it
				1424	uses SSA to sparsely compute lifetime information for virtual registers
				1425	(which are in SSA form) and only has to track physical registers within a
				1426	block. Before register allocation, LLVM can assume that physical registers
				1427	are only live within a single basic block. This allows it to do a single,
				1428	local analysis to resolve physical register lifetimes within each basic
				1429	block. If a physical register is not register allocatable (e.g., a stack
				1430	pointer or condition codes), it is not tracked.</p>
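<p>The two sets described above ("dead" definitions and "killed" uses) can be computed for a single block with one backward scan. The sketch below is a dense, local toy in the spirit of the per-block physical register analysis; it does not model the sparse SSA-based computation used for virtual registers. <tt>ToyInstr</tt> and <tt>markDeadAndKilled</tt> are hypothetical names.</p>

```cpp
#include <set>
#include <vector>

// Toy machine instruction: defines at most one register (-1 for none) and
// reads some registers. Registers are plain integers here.
struct ToyInstr {
  int Def;
  std::vector<int> Uses;
  bool DefIsDead;          // value is computed but never used
  std::vector<int> Kills;  // registers whose last use is this instruction
};

// Single backward scan over one block. Live starts as the set of registers
// still needed after the block.
inline void markDeadAndKilled(std::vector<ToyInstr> &Block, std::set<int> Live) {
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (It->Def >= 0) {
      It->DefIsDead = Live.count(It->Def) == 0;
      Live.erase(It->Def);
    }
    for (int R : It->Uses)
      if (Live.insert(R).second)  // not live below: this is the last use
        It->Kills.push_back(R);
  }
}
```

<p>A register that is read twice by the same instruction is recorded as killed only once, since the second read finds it already live.</p>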
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1431
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1432	<p>Physical registers may be live in to or out of a function. Live in values are
				1433	typically arguments in registers. Live out values are typically return values
				1434	in registers. Live in values are marked as such, and are given a dummy
				1435	"defining" instruction during live intervals analysis. If the last basic
				1436	block of a function is a <tt>return</tt>, then it's marked as using all live
				1437	out values in the function.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1438
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1439	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
				1440	the live variable information from a depth first traversal of the CFG of the
				1441	function won't guarantee that a virtual register used by the <tt>PHI</tt>
				1442	node is defined before it's used. When a <tt>PHI</tt> node is encountered,
				1443	only the definition is handled, because the uses will be handled in other
				1444	basic blocks.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1445
				1446	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1447	assignment at the end of the current basic block and traverse the successor
				1448	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1449	the <tt>PHI</tt> node's operands is coming from the current basic block, then
				1450	the variable is marked as <i>alive</i> within the current basic block and all
				1451	of its predecessor basic blocks, until the basic block with the defining
				1452	instruction is encountered.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1453
				1454	</div>
				1455
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1456	<!-- _______________________________________________________________________ -->
				1457	<div class="doc_subsubsection">
				1458	<a name="liveintervals_analysis">Live Intervals Analysis</a>
				1459	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1460
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1461	<div class="doc_text">
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1462
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1463	<p>We now have the information available to perform the live intervals analysis
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1464	and build the live intervals themselves. We start off by numbering the basic
				1465	blocks and machine instructions. We then handle the "live-in" values. These
				1466	are in physical registers, so the physical register is assumed to be killed
				1467	by the end of the basic block. Live intervals for virtual registers are
				1468	computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
&lt; N</tt>, for which a variable is live.</p>
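<p>As a rough illustration of the half-open interval <tt>[i, j)</tt> described
above, the following standalone sketch models an interval over instruction
numbers. The names <tt>ToyInterval</tt> and <tt>makeInterval</tt> are
hypothetical and are not part of LLVM; the sketch only shows how containment
and overlap tests fall out of the half-open convention.</p>

```cpp
#include <cassert>

// Half-open interval [start, end) over instruction numbers.
// Hypothetical illustration type; this is not LLVM's LiveInterval class.
struct ToyInterval {
  unsigned start;
  unsigned end;

  // True if instruction number idx lies inside [start, end).
  bool contains(unsigned idx) const { return start <= idx && idx < end; }

  // Two non-empty half-open intervals overlap when each one starts
  // before the other one ends.
  bool overlaps(const ToyInterval &other) const {
    return start < other.end && other.start < end;
  }
};

// Builds an interval, checking that it is not reversed.
ToyInterval makeInterval(unsigned start, unsigned end) {
  assert(start <= end && "interval must not be reversed");
  return ToyInterval{start, end};
}
```

<p>Because the end point is excluded, an interval ending at instruction 4 and
one starting at instruction 4 do not overlap, so the two values could share a
register in this model.</p>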
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1471
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1472	<p><i><b>More to come...</b></i></p>
				1473
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1474	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1475
				1476	<!-- ======================================================================= -->
				1477	<div class="doc_subsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1478	<a name="regalloc">Register Allocation</a>
				1479	</div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1480
				1481	<div class="doc_text">
				1482
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1483	<p>The <i>Register Allocation problem</i> consists in mapping a program
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1484	<i>P<sub>v</sub></i>, that can use an unbounded number of virtual registers,
				1485	to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
				1486	number of physical registers. Each target architecture has a different number
				1487	of physical registers. If the number of physical registers is not enough to
				1488	accommodate all the virtual registers, some of them will have to be mapped
				1489	into memory. These virtuals are called <i>spilled virtuals</i>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1490
				1491	</div>
				1492
				1493	<!-- _______________________________________________________________________ -->
				1494
				1495	<div class="doc_subsubsection">
				1496	<a name="regAlloc_represent">How registers are represented in LLVM</a>
				1497	</div>
				1498
				1499	<div class="doc_text">
				1500
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1501	<p>In LLVM, physical registers are denoted by integer numbers that normally
				1502	range from 1 to 1023. To see how this numbering is defined for a particular
				1503	architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
				1504	architecture. For instance, by
				1505	inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
				1506	32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
				1507	<tt>MM0</tt> is mapped to 48.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1508
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1509	<p>Some architectures contain registers that share the same physical location. A
				1510	notable example is the X86 platform. For instance, in the X86 architecture,
				1511	the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the first eight
				1512	bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
				1513	particular architecture, you can check which registers are aliased by
				1514	inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
				1515	<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
				1516	all the physical registers aliased to the register <tt>p_reg</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1517
				1518	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1519	Elements in the same register class are functionally equivalent, and can be
				1520	interchangeably used. Each virtual register can only be mapped to physical
				1521	registers of a particular class. For instance, in the X86 architecture, some
				1522	virtuals can only be allocated to 8 bit registers. A register class is
				1523	described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual
				1524	register is compatible with a given physical, this code can be used:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1525
				1526	<div class="doc_code">
				1527	<pre>
bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
                                      unsigned v_reg,
                                      unsigned p_reg) {
  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
         "Target register must be physical");
  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
  return trc-&gt;contains(p_reg);
}
				1536	</pre>
				1537	</div>
				1538
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1539	<p>Sometimes, mostly for debugging purposes, it is useful to change the number
				1540	of physical registers available in the target architecture. This must be done
statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt>
				1542	for <tt>RegisterClass</tt>, the last parameter of which is a list of
				1543	registers. Just commenting some out is one simple way to avoid them being
				1544	used. A more polite way is to explicitly exclude some registers from
Dan Gohman	d2cb3d2	2009-07-24 00:30:09 +0000	[diff] [blame]	1545	the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
				1546	class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
				1547	</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1548
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1549	<p>Virtual registers are also denoted by integer numbers. Contrary to physical
				1550	registers, different virtual registers never share the same number. The
				1551	smallest virtual register is normally assigned the number 1024. This may
				1552	change, so, in order to know which is the first virtual register, you should
				1553	access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
				1554	number is greater than or equal
				1555	to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
				1556	register. Whereas physical registers are statically defined in
				1557	a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
				1558	application developer, that is not the case with virtual registers. In order
				1559	to create new virtual registers, use the
				1560	method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
				1561	will return a virtual register with the highest code.</p>
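<p>The numbering scheme above can be sketched in a few lines of standalone
code. This is only a model: the constant <tt>kFirstVirtualRegister</tt> and
the class <tt>ToyRegInfo</tt> are hypothetical names, the value 1024 is the
"normal" value quoted in the text and may differ in a real build, and
treating 0 as "no register" is an assumption of the sketch.</p>

```cpp
#include <cassert>

// Assumed boundary, taken from the text above; real code should read
// TargetRegisterInfo::FirstVirtualRegister instead of hard-coding it.
const unsigned kFirstVirtualRegister = 1024;

// Any number at or above the boundary denotes a virtual register.
bool isVirtualRegister(unsigned reg) { return reg >= kFirstVirtualRegister; }

// Numbers 1..1023 denote physical registers; 0 is treated here as
// "no register" (an assumption of this sketch).
bool isPhysicalRegister(unsigned reg) {
  return reg != 0 && reg < kFirstVirtualRegister;
}

// Toy stand-in for the register bookkeeping: each call hands out the
// next unused virtual register number, so the newest register always
// has the highest code.
class ToyRegInfo {
  unsigned next = kFirstVirtualRegister;

public:
  unsigned createVirtualRegister() {
    unsigned reg = next++;
    assert(isVirtualRegister(reg) && "counter must stay in virtual range");
    return reg;
  }
};
```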
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1562
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1563	<p>Before register allocation, the operands of an instruction are mostly virtual
				1564	registers, although physical registers may also be used. In order to check if
				1565	a given machine operand is a register, use the boolean
				1566	function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
				1567	a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
defines the register 1026, and uses registers 1025 and 1024. Given a
register operand, the method <tt>MachineOperand::isUse()</tt> tells whether that
register is being used by the instruction. The
method <tt>MachineOperand::isDef()</tt> tells whether that register is being
				1573	defined.</p>
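<p>To make the def/use distinction concrete, here is a small standalone model
of the <tt>ADD</tt> instruction from the paragraph above. The types
<tt>ToyOperand</tt> and <tt>ToyInstr</tt> and the helper functions are
hypothetical; they only mimic the shape of the <tt>MachineOperand</tt>
queries and are not the LLVM classes themselves.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand model: either a register or an immediate.
struct ToyOperand {
  enum Kind { Register, Immediate } kind;
  unsigned reg; // meaningful only when kind == Register
  bool def;     // true if the instruction writes this register

  bool isRegister() const { return kind == Register; }
  unsigned getReg() const {
    assert(isRegister() && "operand is not a register");
    return reg;
  }
  bool isDef() const { return isRegister() && def; }
  bool isUse() const { return isRegister() && !def; }
};

struct ToyInstr {
  std::vector<ToyOperand> operands;
};

// Builds "ADD dst := lhs rhs": one defined register, two used registers.
ToyInstr makeAdd(unsigned dst, unsigned lhs, unsigned rhs) {
  return ToyInstr{{ToyOperand{ToyOperand::Register, dst, true},
                   ToyOperand{ToyOperand::Register, lhs, false},
                   ToyOperand{ToyOperand::Register, rhs, false}}};
}

// Collects the registers written by the instruction.
std::vector<unsigned> definedRegs(const ToyInstr &inst) {
  std::vector<unsigned> out;
  for (const ToyOperand &op : inst.operands)
    if (op.isDef())
      out.push_back(op.getReg());
  return out;
}

// Collects the registers read by the instruction, in operand order.
std::vector<unsigned> usedRegs(const ToyInstr &inst) {
  std::vector<unsigned> out;
  for (const ToyOperand &op : inst.operands)
    if (op.isUse())
      out.push_back(op.getReg());
  return out;
}
```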
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1574
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1575	<p>We will call physical registers present in the LLVM bitcode before register
				1576	allocation <i>pre-colored registers</i>. Pre-colored registers are used in
				1577	many different situations, for instance, to pass parameters of functions
				1578	calls, and to store results of particular instructions. There are two types
				1579	of pre-colored registers: the ones <i>implicitly</i> defined, and
				1580	those <i>explicitly</i> defined. Explicitly defined registers are normal
				1581	operands, and can be accessed
				1582	with <tt>MachineInstr::getOperand(int)::getReg()</tt>. In order to check
				1583	which registers are implicitly defined by an instruction, use
				1584	the <tt>TargetInstrInfo::get(opcode)::ImplicitDefs</tt>,
				1585	where <tt>opcode</tt> is the opcode of the target instruction. One important
				1586	difference between explicit and implicit physical registers is that the
				1587	latter are defined statically for each instruction, whereas the former may
				1588	vary depending on the program being compiled. For example, an instruction
				1589	that represents a function call will always implicitly define or use the same
				1590	set of physical registers. To read the registers implicitly used by an
				1591	instruction,
				1592	use <tt>TargetInstrInfo::get(opcode)::ImplicitUses</tt>. Pre-colored
				1593	registers impose constraints on any register allocation algorithm. The
Bob Wilson	0473868	2010-04-09 18:39:54 +0000	[diff] [blame]	1594	register allocator must make sure that none of them are overwritten by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1595	the values of virtual registers while still alive.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1596
				1597	</div>
				1598
				1599	<!-- _______________________________________________________________________ -->
				1600
				1601	<div class="doc_subsubsection">
				1602	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
				1603	</div>
				1604
				1605	<div class="doc_text">
				1606
				1607	<p>There are two ways to map virtual registers to physical registers (or to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1608	memory slots). The first way, that we will call <i>direct mapping</i>, is
				1609	based on the use of methods of the classes <tt>TargetRegisterInfo</tt>,
				1610	and <tt>MachineOperand</tt>. The second way, that we will call <i>indirect
				1611	mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
				1612	and stores sending and getting values to and from memory.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1613
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1614	<p>The direct mapping provides more flexibility to the developer of the register
				1615	allocator; however, it is more error prone, and demands more implementation
				1616	work. Basically, the programmer will have to specify where load and store
				1617	instructions should be inserted in the target function being compiled in
				1618	order to get and store values in memory. To assign a physical register to a
				1619	virtual register present in a given operand,
				1620	use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
Jakob Stoklund Olesen	297907f	2010-08-31 22:01:07 +0000	[diff] [blame]	1621	use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
				1622	load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1623
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1624	<p>The indirect mapping shields the application developer from the complexities
				1625	of inserting load and store instructions. In order to map a virtual register
				1626	to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
				1627	order to map a certain virtual register to memory,
				1628	use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
				1629	the stack slot where <tt>vreg</tt>'s value will be located. If it is
				1630	necessary to map another virtual register to the same stack slot,
				1631	use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
				1632	important point to consider when using the indirect mapping, is that even if
				1633	a virtual register is mapped to memory, it still needs to be mapped to a
				1634	physical register. This physical register is the location where the virtual
				1635	register is supposed to be found before being stored or after being
				1636	reloaded.</p>
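<p>The bookkeeping behind the indirect mapping can be pictured with the
following standalone sketch. <tt>ToyVirtRegMap</tt> is a hypothetical class
that borrows the method names mentioned above for readability; it is not
the real <tt>VirtRegMap</tt>, whose signatures and behavior may differ, and
numbering stack slots from 0 is an assumption of the sketch.</p>

```cpp
#include <cassert>
#include <map>

// Hypothetical model of a virtual-register map. It records, per virtual
// register, an optional physical register and an optional stack slot.
class ToyVirtRegMap {
  std::map<unsigned, unsigned> virt2phys;
  std::map<unsigned, int> virt2slot;
  int nextSlot = 0; // slots are numbered from 0 in this sketch

public:
  // Maps vreg to the physical register preg.
  void assignVirt2Phys(unsigned vreg, unsigned preg) {
    assert(virt2phys.count(vreg) == 0 && "vreg already has a register");
    virt2phys[vreg] = preg;
  }

  // Allocates a fresh stack slot for vreg and returns it.
  int assignVirt2StackSlot(unsigned vreg) {
    assert(virt2slot.count(vreg) == 0 && "vreg already has a slot");
    int slot = nextSlot++;
    virt2slot[vreg] = slot;
    return slot;
  }

  // Maps vreg to an existing slot, so two registers can share it.
  void assignVirt2StackSlot(unsigned vreg, int slot) {
    assert(virt2slot.count(vreg) == 0 && "vreg already has a slot");
    assert(slot >= 0 && slot < nextSlot && "slot was never allocated");
    virt2slot[vreg] = slot;
  }

  bool hasPhys(unsigned vreg) const { return virt2phys.count(vreg) != 0; }
  bool hasStackSlot(unsigned vreg) const { return virt2slot.count(vreg) != 0; }
  unsigned getPhys(unsigned vreg) const { return virt2phys.at(vreg); }
  int getStackSlot(unsigned vreg) const { return virt2slot.at(vreg); }
};
```

<p>Note that in this model a spilled virtual register can carry both a stack
slot and a physical register, matching the point made above that a register
mapped to memory still needs a physical register for the moments around its
store and reload.</p>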
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1637
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1638	<p>If the indirect strategy is used, after all the virtual registers have been
				1639	mapped to physical registers or stack slots, it is necessary to use a spiller
				1640	object to place load and store instructions in the code. Every virtual that
has been mapped to a stack slot will be stored to memory after being defined
				1642	and will be loaded before being used. The implementation of the spiller tries
				1643	to recycle load/store instructions, avoiding unnecessary instructions. For an
				1644	example of how to invoke the spiller,
				1645	see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
				1646	in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1647
				1648	</div>
				1649
				1650	<!-- _______________________________________________________________________ -->
				1651	<div class="doc_subsubsection">
				1652	<a name="regAlloc_twoAddr">Handling two address instructions</a>
				1653	</div>
				1654
				1655	<div class="doc_text">
				1656
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1657	<p>With very rare exceptions (e.g., function calls), the LLVM machine code
				1658	instructions are three address instructions. That is, each instruction is
				1659	expected to define at most one register, and to use at most two registers.
				1660	However, some architectures use two address instructions. In this case, the
defined register is also one of the used registers. For instance, an
				1662	instruction such as <tt>ADD %EAX, %EBX</tt>, in X86 is actually equivalent
				1663	to <tt>%EAX = %EAX + %EBX</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1664
				1665	<p>In order to produce correct code, LLVM must convert three address
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1666	instructions that represent two address instructions into true two address
				1667	instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
				1668	this specific purpose. It must be run before register allocation takes
				1669	place. After its execution, the resulting code may no longer be in SSA
				1670	form. This happens, for instance, in situations where an instruction such
				1671	as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1672
				1673	<div class="doc_code">
				1674	<pre>
				1675	%a = MOVE %b
Dan Gohman	03e5857	2008-06-13 17:55:57 +0000	[diff] [blame]	1676	%a = ADD %a %c
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1677	</pre>
				1678	</div>
				1679
				1680	<p>Notice that, internally, the second instruction is represented as
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1681	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
				1682	used and defined by the instruction.</p>
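<p>The rewrite shown above can be sketched as a small standalone function.
This is a simplified model, not the logic of
<tt>TwoAddressInstructionPass</tt>: the type <tt>ToyInstr</tt> and the
function <tt>toTwoAddress</tt> are hypothetical, instructions are plain
strings, and the input is assumed to be in SSA form so that the destination
never appears among the sources.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical textual instruction: dst = opcode srcs...
struct ToyInstr {
  std::string opcode;
  std::string dst;
  std::vector<std::string> srcs;
};

// Rewrites "dst = OP a b" into "dst = MOVE a; dst = OP dst b".
// Instructions that are not binary are returned unchanged.
std::vector<ToyInstr> toTwoAddress(const ToyInstr &inst) {
  if (inst.srcs.size() != 2)
    return {inst};

  // SSA precondition assumed by this sketch: the copy below would
  // clobber a source if dst were also read by the instruction.
  assert(inst.dst != inst.srcs[0] && inst.dst != inst.srcs[1] &&
         "sketch assumes SSA input");

  ToyInstr copy{"MOVE", inst.dst, {inst.srcs[0]}};
  ToyInstr op{inst.opcode, inst.dst, {inst.dst, inst.srcs[1]}};
  return {copy, op};
}
```

<p>After the rewrite the register <tt>%a</tt> is defined twice, which is why
the result is no longer in SSA form.</p>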
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1683
				1684	</div>
				1685
				1686	<!-- _______________________________________________________________________ -->
				1687	<div class="doc_subsubsection">
				1688	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
				1689	</div>
				1690
				1691	<div class="doc_text">
				1692
				1693	<p>An important transformation that happens during register allocation is called
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1694	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
				1695	that are performed on the control flow graph of programs. However,
				1696	traditional instruction sets do not implement PHI instructions. Thus, in
				1697	order to generate executable code, compilers must replace PHI instructions
				1698	with other instructions that preserve their semantics.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1699
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1700	<p>There are many ways in which PHI instructions can safely be removed from the
				1701	target code. The most traditional PHI deconstruction algorithm replaces PHI
				1702	instructions with copy instructions. That is the strategy adopted by
				1703	LLVM. The SSA deconstruction algorithm is implemented
				1704	in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
				1705	identifier <tt>PHIEliminationID</tt> must be marked as required in the code
				1706	of the register allocator.</p>
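<p>The copy-based strategy can be illustrated with a deliberately naive
standalone sketch. It is not the algorithm in <tt>PHIElimination.cpp</tt>:
the types <tt>ToyBlock</tt> and <tt>ToyPhi</tt> and the function
<tt>lowerPhi</tt> are hypothetical, instructions are plain strings, and the
sketch assumes a single PHI with no critical edges, so it ignores the
copy-ordering problems that a real implementation must handle.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical basic block: ordinary instructions plus one terminator.
struct ToyBlock {
  std::vector<std::string> body; // non-terminator instructions, in order
  std::string terminator;        // branch or return, kept separately
};

// Hypothetical PHI: dst = PHI [(pred block name, incoming value), ...]
struct ToyPhi {
  std::string dst;
  std::vector<std::pair<std::string, std::string> > incoming;
};

// Replaces the PHI by one copy per predecessor. Each copy is appended to
// the predecessor's body, which places it just before the terminator.
void lowerPhi(const ToyPhi &phi, std::map<std::string, ToyBlock> &blocks) {
  for (const auto &in : phi.incoming) {
    assert(blocks.count(in.first) != 0 && "unknown predecessor block");
    blocks[in.first].body.push_back(phi.dst + " = COPY " + in.second);
  }
}
```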
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1707
				1708	</div>
				1709
				1710	<!-- _______________________________________________________________________ -->
				1711	<div class="doc_subsubsection">
				1712	<a name="regAlloc_fold">Instruction folding</a>
				1713	</div>
				1714
				1715	<div class="doc_text">
				1716
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1717	<p><i>Instruction folding</i> is an optimization performed during register
				1718	allocation that removes unnecessary copy instructions. For instance, a
				1719	sequence of instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1720
				1721	<div class="doc_code">
				1722	<pre>
				1723	%EBX = LOAD %mem_address
				1724	%EAX = COPY %EBX
				1725	</pre>
				1726	</div>
				1727
Dan Gohman	a7ab2bf	2008-11-24 16:35:31 +0000	[diff] [blame]	1728	<p>can be safely substituted by the single instruction:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1729
				1730	<div class="doc_code">
				1731	<pre>
				1732	%EAX = LOAD %mem_address
				1733	</pre>
				1734	</div>
				1735
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1736	<p>Instructions can be folded with
the <tt>TargetInstrInfo::foldMemoryOperand(...)</tt> method. Care must be
				1738	taken when folding instructions; a folded instruction can be quite different
				1739	from the original
				1740	instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
				1741	in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
				1742	use.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1743
				1744	</div>
				1745
				1746	<!-- _______________________________________________________________________ -->
				1747
				1748	<div class="doc_subsubsection">
				1749	<a name="regAlloc_builtIn">Built in register allocators</a>
				1750	</div>
				1751
				1752	<div class="doc_text">
				1753
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1754	<p>The LLVM infrastructure provides the application developer with three
				1755	different register allocators:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1756
				1757	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1758	<li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
well-known linear scan register allocator. Whereas the
<i>Simple</i> and <i>Local</i> algorithms use a direct mapping
implementation technique, the <i>Linear Scan</i> implementation
uses a spiller in order to place loads and stores.</li>
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1763
				1764	<li><i>Fast</i> — This register allocator is the default for debug
				1765	builds. It allocates registers on a basic block level, attempting to keep
				1766	values in registers and reusing registers as appropriate.</li>
				1767
				1768	<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
				1769	based register allocator. This allocator works by constructing a PBQP
				1770	problem representing the register allocation problem under consideration,
				1771	solving this using a PBQP solver, and mapping the solution back to a
				1772	register assignment.</li>
				1773
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1774	</ul>
				1775
				1776	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1777	command line option <tt>-regalloc=...</tt>:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1778
				1779	<div class="doc_code">
				1780	<pre>
Dan Gohman	0cabaa5	2009-08-25 15:54:01 +0000	[diff] [blame]	1781	$ llc -regalloc=linearscan file.bc -o ln.s;
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1782	$ llc -regalloc=fast file.bc -o fa.s;
				1783	$ llc -regalloc=pbqp file.bc -o pbqp.s;
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1784	</pre>
				1785	</div>
				1786
				1787	</div>
				1788
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1789	<!-- ======================================================================= -->
				1790	<div class="doc_subsection">
				1791	<a name="proepicode">Prolog/Epilog Code Insertion</a>
				1792	</div>
				1793	<div class="doc_text"><p>To Be Written</p></div>
				1794	<!-- ======================================================================= -->
				1795	<div class="doc_subsection">
				1796	<a name="latemco">Late Machine Code Optimizations</a>
				1797	</div>
				1798	<div class="doc_text"><p>To Be Written</p></div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1799
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1800	<!-- ======================================================================= -->
				1801	<div class="doc_subsection">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1802	<a name="codeemit">Code Emission</a>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1803	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1804
				1805	<div class="doc_text">
				1806
				1807	<p>The code emission step of code generation is responsible for lowering from
				1808	the code generator abstractions (like <a
				1809	href="#machinefunction">MachineFunction</a>, <a
				1810	href="#machineinstr">MachineInstr</a>, etc) down
				1811	to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
				1812	<a href="#mcstreamer">MCStreamer</a>, etc). This is
				1813	done with a combination of several different classes: the (misnamed)
				1814	target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
				1815	(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
				1816
				1817	<p>Since the MC layer works at the level of abstraction of object files, it
				1818	doesn't have a notion of functions, global variables etc. Instead, it thinks
				1819	about labels, directives, and instructions. A key class used at this time is
the MCStreamer class. This is an abstract API, effectively an "assembler API",
that is implemented in different ways (e.g. to output a .s file, output an ELF
.o file, etc). MCStreamer has one method per directive, such as EmitLabel,
				1823	EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
				1824	level directives.
				1825	</p>
				1826
				1827	<p>If you are interested in implementing a code generator for a target, there
				1828	are three important things that you have to implement for your target:</p>
				1829
				1830	<ol>
				1831	<li>First, you need a subclass of AsmPrinter for your target. This class
				1832	implements the general lowering process converting MachineFunction's into MC
				1833	label constructs. The AsmPrinter base class provides a number of useful methods
				1834	and routines, and also allows you to override the lowering process in some
				1835	important ways. You should get much of the lowering for free if you are
				1836	implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
				1837	class implements much of the common logic.</li>
				1838
				1839	<li>Second, you need to implement an instruction printer for your target. The
				1840	instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
				1841	raw_ostream as text. Most of this is automatically generated from the .td file
				1842	(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
				1843	instructions), but you need to implement routines to print operands.</li>
				1844
				1845	<li>Third, you need to implement code that lowers a <a
				1846	href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
"&lt;target&gt;MCInstLower.cpp". This lowering process is often target
				1848	specific, and is responsible for turning jump table entries, constant pool
				1849	indices, global variable addresses, etc into MCLabels as appropriate. This
				1850	translation layer is also responsible for expanding pseudo ops used by the code
				1851	generator into the actual machine instructions they correspond to. The MCInsts
				1852	that are generated by this are fed into the instruction printer or the encoder.
				1853	</li>
				1854
				1855	</ol>
				1856
<p>Finally, at your choosing, you can also implement a subclass of
				1858	MCCodeEmitter which lowers MCInst's into machine code bytes and relocations.
				1859	This is important if you want to support direct .o file emission, or would like
				1860	to implement an assembler for your target.</p>
				1861
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1862	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1863
				1864
				1865	<!-- ======================================================================= -->
				1866	<div class="doc_section">
				1867	<a name="nativeassembler">Implementing a Native Assembler</a>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1868	</div>
				1869
				1870	<div class="doc_text">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1871
				1872	<p>TODO</p>
				1873
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1874	</div>
				1875
				1876
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1877	<!-- *********************************************************************** -->
				1878	<div class="doc_section">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1879	<a name="targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1880	</div>
				1881	<!-- *********************************************************************** -->
				1882
				1883	<div class="doc_text">
				1884
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1885	<p>This section of the document explains features or design decisions that are
				1886	specific to the code generator for a particular target.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1887
				1888	</div>
				1889
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1890	<!-- ======================================================================= -->
				1891	<div class="doc_subsection">
				1892	<a name="tailcallopt">Tail call optimization</a>
				1893	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1894
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1895	<div class="doc_text">
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1896
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1897	<p>Tail call optimization, callee reusing the stack of the caller, is currently
				1898	supported on x86/x86-64 and PowerPC. It is performed if:</p>
				1899
				1900	<ul>
Chris Lattner	2968943	2010-03-11 00:22:57 +0000	[diff] [blame]	1901	<li>Caller and callee have the calling convention <tt>fastcc</tt> or
				1902	<tt>cc 10</tt> (GHC call convention).</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1903
				1904	<li>The call is a tail call - in tail position (ret immediately follows call
				1905	and ret uses value of call or is void).</li>
				1906
				1907	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				1908
				1909	<li>Platform specific constraints are met.</li>
				1910	</ul>
				1911
				1912	<p>x86/x86-64 constraints:</p>
				1913
				1914	<ul>
				1915	<li>No variable argument lists are used.</li>
				1916
				1917	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
				1918	= hidden or protected) are supported.</li>
				1919	</ul>
				1920
				1921	<p>PowerPC constraints:</p>
				1922
				1923	<ul>
				1924	<li>No variable argument lists are used.</li>
				1925
				1926	<li>No byval parameters are used.</li>
				1927
				1928	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				1929	</ul>
				1930
				1931	<p>Example:</p>
				1932
				1933	<p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
				1934
				1935	<div class="doc_code">
				1936	<pre>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1937	declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)
				1938
				1939	define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
				1940	%l1 = add i32 %in1, %in2
				1941	%tmp = tail call fastcc i32 @tailcallee(i32 %in1 inreg, i32 %in2 inreg, i32 %in1, i32 %l1)
				1942	ret i32 %tmp
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1943	}
				1944	</pre>
				1945	</div>
				1946
				1947	<p>Implications of <tt>-tailcallopt</tt>:</p>
				1948
				1949	<p>To support tail call optimization in situations where the callee has more
arguments than the caller, a 'callee pops arguments' convention is used. This
currently causes each <tt>fastcc</tt> call that is not tail call optimized
(because one or more of the above constraints are not met) to be followed by a
				1953	readjustment of the stack. So performance might be worse in such cases.</p>
				1954
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1955	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1956	<!-- ======================================================================= -->
				1957	<div class="doc_subsection">
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	1958	<a name="sibcallopt">Sibling call optimization</a>
				1959	</div>
				1960
				1961	<div class="doc_text">
				1962
				1963	<p>Sibling call optimization is a restricted form of tail call optimization.
				1964	Unlike the tail call optimization described in the previous section, it can be
				1965	performed automatically on any tail call when the <tt>-tailcallopt</tt> option
				1966	is not specified.</p>
				1967
				1968	<p>Sibling call optimization is currently performed on x86/x86-64 when the
				1969	following constraints are met:</p>
				1970
				1971	<ul>
				1972	<li>Caller and callee have the same calling convention. It can be either
				1973	<tt>c</tt> or <tt>fastcc</tt>.</li>
				1974	
				1975	<li>The call is a tail call - in tail position (ret immediately follows call
				1976	and ret uses value of call or is void).</li>
				1977	
				1978	<li>Caller and callee have matching return type or the callee result is not
				1979	used.</li>
				1980	
				1981	<li>If any of the callee arguments are being passed on the stack, they must be
				1982	available in the caller's own incoming argument stack and the frame offsets
				1983	must be the same.</li>
				1984	</ul>
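<p>The constraints above can be read as a single predicate. The following is
only a minimal sketch: <tt>CallSite</tt>, <tt>StackArg</tt> and
<tt>isEligibleForSibcall</tt> are hypothetical names invented for illustration
and are not part of the X86 backend, which works on its own data structures.</p>

```cpp
#include <vector>

// Hypothetical, simplified description of a call site (not an LLVM type).
enum class CallConv { C, Fast, Other };

struct StackArg {
  bool inCallerIncomingArea; // value already lives in the caller's incoming argument area
  int callerOffset;          // frame offset of that slot in the caller
  int calleeOffset;          // frame offset at which the callee expects it
};

struct CallSite {
  CallConv callerCC;
  CallConv calleeCC;
  bool inTailPosition;   // ret immediately follows the call and returns its value or void
  bool returnTypesMatch; // caller and callee return the same type
  bool resultUsed;       // the callee result is used by the caller
  std::vector<StackArg> stackArgs; // only the arguments passed on the stack
};

// Returns true when every constraint from the list above holds.
bool isEligibleForSibcall(const CallSite &cs) {
  // Same calling convention, and it must be c or fastcc.
  if (cs.callerCC != cs.calleeCC)
    return false;
  if (cs.callerCC != CallConv::C && cs.callerCC != CallConv::Fast)
    return false;
  // The call must be in tail position.
  if (!cs.inTailPosition)
    return false;
  // Matching return type, unless the result is unused.
  if (!cs.returnTypesMatch && cs.resultUsed)
    return false;
  // Stack arguments must already sit in the caller's incoming area at the same offsets.
  for (const StackArg &arg : cs.stackArgs)
    if (!arg.inCallerIncomingArea || arg.callerOffset != arg.calleeOffset)
      return false;
  return true;
}
```

<p>In the example below, <tt>@foo</tt> and <tt>@bar</tt> both use the default
<tt>c</tt> convention, the call is in tail position and the return types
match, so the first three checks of such a predicate would pass.</p>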
				1985
				1986	<p>Example:</p>
				1987	<div class="doc_code">
				1988	<pre>
				1989	declare i32 @bar(i32, i32)
				1990
				1991	define i32 @foo(i32 %a, i32 %b, i32 %c) {
				1992	entry:
				1993	%0 = tail call i32 @bar(i32 %a, i32 %b)
				1994	ret i32 %0
				1995	}
				1996	</pre>
				1997	</div>
				1998
				1999	</div>
				2000	<!-- ======================================================================= -->
				2001	<div class="doc_subsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2002	<a name="x86">The X86 backend</a>
				2003	</div>
				2004
				2005	<div class="doc_text">
				2006
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2007	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2008	code generator is capable of targeting a variety of x86-32 and x86-64
				2009	processors, and includes support for ISA extensions such as MMX and SSE.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2010
				2011	</div>
				2012
				2013	<!-- _______________________________________________________________________ -->
				2014	<div class="doc_subsubsection">
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2015	<a name="x86_tt">X86 Target Triples supported</a>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2016	</div>
				2017
				2018	<div class="doc_text">
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2019
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2020	<p>The following are the known target triples that are supported by the X86
				2021	backend. This is not an exhaustive list, and it would be useful to add those
				2022	that people test.</p>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2023
				2024	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2025	<li><b>i686-pc-linux-gnu</b> — Linux</li>
				2026
				2027	<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>
				2028
				2029	<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>
				2030
				2031	<li><b>i686-pc-mingw32</b> — MingW on Win32</li>
				2032
				2033	<li><b>i386-pc-mingw32msvc</b> — MingW crosscompiler on Linux</li>
				2034
				2035	<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>
Torok Edwin	c457b65	2009-06-15 12:17:44 +0000	[diff] [blame]	2036
				2037	<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2038	</ul>
				2039
				2040	</div>
				2041
				2042	<!-- _______________________________________________________________________ -->
				2043	<div class="doc_subsubsection">
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2044	<a name="x86_cc">X86 Calling Conventions supported</a>
				2045	</div>
				2046
				2047
				2048	<div class="doc_text">
				2049
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	2050	<p>The following target-specific calling conventions are known to the backend:</p>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2051
				2052	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2053	<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
				2054	Windows platform (CC ID = 64).</li>
				2055
				2056	<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
				2057	Windows platform (CC ID = 65).</li>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2058	</ul>
				2059
				2060	</div>
				2061
				2062	<!-- _______________________________________________________________________ -->
				2063	<div class="doc_subsubsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2064	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
				2065	</div>
				2066
				2067	<div class="doc_text">
				2068
Misha Brukman	600df45	2005-02-17 22:22:24 +0000	[diff] [blame]	2069	<p>The x86 has a very flexible way of accessing memory. It is capable of
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2070	forming memory addresses of the following form directly in integer
				2071	instructions (which use ModR/M addressing):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2072
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2073	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2074	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2075	SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2076	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2077	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2078
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2079	<p>In order to represent this, LLVM tracks no less than 5 operands for each
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2080	memory operand of this form. This means that the "load" form of
				2081	'<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2082
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2083	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2084	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2085	Index:        0     |    1        2       3           4          5
				2086	Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
				2087	OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2088	</pre>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2089	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2090
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2091	<p>Stores, and all other instructions, treat the five memory operands in the
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2092	same way and in the same order. If the segment register is unspecified
				2093	(regno = 0), then no segment override is generated. "Lea" operations do not
				2094	have a segment register specified, so they only have 4 operands for their
				2095	memory reference.</p>
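<p>To make the operand layout concrete, here is a small sketch of how the
memory operands combine into an address. The struct
<tt>X86MemOperands</tt> and both functions are hypothetical names chosen for
illustration; they hold plain values rather than <tt>MachineOperand</tt>s and
are not LLVM APIs.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the five memory operands (not an LLVM data structure).
struct X86MemOperands {
  std::uint64_t base;  // value held in BaseReg
  unsigned scale;      // UnsImm: 1, 2, 4 or 8
  std::uint64_t index; // value held in IndexReg
  std::int32_t disp;   // SignExtImm: 32-bit displacement
  unsigned segment;    // segment register number, 0 if unspecified
};

// Offset within the selected segment: Base + Scale * IndexReg + Disp32.
std::uint64_t effectiveOffset(const X86MemOperands &m) {
  assert(m.scale == 1 || m.scale == 2 || m.scale == 4 || m.scale == 8);
  // Sign-extend the displacement before adding; arithmetic wraps modulo 2^64.
  const std::int64_t disp = m.disp;
  return m.base + m.scale * m.index + static_cast<std::uint64_t>(disp);
}

// A segment override is only needed when a segment register is specified.
bool needsSegmentOverride(const X86MemOperands &m) { return m.segment != 0; }
```

<p>For instance, a base of <tt>0x1000</tt>, scale 4, index 3 and displacement
-16 gives <tt>0x1000 + 12 - 16 = 0xFFC</tt>.</p>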
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2096
				2097	</div>
				2098
				2099	<!-- _______________________________________________________________________ -->
				2100	<div class="doc_subsubsection">
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2101	<a name="x86_memory">X86 address spaces supported</a>
				2102	</div>
				2103
				2104	<div class="doc_text">
				2105
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2106	<p>x86 has an experimental feature which provides
				2107	the ability to perform loads and stores to different address spaces
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2108	via the x86 segment registers. A segment override prefix byte on an
				2109	instruction causes the instruction's memory access to go to the specified
				2110	segment. LLVM address space 0 is the default address space, which includes
				2111	the stack, and any unqualified memory accesses in a program. Address spaces
				2112	1-255 are currently reserved for user-defined code. The GS-segment is
Chris Lattner	1777d0c	2009-05-05 18:52:19 +0000	[diff] [blame]	2113	represented by address space 256, while the FS-segment is represented by
				2114	address space 257. Other x86 segments have yet to be allocated address space
				2115	numbers.</p>
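<p>The numbering described above amounts to a small lookup. The sketch below
uses a hypothetical <tt>Segment</tt> enum and
<tt>segmentForAddressSpace</tt> function purely for illustration; neither is
an LLVM API.</p>

```cpp
// Hypothetical mapping from LLVM address space number to x86 segment.
enum class Segment { None, GS, FS };

Segment segmentForAddressSpace(unsigned addressSpace) {
  switch (addressSpace) {
  case 256:
    return Segment::GS; // GS-segment
  case 257:
    return Segment::FS; // FS-segment
  default:
    // 0 is the default address space, 1-255 are reserved for user-defined
    // code, and other segments have no address space number yet.
    return Segment::None;
  }
}
```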
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2116
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2117	<p>While these address spaces may seem similar to TLS via the
				2118	<tt>thread_local</tt> keyword, and often use the same underlying hardware,
				2119	there are some fundamental differences.</p>
				2120
				2121	<p>The <tt>thread_local</tt> keyword applies to global variables and
				2122	specifies that they are to be allocated in thread-local memory. There are
				2123	no type qualifiers involved, and these variables can be pointed to with
				2124	normal pointers and accessed with normal loads and stores.
				2125	The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
				2126	level (though LLVM doesn't yet have implementations of it for some
				2127	configurations).</p>
				2128
				2129	<p>Special address spaces, in contrast, apply to static types. Every
				2130	load and store has a particular address space in its address operand type,
				2131	and this is what determines which address space is accessed.
				2132	LLVM ignores these special address space qualifiers on global variables,
				2133	and does not provide a way to directly allocate storage in them.
				2134	At the LLVM IR level, the behavior of these special address spaces depends
				2135	in part on the underlying OS or runtime environment, and they are specific
				2136	to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
				2137
				2138	<p>Some operating systems and runtime environments use (or may in the future
				2139	use) the FS/GS-segment registers for various low-level purposes, so care
				2140	should be taken when considering them.</p>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2141
				2142	</div>
				2143
				2144	<!-- _______________________________________________________________________ -->
				2145	<div class="doc_subsubsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2146	<a name="x86_names">Instruction naming</a>
				2147	</div>
				2148
				2149	<div class="doc_text">
				2150
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2151	<p>An instruction name consists of the base name, a default operand size, and
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2152	a character per operand with an optional special size. For example:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2153
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2154	<div class="doc_code">
				2155	<pre>
				2156	ADD8rr -> add, 8-bit register, 8-bit register
				2157	IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
				2158	IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
				2159	MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
				2160	</pre>
				2161	</div>
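<p>The naming scheme can be decoded mechanically. The following sketch
handles only names shaped like the four examples above: it assumes the base
name is a run of uppercase letters, which is a simplification (base names
that themselves contain digits would not be handled). <tt>ParsedName</tt>,
<tt>Operand</tt> and <tt>parseInstrName</tt> are hypothetical names, not part
of the X86 backend or TableGen.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Operand {
  char kind;     // 'r' = register, 'm' = memory, 'i' = immediate
  unsigned bits; // operand size in bits
};

struct ParsedName {
  std::string base;     // lower-cased mnemonic, e.g. "imul"
  unsigned defaultBits; // default operand size
  std::vector<Operand> operands;
};

// Reads a decimal number starting at pos; leaves pos unchanged if none is present.
static unsigned readNumber(const std::string &s, std::size_t &pos) {
  unsigned n = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
    n = n * 10 + static_cast<unsigned>(s[pos++] - '0');
  return n;
}

ParsedName parseInstrName(const std::string &name) {
  ParsedName parsed{};
  std::size_t pos = 0;
  // Base name: leading uppercase letters (simplifying assumption).
  while (pos < name.size() && std::isupper(static_cast<unsigned char>(name[pos])))
    parsed.base += static_cast<char>(std::tolower(static_cast<unsigned char>(name[pos++])));
  // Default operand size.
  parsed.defaultBits = readNumber(name, pos);
  // One character per operand, each with an optional special size.
  while (pos < name.size()) {
    Operand op{name[pos++], 0};
    const std::size_t before = pos;
    const unsigned special = readNumber(name, pos);
    op.bits = (pos == before) ? parsed.defaultBits : special;
    parsed.operands.push_back(op);
  }
  return parsed;
}
```

<p>With this sketch, <tt>parseInstrName("IMUL16rmi8")</tt> yields the base
<tt>imul</tt>, a default size of 16, and the operands r/16, m/16 and i/8,
matching the third example above.</p>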
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2162
				2163	</div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2164
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2165	<!-- ======================================================================= -->
				2166	<div class="doc_subsection">
				2167	<a name="ppc">The PowerPC backend</a>
				2168	</div>
				2169
				2170	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2171
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2172	<p>The PowerPC code generator lives in the lib/Target/PowerPC directory. The
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2173	code generation is retargetable to several variations or <i>subtargets</i> of
				2174	the PowerPC ISA, including ppc32, ppc64 and altivec.</p>
				2175
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2176	</div>
				2177
				2178	<!-- _______________________________________________________________________ -->
				2179	<div class="doc_subsubsection">
				2180	<a name="ppc_abi">LLVM PowerPC ABI</a>
				2181	</div>
				2182
				2183	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2184
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2185	<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2186	PC-relative (PIC) or static addressing for accessing global values, so no TOC
				2187	(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
				2188	of a stack frame. LLVM takes advantage of having no TOC to provide space to
				2189	save the frame pointer in the PowerPC linkage area of the caller frame.
				2190	Other details of PowerPC ABI can be found at <a href=
				2191	"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
				2192	>PowerPC ABI.</a> Note: This link describes the 32 bit ABI. The 64 bit ABI
				2193	is similar except that the space for each GPR is 8 bytes wide (not 4) and r13
				2194	is reserved for system use.</p>
				2195
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2196	</div>
				2197
				2198	<!-- _______________________________________________________________________ -->
				2199	<div class="doc_subsubsection">
				2200	<a name="ppc_frame">Frame Layout</a>
				2201	</div>
				2202
				2203	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2204
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2205	<p>The size of a PowerPC frame is usually fixed for the duration of a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2206	function's invocation. Since the frame is fixed size, all references
				2207	into the frame can be accessed via fixed offsets from the stack pointer. The
				2208	exception is when dynamic alloca or variable sized arrays are present; in
				2209	that case a base pointer (r31) is used as a proxy for the stack pointer
				2210	and the stack pointer is free to grow or shrink. A base pointer is also used if
				2211	llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
				2212	always aligned to 16 bytes, so that space allocated for altivec vectors will
				2213	be properly aligned.</p>
				2214
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	2215	<p>An invocation frame is laid out as follows (low memory at top):</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2216
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2217	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2218	<tr>
				2219	<td>Linkage<br><br></td>
				2220	</tr>
				2221	<tr>
				2222	<td>Parameter area<br><br></td>
				2223	</tr>
				2224	<tr>
				2225	<td>Dynamic area<br><br></td>
				2226	</tr>
				2227	<tr>
				2228	<td>Locals area<br><br></td>
				2229	</tr>
				2230	<tr>
				2231	<td>Saved registers area<br><br></td>
				2232	</tr>
				2233	<tr style="border-style: none hidden none hidden;">
				2234	<td><br></td>
				2235	</tr>
				2236	<tr>
				2237	<td>Previous Frame<br><br></td>
				2238	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2239	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2240
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2241	<p>The <i>linkage</i> area is used by a callee to save special registers prior
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2242	to allocating its own frame. Only three entries are relevant to LLVM. The
				2243	first entry is the previous stack pointer (sp), aka link. This allows
				2244	probing tools like gdb or exception handlers to quickly scan the frames in
				2245	the stack. A function epilog can also use the link to pop the frame from the
				2246	stack. The third entry in the linkage area is used to save the return
				2247	address from the lr register. Finally, as mentioned above, the last entry is
				2248	used to save the previous frame pointer (r31). The entries in the linkage
				2249	area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
				2250	mode and 48 bytes in 64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2251
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2252	<p>32 bit linkage area</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2253
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2254	<table class="layout">
				2255	<tr>
				2256	<td>0</td>
				2257	<td>Saved SP (r1)</td>
				2258	</tr>
				2259	<tr>
				2260	<td>4</td>
				2261	<td>Saved CR</td>
				2262	</tr>
				2263	<tr>
				2264	<td>8</td>
				2265	<td>Saved LR</td>
				2266	</tr>
				2267	<tr>
				2268	<td>12</td>
				2269	<td>Reserved</td>
				2270	</tr>
				2271	<tr>
				2272	<td>16</td>
				2273	<td>Reserved</td>
				2274	</tr>
				2275	<tr>
				2276	<td>20</td>
				2277	<td>Saved FP (r31)</td>
				2278	</tr>
				2279	</table>
				2280
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2281	<p>64 bit linkage area</p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2282
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2283	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2284	<tr>
				2285	<td>0</td>
				2286	<td>Saved SP (r1)</td>
				2287	</tr>
				2288	<tr>
				2289	<td>8</td>
				2290	<td>Saved CR</td>
				2291	</tr>
				2292	<tr>
				2293	<td>16</td>
				2294	<td>Saved LR</td>
				2295	</tr>
				2296	<tr>
				2297	<td>24</td>
				2298	<td>Reserved</td>
				2299	</tr>
				2300	<tr>
				2301	<td>32</td>
				2302	<td>Reserved</td>
				2303	</tr>
				2304	<tr>
				2305	<td>40</td>
				2306	<td>Saved FP (r31)</td>
				2307	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2308	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2309
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2310	<p>The <i>parameter area</i> is used to store arguments being passed to a callee
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2311	function. Following the PowerPC ABI, the first few arguments are actually
				2312	passed in registers, with the space in the parameter area unused. However,
				2313	if there are not enough registers or the callee is a thunk or vararg
				2314	function, these register arguments can be spilled into the parameter area.
				2315	Thus, the parameter area must be large enough to store all the parameters for
				2316	the largest call sequence made by the caller. The size must also be
				2317	minimally large enough to spill registers r3-r10. This allows callees blind
				2318	to the call signature, such as thunks and vararg functions, enough space to
				2319	cache the argument registers. Therefore, the parameter area is minimally 32
				2320	bytes (64 bytes in 64 bit mode). Also note that since the parameter area is
				2321	a fixed offset from the top of the frame, a callee can access its spilled
				2322	arguments using fixed offsets from the stack pointer (or base pointer).</p>
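<p>The sizes quoted for the linkage and parameter areas follow directly from
the GPR width. The sketch below only restates that arithmetic; the enum and
function names are hypothetical and do not correspond to identifiers in the
PowerPC backend.</p>

```cpp
// Slots of the linkage area, in the order shown in the tables above.
enum LinkageSlot {
  SavedSP = 0,
  SavedCR = 1,
  SavedLR = 2,
  Reserved0 = 3,
  Reserved1 = 4,
  SavedFP = 5
};

constexpr unsigned kLinkageSlots = 6; // each slot is the size of a GPR
constexpr unsigned kArgumentGPRs = 8; // r3-r10

// gprBytes is 4 in 32 bit mode and 8 in 64 bit mode.
constexpr unsigned linkageOffset(LinkageSlot slot, unsigned gprBytes) {
  return static_cast<unsigned>(slot) * gprBytes;
}

constexpr unsigned linkageAreaSize(unsigned gprBytes) {
  return kLinkageSlots * gprBytes;
}

// Minimum parameter area: enough room to spill r3-r10.
constexpr unsigned minParameterAreaSize(unsigned gprBytes) {
  return kArgumentGPRs * gprBytes;
}

static_assert(linkageAreaSize(4) == 24, "32 bit linkage area");
static_assert(linkageAreaSize(8) == 48, "64 bit linkage area");
static_assert(minParameterAreaSize(4) == 32, "32 bit minimum parameter area");
static_assert(minParameterAreaSize(8) == 64, "64 bit minimum parameter area");
```

<p>For example, <tt>linkageOffset(SavedFP, 8)</tt> evaluates to 40, the offset
of the saved frame pointer in the 64 bit linkage area table.</p>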
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2323
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2324	<p>Combining the information about the linkage area, parameter area and
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2325	alignment, a stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in
				2326	64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2327
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2328	<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2329	alloca then space is added to the stack, the linkage and parameter areas are
				2330	shifted to top of stack, and the new space is available immediately below the
				2331	linkage and parameter areas. The cost of shifting the linkage and parameter
				2332	areas is minor since only the link value needs to be copied. The link value
				2333	can be easily fetched by adding the original frame size to the base pointer.
				2334	Note that allocations in the dynamic space need to observe 16 byte
				2335	alignment.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2336
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2337	<p>The <i>locals area</i> is where the llvm compiler reserves space for local
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2338	variables.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2339
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2340	<p>The <i>saved registers area</i> is where the llvm compiler spills callee
				2341	saved registers on entry to the callee.</p>
				2342
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2343	</div>
				2344
				2345	<!-- _______________________________________________________________________ -->
				2346	<div class="doc_subsubsection">
				2347	<a name="ppc_prolog">Prolog/Epilog</a>
				2348	</div>
				2349
				2350	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2351
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2352	<p>The llvm prolog and epilog are the same as described in the PowerPC ABI, with
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2353	the following exceptions. Callee saved registers are spilled after the frame
				2354	is created. This allows the llvm epilog/prolog support to be common with
				2355	other targets. The base pointer callee saved register r31 is saved in the
				2356	TOC slot of linkage area. This simplifies allocation of space for the base
				2357	pointer and makes it convenient to locate programmatically and during
				2358	debugging.</p>
				2359
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2360	</div>
				2361
				2362	<!-- _______________________________________________________________________ -->
				2363	<div class="doc_subsubsection">
				2364	<a name="ppc_dynamic">Dynamic Allocation</a>
				2365	</div>
				2366
				2367	<div class="doc_text">
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2368
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2369	<p><i>TODO - More to come.</i></p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2370
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2371	</div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2372
				2373
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2374	<!-- *********************************************************************** -->
				2375	<hr>
				2376	<address>
				2377	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	2378	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2379	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	f00ddb0	2008-12-11 18:23:24 +0000	[diff] [blame]	2380	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2381
				2382	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
Reid Spencer	05fe4b0	2006-03-14 05:39:39 +0000	[diff] [blame]	2383	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2384	Last modified: $Date$
				2385	</address>
				2386
				2387	</body>
				2388	</html>