Blame - docs/CodeGenerator.html - fp2-dev/platform/external/llvm

Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
				2	"http://www.w3.org/TR/html4/strict.dtd">
				3	<html>
				4	<head>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	5	<meta http-equiv="content-type" content="text/html; charset=utf-8">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	6	<title>The LLVM Target-Independent Code Generator</title>
				7	<link rel="stylesheet" href="llvm.css" type="text/css">
				8	</head>
				9	<body>
				10
				11	<div class="doc_title">
				12	The LLVM Target-Independent Code Generator
				13	</div>
				14
				15	<ol>
				16	<li><a href="#introduction">Introduction</a>
				17	<ul>
				18	<li><a href="#required">Required components in the code generator</a></li>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	19	<li><a href="#high-level-design">The high-level design of the code
				20	generator</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	21	<li><a href="#tablegen">Using TableGen for target description</a></li>
				22	</ul>
				23	</li>
				24	<li><a href="#targetdesc">Target description classes</a>
				25	<ul>
				26	<li><a href="#targetmachine">The <tt>TargetMachine</tt> class</a></li>
				27	<li><a href="#targetdata">The <tt>TargetData</tt> class</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	28	<li><a href="#targetlowering">The <tt>TargetLowering</tt> class</a></li>
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	29	<li><a href="#targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	30	<li><a href="#targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a></li>
				31	<li><a href="#targetframeinfo">The <tt>TargetFrameInfo</tt> class</a></li>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	32	<li><a href="#targetsubtarget">The <tt>TargetSubtarget</tt> class</a></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	33	<li><a href="#targetjitinfo">The <tt>TargetJITInfo</tt> class</a></li>
				34	</ul>
				35	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	36	<li><a href="#codegendesc">The "Machine" Code Generator classes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	37	<ul>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	38	<li><a href="#machineinstr">The <tt>MachineInstr</tt> class</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	39	<li><a href="#machinebasicblock">The <tt>MachineBasicBlock</tt>
				40	class</a></li>
				41	<li><a href="#machinefunction">The <tt>MachineFunction</tt> class</a></li>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	42	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	43	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	44	<li><a href="#mc">The "MC" Layer</a>
				45	<ul>
				46	<li><a href="#mcstreamer">The <tt>MCStreamer</tt> API</a></li>
<li><a href="#mccontext">The <tt>MCContext</tt> class</a></li>
				48	<li><a href="#mcsymbol">The <tt>MCSymbol</tt> class</a></li>
				49	<li><a href="#mcsection">The <tt>MCSection</tt> class</a></li>
				50	<li><a href="#mcinst">The <tt>MCInst</tt> class</a></li>
				51	</ul>
				52	</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	53	<li><a href="#codegenalgs">Target-independent code generation algorithms</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	54	<ul>
				55	<li><a href="#instselect">Instruction Selection</a>
				56	<ul>
				57	<li><a href="#selectiondag_intro">Introduction to SelectionDAGs</a></li>
				58	<li><a href="#selectiondag_process">SelectionDAG Code Generation
				59	Process</a></li>
				60	<li><a href="#selectiondag_build">Initial SelectionDAG
				61	Construction</a></li>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	62	<li><a href="#selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	63	<li><a href="#selectiondag_legalize">SelectionDAG Legalize Phase</a></li>
				64	<li><a href="#selectiondag_optimize">SelectionDAG Optimization
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	65	Phase: the DAG Combiner</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	66	<li><a href="#selectiondag_select">SelectionDAG Select Phase</a></li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	67	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	68	Phase</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	69	<li><a href="#selectiondag_future">Future directions for the
				70	SelectionDAG</a></li>
				71	</ul></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	72	<li><a href="#liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	73	<ul>
				74	<li><a href="#livevariable_analysis">Live Variable Analysis</a></li>
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	75	<li><a href="#liveintervals_analysis">Live Intervals Analysis</a></li>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	76	</ul></li>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	77	<li><a href="#regalloc">Register Allocation</a>
				78	<ul>
				79	<li><a href="#regAlloc_represent">How registers are represented in
				80	LLVM</a></li>
				81	<li><a href="#regAlloc_howTo">Mapping virtual registers to physical
				82	registers</a></li>
				83	<li><a href="#regAlloc_twoAddr">Handling two address instructions</a></li>
				84	<li><a href="#regAlloc_ssaDecon">The SSA deconstruction phase</a></li>
				85	<li><a href="#regAlloc_fold">Instruction folding</a></li>
				86	<li><a href="#regAlloc_builtIn">Built in register allocators</a></li>
				87	</ul></li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	88	<li><a href="#codeemit">Code Emission</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	89	</ul>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	90	</li>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	91	<li><a href="#nativeassembler">Implementing a Native Assembler</a></li>
				92
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	93	<li><a href="#targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	94	<ul>
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	95	<li><a href="#targetfeatures">Target Feature Matrix</a></li>
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	96	<li><a href="#tailcallopt">Tail call optimization</a></li>
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	97	<li><a href="#sibcallopt">Sibling call optimization</a></li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	98	<li><a href="#x86">The X86 backend</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	99	<li><a href="#ppc">The PowerPC backend</a>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	100	<ul>
				101	<li><a href="#ppc_abi">LLVM PowerPC ABI</a></li>
				102	<li><a href="#ppc_frame">Frame Layout</a></li>
				103	<li><a href="#ppc_prolog">Prolog/Epilog</a></li>
				104	<li><a href="#ppc_dynamic">Dynamic Allocation</a></li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	105	</ul></li>
				106	</ul></li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	107
				108	</ol>
				109
				110	<div class="doc_author">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	111	<p>Written by the LLVM Team.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	112	</div>
				113
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	114	<div class="doc_warning">
				115	<p>Warning: This is a work in progress.</p>
				116	</div>
				117
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	118	<!-- *********************************************************************** -->
				119	<div class="doc_section">
				120	<a name="introduction">Introduction</a>
				121	</div>
				122	<!-- *********************************************************************** -->
				123
				124	<div class="doc_text">
				125
				126	<p>The LLVM target-independent code generator is a framework that provides a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	127	suite of reusable components for translating the LLVM internal representation
				128	to the machine code for a specified target—either in assembly form
				129	(suitable for a static compiler) or in binary machine code format (usable for
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	130	a JIT compiler). The LLVM target-independent code generator consists of six
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	131	main components:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	132
				133	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	134	<li><a href="#targetdesc">Abstract target description</a> interfaces which
				135	capture important properties about various aspects of the machine,
				136	independently of how they will be used. These interfaces are defined in
				137	<tt>include/llvm/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	138
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	139	<li>Classes used to represent the <a href="#codegendesc">code being
				140	generated</a> for a target. These classes are intended to be abstract
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	141	enough to represent the machine code for <i>any</i> target machine. These
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	142	classes are defined in <tt>include/llvm/CodeGen/</tt>. At this level,
				143	concepts like "constant pool entries" and "jump tables" are explicitly
				144	exposed.</li>
				145
<li>Classes and algorithms used to represent code at the object file level,
				147	the <a href="#mc">MC Layer</a>. These classes represent assembly level
				148	constructs like labels, sections, and instructions. At this level,
				149	concepts like "constant pool entries" and "jump tables" don't exist.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	150
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	151	<li><a href="#codegenalgs">Target-independent algorithms</a> used to implement
				152	various phases of native code generation (register allocation, scheduling,
stack frame representation, etc.). This code lives
				154	in <tt>lib/CodeGen/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	155
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	156	<li><a href="#targetimpls">Implementations of the abstract target description
				157	interfaces</a> for particular targets. These machine descriptions make
				158	use of the components provided by LLVM, and can optionally provide custom
				159	target-specific passes, to build complete code generators for a specific
				160	target. Target descriptions live in <tt>lib/Target/</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	161
<li><a href="#jit">The target-independent JIT components</a>. The LLVM JIT is
completely target independent (it uses the <tt>TargetJITInfo</tt>
structure to handle target-specific issues). The code for the
target-independent JIT lives in <tt>lib/ExecutionEngine/JIT</tt>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	166	</ol>
				167
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	168	<p>Depending on which part of the code generator you are interested in working
				169	on, different pieces of this will be useful to you. In any case, you should
				170	be familiar with the <a href="#targetdesc">target description</a>
				171	and <a href="#codegendesc">machine code representation</a> classes. If you
				172	want to add a backend for a new target, you will need
				173	to <a href="#targetimpls">implement the target description</a> classes for
				174	your new target and understand the <a href="LangRef.html">LLVM code
				175	representation</a>. If you are interested in implementing a
				176	new <a href="#codegenalgs">code generation algorithm</a>, it should only
				177	depend on the target-description and machine code representation classes,
				178	ensuring that it is portable.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	179
				180	</div>
				181
				182	<!-- ======================================================================= -->
				183	<div class="doc_subsection">
				184	<a name="required">Required components in the code generator</a>
				185	</div>
				186
				187	<div class="doc_text">
				188
				189	<p>The two pieces of the LLVM code generator are the high-level interface to the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	190	code generator and the set of reusable components that can be used to build
				191	target-specific backends. The two most important interfaces
				192	(<a href="#targetmachine"><tt>TargetMachine</tt></a>
				193	and <a href="#targetdata"><tt>TargetData</tt></a>) are the only ones that are
				194	required to be defined for a backend to fit into the LLVM system, but the
				195	others must be defined if the reusable code generator components are going to
				196	be used.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	197
				198	<p>This design has two important implications. The first is that LLVM can
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	199	support completely non-traditional code generation targets. For example, the
				200	C backend does not require register allocation, instruction selection, or any
				201	of the other standard components provided by the system. As such, it only
				202	implements these two interfaces, and does its own thing. Another example of
				203	a code generator like this is a (purely hypothetical) backend that converts
				204	LLVM to the GCC RTL form and uses GCC to emit machine code for a target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	205
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	206	<p>This design also implies that it is possible to design and implement
				207	radically different code generators in the LLVM system that do not make use
				208	of any of the built-in components. Doing so is not recommended at all, but
				209	could be required for radically different targets that do not fit into the
				210	LLVM machine description model: FPGAs for example.</p>
Chris Lattner	900bf8c	2004-06-02 07:06:06 +0000	[diff] [blame]	211
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	212	</div>
				213
				214	<!-- ======================================================================= -->
				215	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	216	<a name="high-level-design">The high-level design of the code generator</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	217	</div>
				218
				219	<div class="doc_text">
				220
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	221	<p>The LLVM target-independent code generator is designed to support efficient
				222	and quality code generation for standard register-based microprocessors.
				223	Code generation in this model is divided into the following stages:</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	224
				225	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	226	<li><b><a href="#instselect">Instruction Selection</a></b> — This phase
				227	determines an efficient way to express the input LLVM code in the target
instruction set. This stage produces the initial code for the program in
the target instruction set, using virtual registers in SSA form together
with physical registers that represent any required register
assignments due to target constraints or calling conventions. This step
				232	turns the LLVM code into a DAG of target instructions.</li>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	233
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	234	<li><b><a href="#selectiondag_sched">Scheduling and Formation</a></b> —
				235	This phase takes the DAG of target instructions produced by the
				236	instruction selection phase, determines an ordering of the instructions,
				237	then emits the instructions
				238	as <tt><a href="#machineinstr">MachineInstr</a></tt>s with that ordering.
				239	Note that we describe this in the <a href="#instselect">instruction
				240	selection section</a> because it operates on
				241	a <a href="#selectiondag_intro">SelectionDAG</a>.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	242
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	243	<li><b><a href="#ssamco">SSA-based Machine Code Optimizations</a></b> —
				244	This optional stage consists of a series of machine-code optimizations
				245	that operate on the SSA-form produced by the instruction selector.
				246	Optimizations like modulo-scheduling or peephole optimization work
				247	here.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	248
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	249	<li><b><a href="#regalloc">Register Allocation</a></b> — The target code
				250	is transformed from an infinite virtual register file in SSA form to the
				251	concrete register file used by the target. This phase introduces spill
				252	code and eliminates all virtual register references from the program.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	253
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	254	<li><b><a href="#proepicode">Prolog/Epilog Code Insertion</a></b> — Once
				255	the machine code has been generated for the function and the amount of
stack space required is known (used for LLVM <tt>alloca</tt>s and spill slots),
				257	the prolog and epilog code for the function can be inserted and "abstract
				258	stack location references" can be eliminated. This stage is responsible
				259	for implementing optimizations like frame-pointer elimination and stack
				260	packing.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	261
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	262	<li><b><a href="#latemco">Late Machine Code Optimizations</a></b> —
				263	Optimizations that operate on "final" machine code can go here, such as
				264	spill code scheduling and peephole optimizations.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	265
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	266	<li><b><a href="#codeemit">Code Emission</a></b> — The final stage
				267	actually puts out the code for the current function, either in the target
				268	assembler format or in machine code.</li>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	269	</ol>
				270
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	271	<p>The code generator is based on the assumption that the instruction selector
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	272	will use an optimal pattern matching selector to create high-quality
				273	sequences of native instructions. Alternative code generator designs based
				274	on pattern expansion and aggressive iterative peephole optimization are much
				275	slower. This design permits efficient compilation (important for JIT
				276	environments) and aggressive optimization (used when generating code offline)
				277	by allowing components of varying levels of sophistication to be used for any
				278	step of compilation.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	279
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	280	<p>In addition to these stages, target implementations can insert arbitrary
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	281	target-specific passes into the flow. For example, the X86 target uses a
				282	special pass to handle the 80x87 floating point stack architecture. Other
				283	targets with unusual requirements can be supported with custom passes as
				284	needed.</p>
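<p>As a rough illustration of how a target-specific pass slots into the
standard flow, the following C++ sketch models the pipeline as an ordered list
of stage names. Everything in it is hypothetical: the stage strings, the
helpers <tt>standardStages</tt> and <tt>insertAfter</tt>, and the
<tt>fp-stack-fixup</tt> pass name and its placement are made up for this
example and are not LLVM's pass manager API.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: the standard stages listed above, in order.
inline std::vector<std::string> standardStages() {
  return {"instruction-selection", "scheduling", "ssa-machine-opts",
          "register-allocation", "prolog-epilog-insertion",
          "late-machine-opts", "code-emission"};
}

// Insert a (hypothetical) target-specific pass right after an existing stage.
// Returns false and leaves the pipeline unchanged if the anchor is missing.
inline bool insertAfter(std::vector<std::string> &Pipeline,
                        const std::string &Anchor,
                        const std::string &NewPass) {
  auto It = std::find(Pipeline.begin(), Pipeline.end(), Anchor);
  if (It == Pipeline.end())
    return false;
  Pipeline.insert(It + 1, NewPass);
  return true;
}
```

<p>A target would call something like
<tt>insertAfter(P, "register-allocation", "fp-stack-fixup")</tt> on its copy of
the pipeline; the standard stages keep their relative order.</p>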
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	285
				286	</div>
				287
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	288	<!-- ======================================================================= -->
				289	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	290	<a name="tablegen">Using TableGen for target description</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	291	</div>
				292
				293	<div class="doc_text">
				294
Chris Lattner	5489e93	2004-06-01 18:35:00 +0000	[diff] [blame]	295	<p>The target description classes require a detailed description of the target
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	296	architecture. These target descriptions often have a large amount of common
				297	information (e.g., an <tt>add</tt> instruction is almost identical to a
				298	<tt>sub</tt> instruction). In order to allow the maximum amount of
				299	commonality to be factored out, the LLVM code generator uses
				300	the <a href="TableGenFundamentals.html">TableGen</a> tool to describe big
				301	chunks of the target machine, which allows the use of domain-specific and
				302	target-specific abstractions to reduce the amount of repetition.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	303
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	304	<p>As LLVM continues to be developed and refined, we plan to move more and more
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	305	of the target description to the <tt>.td</tt> form. Doing so gives us a
				306	number of advantages. The most important is that it makes it easier to port
				307	LLVM because it reduces the amount of C++ code that has to be written, and
				308	the surface area of the code generator that needs to be understood before
				309	someone can get something working. Second, it makes it easier to change
				310	things. In particular, if tables and other things are all emitted
				311	by <tt>tblgen</tt>, we only need a change in one place (<tt>tblgen</tt>) to
				312	update all of the targets to a new interface.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	313
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	314	</div>
				315
				316	<!-- *********************************************************************** -->
				317	<div class="doc_section">
				318	<a name="targetdesc">Target description classes</a>
				319	</div>
				320	<!-- *********************************************************************** -->
				321
				322	<div class="doc_text">
				323
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	324	<p>The LLVM target description classes (located in the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	325	<tt>include/llvm/Target</tt> directory) provide an abstract description of
				326	the target machine independent of any particular client. These classes are
				327	designed to capture the <i>abstract</i> properties of the target (such as the
				328	instructions and registers it has), and do not incorporate any particular
				329	pieces of code generation algorithms.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	330
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	331	<p>All of the target description classes (except the
				332	<tt><a href="#targetdata">TargetData</a></tt> class) are designed to be
				333	subclassed by the concrete target implementation, and have virtual methods
				334	implemented. To get to these implementations, the
				335	<tt><a href="#targetmachine">TargetMachine</a></tt> class provides accessors
				336	that should be implemented by the target.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	337
				338	</div>
				339
				340	<!-- ======================================================================= -->
				341	<div class="doc_subsection">
				342	<a name="targetmachine">The <tt>TargetMachine</tt> class</a>
				343	</div>
				344
				345	<div class="doc_text">
				346
				347	<p>The <tt>TargetMachine</tt> class provides virtual methods that are used to
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	348	access the target-specific implementations of the various target description
				349	classes via the <tt>get*Info</tt> methods (<tt>getInstrInfo</tt>,
				350	<tt>getRegisterInfo</tt>, <tt>getFrameInfo</tt>, etc.). This class is
				351	designed to be specialized by a concrete target implementation
				352	(e.g., <tt>X86TargetMachine</tt>) which implements the various virtual
				353	methods. The only required target description class is
				354	the <a href="#targetdata"><tt>TargetData</tt></a> class, but if the code
				355	generator components are to be used, the other interfaces should be
				356	implemented as well.</p>
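<p>The accessor pattern described above can be sketched in C++ roughly as
follows. This is a simplified model under stated assumptions, not the real
class hierarchy: the <tt>Toy*</tt> types and <tt>describeTarget</tt> are
hypothetical, and only one <tt>get*Info</tt>-style accessor is shown.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one target description class.
struct ToyInstrInfo {
  virtual ~ToyInstrInfo() = default;
  virtual std::string name() const = 0;
};

// Hypothetical stand-in for the machine class. The accessor returns null
// when a target does not provide that piece of the description.
struct ToyTargetMachine {
  virtual ~ToyTargetMachine() = default;
  virtual const ToyInstrInfo *getInstrInfo() const { return nullptr; }
};

// A concrete target specializes both classes.
struct ToyX86InstrInfo : ToyInstrInfo {
  std::string name() const override { return "toy-x86"; }
};

struct ToyX86TargetMachine : ToyTargetMachine {
  ToyX86InstrInfo InstrInfo;
  const ToyInstrInfo *getInstrInfo() const override { return &InstrInfo; }
};

// Client code depends only on the abstract interfaces.
inline std::string describeTarget(const ToyTargetMachine &TM) {
  const ToyInstrInfo *II = TM.getInstrInfo();
  return II ? II->name() : "no instruction info";
}
```

<p>Because <tt>describeTarget</tt> sees only the base classes, the same client
code works for any target that overrides the accessor.</p>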
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	357
				358	</div>
				359
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	360	<!-- ======================================================================= -->
				361	<div class="doc_subsection">
				362	<a name="targetdata">The <tt>TargetData</tt> class</a>
				363	</div>
				364
				365	<div class="doc_text">
				366
				367	<p>The <tt>TargetData</tt> class is the only required target description class,
and it is the only class that is not extensible (you cannot derive a new
				369	class from it). <tt>TargetData</tt> specifies information about how the
				370	target lays out memory for structures, the alignment requirements for various
				371	data types, the size of pointers in the target, and whether the target is
				372	little-endian or big-endian.</p>
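<p>To make the kind of information involved more concrete, here is a small C++
sketch of a structure layout computation driven by per-type sizes and
alignments. It is an illustration only: <tt>ToyType</tt>,
<tt>ToyStructLayout</tt>, <tt>alignTo</tt> and <tt>layoutStruct</tt> are
hypothetical names, and the natural-alignment rule used here is an assumption
rather than a description of what <tt>TargetData</tt> actually does.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a field type: size and alignment in bytes.
struct ToyType {
  uint64_t Size;
  uint64_t Align; // assumed to be non-zero
};

struct ToyStructLayout {
  std::vector<uint64_t> Offsets; // byte offset of each field
  uint64_t Size = 0;             // total size including tail padding
  uint64_t Align = 1;            // alignment of the whole structure
};

// Round Value up to the next multiple of Align.
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Place each field at the next offset that satisfies its alignment, then
// pad the structure to a multiple of its strictest field alignment.
inline ToyStructLayout layoutStruct(const std::vector<ToyType> &Fields) {
  ToyStructLayout L;
  for (const ToyType &F : Fields) {
    L.Size = alignTo(L.Size, F.Align);
    L.Offsets.push_back(L.Size);
    L.Size += F.Size;
    if (F.Align > L.Align)
      L.Align = F.Align;
  }
  L.Size = alignTo(L.Size, L.Align);
  return L;
}
```

<p>With these assumed rules, a structure holding a 1-byte field, a 4-byte
field and a pointer occupies 16 bytes when pointers are 8 bytes wide and 12
bytes when they are 4 bytes wide, which is why the pointer size has to be part
of the target description.</p>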
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	373
				374	</div>
				375
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	376	<!-- ======================================================================= -->
				377	<div class="doc_subsection">
				378	<a name="targetlowering">The <tt>TargetLowering</tt> class</a>
				379	</div>
				380
				381	<div class="doc_text">
				382
				383	<p>The <tt>TargetLowering</tt> class is used by SelectionDAG based instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	384	selectors primarily to describe how LLVM code should be lowered to
				385	SelectionDAG operations. Among other things, this class indicates:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	386
				387	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	388	<li>an initial register class to use for various <tt>ValueType</tt>s,</li>
				389
				390	<li>which operations are natively supported by the target machine,</li>
				391
				392	<li>the return type of <tt>setcc</tt> operations,</li>
				393
				394	<li>the type to use for shift amounts, and</li>
				395
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	396	<li>various high-level characteristics, like whether it is profitable to turn
				397	division by a constant into a multiplication sequence</li>
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	398	</ul>
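<p>One way to picture the "which operations are natively supported" item is a
table keyed by operation and value type. The C++ sketch below is a toy model:
the enums and the <tt>ToyLowering</tt> class are hypothetical, the method names
are chosen only to suggest the idea, and treating unlisted combinations as
legal is an assumption of this example.</p>

```cpp
#include <cassert>
#include <map>
#include <utility>

enum class ToyOp { Add, SDiv, FSin };
enum class ToyVT { i32, i64, f64 };
enum class ToyAction { Legal, Expand };

class ToyLowering {
  std::map<std::pair<ToyOp, ToyVT>, ToyAction> Actions;

public:
  // Record how the target wants (Op, VT) to be handled.
  void setOperationAction(ToyOp Op, ToyVT VT, ToyAction A) {
    Actions[std::make_pair(Op, VT)] = A;
  }

  // Combinations that were never mentioned default to Legal (an assumption).
  ToyAction getOperationAction(ToyOp Op, ToyVT VT) const {
    auto It = Actions.find(std::make_pair(Op, VT));
    return It == Actions.end() ? ToyAction::Legal : It->second;
  }

  bool isOperationLegal(ToyOp Op, ToyVT VT) const {
    return getOperationAction(Op, VT) == ToyAction::Legal;
  }
};
```

<p>A target without a native sine instruction would mark
<tt>(FSin, f64)</tt> as <tt>Expand</tt> in its constructor, and later phases
would query the table instead of hard-coding target knowledge.</p>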
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	399
				400	</div>
				401
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	402	<!-- ======================================================================= -->
				403	<div class="doc_subsection">
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	404	<a name="targetregisterinfo">The <tt>TargetRegisterInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	405	</div>
				406
				407	<div class="doc_text">
				408
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	409	<p>The <tt>TargetRegisterInfo</tt> class is used to describe the register file
				410	of the target and any interactions between the registers.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	411
<p>Registers are represented in the code generator by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	413	unsigned integers. Physical registers (those that actually exist in the
				414	target description) are unique small numbers, and virtual registers are
				415	generally large. Note that register #0 is reserved as a flag value.</p>
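<p>The numbering scheme can be sketched in C++ as below. The boundary value
<tt>ToyFirstVirtualRegister</tt> and all helper names are assumptions made for
this example; the text above only says that physical registers are small,
virtual registers are large, and 0 is reserved.</p>

```cpp
#include <cassert>

// Assumed values for illustration only; the real boundary between physical
// and virtual register numbers is not given in the text.
constexpr unsigned ToyNoRegister = 0;
constexpr unsigned ToyFirstVirtualRegister = 1024;

inline bool isToyPhysicalRegister(unsigned Reg) {
  return Reg != ToyNoRegister && Reg < ToyFirstVirtualRegister;
}

inline bool isToyVirtualRegister(unsigned Reg) {
  return Reg >= ToyFirstVirtualRegister;
}

// Dense zero-based index of a virtual register, handy for side tables.
inline unsigned toyVirtualRegisterIndex(unsigned Reg) {
  assert(isToyVirtualRegister(Reg) && "not a virtual register");
  return Reg - ToyFirstVirtualRegister;
}
```

<p>Keeping both kinds of register in one integer space lets an operand hold
either kind without a separate tag.</p>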
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	416
				417	<p>Each register in the processor description has an associated
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	418	<tt>TargetRegisterDesc</tt> entry, which provides a textual name for the
				419	register (used for assembly output and debugging dumps) and a set of aliases
				420	(used to indicate whether one register overlaps with another).</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	421
Dan Gohman	6f0d024	2008-02-10 18:45:23 +0000	[diff] [blame]	422	<p>In addition to the per-register description, the <tt>TargetRegisterInfo</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	423	class exposes a set of processor specific register classes (instances of the
				424	<tt>TargetRegisterClass</tt> class). Each register class contains sets of
				425	registers that have the same properties (for example, they are all 32-bit
				426	integer registers). Each SSA virtual register created by the instruction
				427	selector has an associated register class. When the register allocator runs,
it replaces each virtual register with a physical register from that class.</p>
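<p>The relationship between a virtual register, its register class, and the
physical register finally chosen can be sketched as follows. This is a
deliberately naive model: <tt>ToyRegisterClass</tt> and <tt>pickPhysReg</tt>
are hypothetical, and a real allocator also has to deal with live ranges,
aliases and spilling.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical register class: the physical registers a virtual register
// of this class may be assigned to, in preference order.
struct ToyRegisterClass {
  std::vector<unsigned> Regs;
};

// Return the first register of the class that is not currently in use,
// or 0 (the reserved "no register" value) if the class is exhausted.
inline unsigned pickPhysReg(const ToyRegisterClass &RC,
                            const std::set<unsigned> &InUse) {
  for (unsigned Reg : RC.Regs)
    if (InUse.count(Reg) == 0)
      return Reg;
  return 0;
}
```

<p>A result of 0 here stands for the case where the allocator would have to
introduce spill code.</p>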
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	429
<p>The target-specific implementations of these classes are auto-generated from
				431	a <a href="TableGenFundamentals.html">TableGen</a> description of the
				432	register file.</p>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	433
				434	</div>
				435
				436	<!-- ======================================================================= -->
				437	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	438	<a name="targetinstrinfo">The <tt>TargetInstrInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	439	</div>
				440
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	441	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	442
				443	<p>The <tt>TargetInstrInfo</tt> class is used to describe the machine
				444	instructions supported by the target. It is essentially an array of
				445	<tt>TargetInstrDescriptor</tt> objects, each of which describes one
				446	instruction the target supports. Descriptors define things like the mnemonic
				447	for the opcode, the number of operands, the list of implicit register uses
				448	and defs, whether the instruction has certain target-independent properties
				449	(accesses memory, is commutable, etc), and holds any target-specific
				450	flags.</p>
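<p>The "array of descriptors" idea can be sketched as follows. This is only an
illustration: <tt>ToyInstrDesc</tt>, <tt>ToyOpcode</tt> and
<tt>getToyDesc</tt> are hypothetical names invented for this example, and the
two table entries are made up rather than taken from any real target.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one instruction descriptor; not LLVM's class.
struct ToyInstrDesc {
  std::string Mnemonic;  // printable name of the opcode
  unsigned NumOperands;  // number of operands the instruction takes
  bool MayAccessMemory;  // a target-independent property
  bool IsCommutable;     // a target-independent property
  unsigned TargetFlags;  // opaque target-specific bits
};

// Hypothetical opcode numbering; in LLVM these values are auto-generated.
enum ToyOpcode { TOY_ADD = 0, TOY_LOAD = 1 };

// The instruction table: one descriptor per opcode, indexed by opcode number.
inline const std::vector<ToyInstrDesc> &toyInstrTable() {
  static const std::vector<ToyInstrDesc> Table = {
      {"add", 3, false, true, 0},
      {"load", 2, true, false, 0},
  };
  return Table;
}

// Look up the descriptor for an opcode number.
inline const ToyInstrDesc &getToyDesc(unsigned Opcode) {
  const std::vector<ToyInstrDesc> &Table = toyInstrTable();
  assert(Opcode < Table.size() && "unknown opcode");
  return Table[Opcode];
}
```

<p>A client that only knows an opcode number can then ask, for example,
<tt>getToyDesc(TOY_ADD).IsCommutable</tt> without knowing anything else about
the target.</p>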
				451
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	452	</div>
				453
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	454	<!-- ======================================================================= -->
				455	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	456	<a name="targetframeinfo">The <tt>TargetFrameInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	457	</div>
				458
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	459	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	460
				461	<p>The <tt>TargetFrameInfo</tt> class is used to provide information about the
				462	stack frame layout of the target. It holds the direction of stack growth, the
				463	known stack alignment on entry to each function, and the offset to the local
				464	area. The offset to the local area is the offset from the stack pointer on
				465	function entry to the first location where function data (local variables,
				466	spill locations) can be stored.</p>
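<p>As a rough sketch of how the three pieces of information fit together,
consider the following. <tt>ToyFrameInfo</tt> and
<tt>firstLocalAddress</tt> are hypothetical names used only for this example;
they are not the real <tt>TargetFrameInfo</tt> interface.</p>

```cpp
#include <cassert>
#include <cstdint>

enum class ToyStackDirection { GrowsUp, GrowsDown };

// Hypothetical record of the three facts described above.
struct ToyFrameInfo {
  ToyStackDirection Direction;  // direction of stack growth
  unsigned StackAlignment;      // known alignment in bytes on function entry
  int LocalAreaOffset;          // bytes from the entry stack pointer to the
                                // first slot usable for function data
};

// Address of the first location where locals or spill slots may be placed,
// given the value of the stack pointer on function entry.
inline std::int64_t firstLocalAddress(const ToyFrameInfo &FI,
                                      std::int64_t EntrySP) {
  // Sanity check: the recorded alignment must be a power of two.
  assert(FI.StackAlignment != 0 &&
         (FI.StackAlignment & (FI.StackAlignment - 1)) == 0);
  return EntrySP + FI.LocalAreaOffset;
}
```

<p>For instance, on an assumed target whose stack grows down and whose local
area offset is -4, an entry stack pointer of 1000 gives a first usable
location of 996.</p>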
				467
Reid Spencer	627cd00	2005-07-19 01:36:35 +0000	[diff] [blame]	468	</div>
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	469
				470	<!-- ======================================================================= -->
				471	<div class="doc_subsection">
				472	<a name="targetsubtarget">The <tt>TargetSubtarget</tt> class</a>
				473	</div>
				474
				475	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	476
				477	<p>The <tt>TargetSubtarget</tt> class is used to provide information about the
				478	specific chip set being targeted. A sub-target informs code generation of
				479	which instructions are supported, instruction latencies and instruction
				480	execution itinerary; i.e., which processing units are used, in what order,
				481	and for how long.</p>
				482
Chris Lattner	47adebb	2005-10-16 17:06:07 +0000	[diff] [blame]	483	</div>
				484
				485
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	486	<!-- ======================================================================= -->
				487	<div class="doc_subsection">
Chris Lattner	10d6800	2004-06-01 17:18:11 +0000	[diff] [blame]	488	<a name="targetjitinfo">The <tt>TargetJITInfo</tt> class</a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	489	</div>
				490
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	491	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	492
				493	<p>The <tt>TargetJITInfo</tt> class exposes an abstract interface used by the
				494	Just-In-Time code generator to perform target-specific activities, such as
				495	emitting stubs. If a <tt>TargetMachine</tt> supports JIT code generation, it
				496	should provide one of these objects through the <tt>getJITInfo</tt>
				497	method.</p>
				498
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	499	</div>
				500
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	501	<!-- *********************************************************************** -->
				502	<div class="doc_section">
				503	<a name="codegendesc">Machine code description classes</a>
				504	</div>
				505	<!-- *********************************************************************** -->
				506
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	507	<div class="doc_text">
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	508
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	509	<p>At a high level, LLVM code is translated to a machine-specific
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	510	representation formed out of
				511	<a href="#machinefunction"><tt>MachineFunction</tt></a>,
				512	<a href="#machinebasicblock"><tt>MachineBasicBlock</tt></a>,
				513	and <a href="#machineinstr"><tt>MachineInstr</tt></a> instances (defined
				514	in <tt>include/llvm/CodeGen</tt>). This representation is completely target
				515	agnostic, representing instructions in their most abstract form: an opcode
				516	and a series of operands. This representation is designed to support both an
				517	SSA representation for machine code, as well as a register allocated, non-SSA
				518	form.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	519
				520	</div>
				521
				522	<!-- ======================================================================= -->
				523	<div class="doc_subsection">
				524	<a name="machineinstr">The <tt>MachineInstr</tt> class</a>
				525	</div>
				526
				527	<div class="doc_text">
				528
				529	<p>Target machine instructions are represented as instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	530	<tt>MachineInstr</tt> class. This class is an extremely abstract way of
				531	representing machine instructions. In particular, it only keeps track of an
				532	opcode number and a set of operands.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	533
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	534	<p>The opcode number is a simple unsigned integer that only has meaning to a
				535	specific backend. All of the instructions for a target should be defined in
				536	the <tt>*InstrInfo.td</tt> file for the target. The opcode enum values are
				537	auto-generated from this description. The <tt>MachineInstr</tt> class does
				538	not have any information about how to interpret the instruction (i.e., what
				539	the semantics of the instruction are); for that you must refer to the
				540	<tt><a href="#targetinstrinfo">TargetInstrInfo</a></tt> class.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	541
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	542	<p>The operands of a machine instruction can be of several different types: a
				543	register reference, a constant integer, a basic block reference, etc. In
				544	addition, a machine operand should be marked as a def or a use of the value
				545	(though only registers are allowed to be defs).</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	546
				547	<p>By convention, the LLVM code generator orders instruction operands so that
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	548	all register definitions come before the register uses, even on architectures
				549	that are normally printed in other orders. For example, the SPARC add
				550	instruction "<tt>add %i1, %i2, %i3</tt>" adds the "%i1" and "%i2" registers
				551	and stores the result into the "%i3" register. In the LLVM code generator,
				552	the operands should be stored as "<tt>%i3, %i1, %i2</tt>", with the
				553	destination first.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	554
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	555	<p>Keeping destination (definition) operands at the beginning of the operand
				556	list has several advantages. In particular, the debugging printer will print
				557	the instruction like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	558
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	559	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	560	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	561	%i3 = add %i1, %i2
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	562	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	563	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	564
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	565	<p>Also, if the first operand is a def, it is easier to <a href="#buildmi">create
				566	instructions</a> whose only def is the first operand.</p>
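<p>The effect of the "definitions first" convention on printing can be
sketched with a small model. <tt>ToyOperand</tt>, <tt>ToyInstr</tt> and
<tt>printToyInstr</tt> are hypothetical names for this illustration; the real
<tt>MachineInstr</tt> printer is more involved.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical register operand: a name plus a def/use marker.
struct ToyOperand {
  std::string Reg;  // register name, e.g. "%i3"
  bool IsDef;       // true for a definition, false for a use
};

// Hypothetical instruction: an opcode and operands, definitions first.
struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Operands;
};

// Print as "defs = opcode uses", relying on the operand ordering convention.
inline std::string printToyInstr(const ToyInstr &MI) {
  std::string Defs, Uses;
  bool SeenUse = false;
  for (const ToyOperand &Op : MI.Operands) {
    if (Op.IsDef) {
      assert(!SeenUse && "definitions must come before uses");
      if (!Defs.empty())
        Defs += ", ";
      Defs += Op.Reg;
    } else {
      SeenUse = true;
      if (!Uses.empty())
        Uses += ", ";
      Uses += Op.Reg;
    }
  }
  std::string Out;
  if (!Defs.empty())
    Out = Defs + " = ";
  Out += MI.Opcode;
  if (!Uses.empty())
    Out += " " + Uses;
  return Out;
}
```

<p>With the SPARC operands from the text stored as <tt>%i3</tt> (def),
<tt>%i1</tt>, <tt>%i2</tt> (uses), the printer can emit the destination on the
left of the "=" without consulting any target-specific information.</p>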
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	567
				568	</div>
				569
				570	<!-- _______________________________________________________________________ -->
				571	<div class="doc_subsubsection">
				572	<a name="buildmi">Using the <tt>MachineInstrBuilder.h</tt> functions</a>
				573	</div>
				574
				575	<div class="doc_text">
				576
				577	<p>Machine instructions are created by using the <tt>BuildMI</tt> functions,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	578	located in the <tt>include/llvm/CodeGen/MachineInstrBuilder.h</tt> file. The
				579	<tt>BuildMI</tt> functions make it easy to build arbitrary machine
				580	instructions. Usage of the <tt>BuildMI</tt> functions looks like this:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	581
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	582	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	583	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	584	// Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42')
				585	// instruction. The '1' specifies how many operands will be added.
				586	MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	587
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	588	// Create the same instr, but insert it at the end of a basic block.
				589	MachineBasicBlock &MBB = ...
				590	BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	591
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	592	// Create the same instr, but insert it before a specified iterator point.
				593	MachineBasicBlock::iterator MBBI = ...
				594	BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	595
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	596	// Create a 'cmp Reg, 0' instruction, no destination reg.
				597	MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0);
				598	// Create an 'sahf' instruction which takes no operands and stores nothing.
				599	MI = BuildMI(X86::SAHF, 0);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	600
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	601	// Create a self looping branch instruction.
				602	BuildMI(MBB, X86::JNE, 1).addMBB(&MBB);
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	603	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	604	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	605
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	606	<p>The key thing to remember with the <tt>BuildMI</tt> functions is that you
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	607	have to specify the number of operands that the machine instruction will
				608	take. This allows for efficient memory allocation. Also remember that
				609	operands default to being uses of values, not definitions. If you need to
				610	add a definition operand (other than the optional destination register), you
				611	must explicitly mark it as such:</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	612
				613	<div class="doc_code">
				614	<pre>
Bill Wendling	587daed	2009-05-13 21:33:08 +0000	[diff] [blame]	615	MI.addReg(Reg, RegState::Define);
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	616	</pre>
				617	</div>
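<p>The two rules above (declare the operand count up front; operands are uses
unless marked otherwise) can be modelled in a few lines. This is a
self-contained sketch, not the real builder: <tt>ToyInstrBuilder</tt>,
<tt>buildToyMI</tt> and the other <tt>Toy*</tt> names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical operand: a register number or an immediate, plus a def flag.
struct ToyMachineOperand {
  enum Kind { Register, Immediate };
  Kind K;
  std::int64_t Value;
  bool IsDef;
};

struct ToyMachineInstr {
  unsigned Opcode;
  std::vector<ToyMachineOperand> Operands;
};

class ToyInstrBuilder {
  ToyMachineInstr MI;
  std::size_t Declared;  // operand count promised by the caller

  void push(ToyMachineOperand::Kind K, std::int64_t V, bool IsDef) {
    assert(MI.Operands.size() < Declared && "more operands than declared");
    ToyMachineOperand Op = {K, V, IsDef};
    MI.Operands.push_back(Op);
  }

public:
  ToyInstrBuilder(unsigned Opcode, std::size_t NumOperands)
      : Declared(NumOperands) {
    MI.Opcode = Opcode;
    MI.Operands.reserve(NumOperands);  // one allocation, sized up front
  }
  // Register operands are uses by default; defs must be requested.
  ToyInstrBuilder &addReg(unsigned Reg, bool IsDef = false) {
    push(ToyMachineOperand::Register, Reg, IsDef);
    return *this;
  }
  ToyInstrBuilder &addImm(std::int64_t Imm) {
    push(ToyMachineOperand::Immediate, Imm, false);
    return *this;
  }
  const ToyMachineInstr &instr() const { return MI; }
};

// Variant with a destination register: it becomes operand #0 and is a def.
inline ToyInstrBuilder buildToyMI(unsigned Opcode, std::size_t NumOperands,
                                  unsigned DestReg) {
  ToyInstrBuilder B(Opcode, NumOperands + 1);
  B.addReg(DestReg, /*IsDef=*/true);
  return B;
}
```

<p>In this model, <tt>buildToyMI(Opc, 1, Dest).addImm(42)</tt> yields two
operands: the destination as a definition, followed by the immediate as a
use.</p>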
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	618
				619	</div>
				620
				621	<!-- _______________________________________________________________________ -->
				622	<div class="doc_subsubsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	623	<a name="fixedregs">Fixed (preassigned) registers</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	624	</div>
				625
				626	<div class="doc_text">
				627
				628	<p>One important issue that the code generator needs to be aware of is the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	629	presence of fixed registers. In particular, there are often places in the
				630	instruction stream where the register allocator <em>must</em> arrange for a
				631	particular value to be in a particular register. This can occur due to
				632	limitations of the instruction set (e.g., the X86 can only do a 32-bit divide
				633	with the <tt>EAX</tt>/<tt>EDX</tt> registers), or external factors like
				634	calling conventions. In any case, the instruction selector should emit code
				635	that copies a virtual register into or out of a physical register when
				636	needed.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	637
				638	<p>For example, consider this simple LLVM example:</p>
				639
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	640	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	641	<pre>
Matthijs Kooijman	61399af	2008-06-04 15:46:35 +0000	[diff] [blame]	642	define i32 @test(i32 %X, i32 %Y) {
				643	%Z = sdiv i32 %X, %Y
				644	ret i32 %Z
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	645	}
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	646	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	647	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	648
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	649	<p>The X86 instruction selector produces this machine code for the <tt>div</tt>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	650	and <tt>ret</tt> (use "<tt>llc X.bc -march=x86 -print-machineinstrs</tt>" to
				651	get this):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	652
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	653	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	654	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	655	;; Start of div
				656	%EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX
				657	%reg1027 = sar %reg1024, 31
				658	%EDX = mov %reg1027 ;; Sign extend X into EDX
				659	idiv %reg1025 ;; Divide by Y (in reg1025)
				660	%reg1026 = mov %EAX ;; Read the result (Z) out of EAX
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	661
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	662	;; Start of ret
				663	%EAX = mov %reg1026 ;; 32-bit return value goes in EAX
				664	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	665	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	666	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	667
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	668	<p>By the end of code generation, the register allocator has coalesced the
				669	registers and deleted the resultant identity moves, producing the following
				670	code:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	671
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	672	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	673	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	674	;; X is in EAX, Y is in ECX
				675	mov %EAX, %EDX
				676	sar %EDX, 31
				677	idiv %ECX
				678	ret
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	679	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	680	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	681
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	682	<p>This approach is extremely general (if it can handle the X86 architecture, it
				683	can handle anything!) and allows all of the target specific knowledge about
				684	the instruction stream to be isolated in the instruction selector. Note that
				685	physical registers should have a short lifetime for good code generation, and
				686	all physical registers are assumed dead on entry to and exit from basic
				687	blocks (before register allocation). Thus, if you need a value to be live
				688	across basic block boundaries, it <em>must</em> live in a virtual
				689	register.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	690
				691	</div>
				692
				693	<!-- _______________________________________________________________________ -->
				694	<div class="doc_subsubsection">
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	695	<a name="ssa">Machine code in SSA form</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	696	</div>
				697
				698	<div class="doc_text">
				699
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	700	<p><tt>MachineInstr</tt>s are initially selected in SSA-form, and are
				701	maintained in SSA-form until register allocation happens. For the most part,
				702	this is trivially simple since LLVM is already in SSA form; LLVM PHI nodes
				703	become machine code PHI nodes, and virtual registers are only allowed to have
				704	a single definition.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	705
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	706	<p>After register allocation, machine code is no longer in SSA-form because
				707	there are no virtual registers left in the code.</p>
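<p>The single-definition property is easy to state as a check. The sketch
below uses a hypothetical <tt>ToySSAInstr</tt> record and assumes, for
illustration only, that virtual registers are numbered from 1024 upward (as
in the <tt>%reg1024</tt> names shown earlier).</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Assumed numbering for this sketch: virtual registers start at 1024.
const unsigned ToyFirstVirtualReg = 1024;

// Hypothetical instruction record: only the registers it defines and reads.
struct ToySSAInstr {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

// Returns true if every virtual register is defined at most once.
inline bool hasSingleDefinitions(const std::vector<ToySSAInstr> &Code) {
  std::set<unsigned> Seen;
  for (const ToySSAInstr &I : Code) {
    for (unsigned Reg : I.Defs) {
      assert(Reg >= ToyFirstVirtualReg && "expected a virtual register");
      if (!Seen.insert(Reg).second)
        return false;  // second definition of the same register
    }
  }
  return true;
}
```

<p>A sequence that defines <tt>reg1024</tt> twice fails the check, which is
the situation register allocation is allowed to create once virtual registers
have been replaced by physical ones.</p>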
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	708
				709	</div>
				710
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	711	<!-- ======================================================================= -->
				712	<div class="doc_subsection">
				713	<a name="machinebasicblock">The <tt>MachineBasicBlock</tt> class</a>
				714	</div>
				715
				716	<div class="doc_text">
				717
				718	<p>The <tt>MachineBasicBlock</tt> class contains a list of machine instructions
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	719	(<tt><a href="#machineinstr">MachineInstr</a></tt> instances). It roughly
				720	corresponds to the LLVM code input to the instruction selector, but there can
				721	be a one-to-many mapping (i.e. one LLVM basic block can map to multiple
				722	machine basic blocks). The <tt>MachineBasicBlock</tt> class has a
				723	"<tt>getBasicBlock</tt>" method, which returns the LLVM basic block that it
				724	comes from.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	725
				726	</div>
				727
				728	<!-- ======================================================================= -->
				729	<div class="doc_subsection">
				730	<a name="machinefunction">The <tt>MachineFunction</tt> class</a>
				731	</div>
				732
				733	<div class="doc_text">
				734
				735	<p>The <tt>MachineFunction</tt> class contains a list of machine basic blocks
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	736	(<tt><a href="#machinebasicblock">MachineBasicBlock</a></tt> instances). It
				737	corresponds one-to-one with the LLVM function input to the instruction
				738	selector. In addition to a list of basic blocks,
				739	the <tt>MachineFunction</tt> contains a <tt>MachineConstantPool</tt>,
				740	a <tt>MachineFrameInfo</tt>, a <tt>MachineFunctionInfo</tt>, and a
				741	<tt>MachineRegisterInfo</tt>. See
				742	<tt>include/llvm/CodeGen/MachineFunction.h</tt> for more information.</p>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	743
				744	</div>
				745
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	746
				747	<!-- *********************************************************************** -->
				748	<div class="doc_section">
				749	<a name="mc">The "MC" Layer</a>
				750	</div>
				751	<!-- *********************************************************************** -->
				752
				753	<div class="doc_text">
				754
				755	<p>
				756	The MC Layer is used to represent and process code at the raw machine code
				757	level, devoid of "high level" information like "constant pools", "jump tables",
				758	"global variables" or anything like that. At this level, LLVM handles things
				759	like label names, machine instructions, and sections in the object file. The
				760	code in this layer is used for a number of important purposes: the tail end of
				761	the code generator uses it to write a .s or .o file, and it is also used by the
				762	llvm-mc tool to implement standalone machine code assemblers and disassemblers.
				763	</p>
				764
				765	<p>
				766	This section describes some of the important classes. There are also a number
				767	of important subsystems that interact at this layer; they are described later
				768	in this manual.
				769	</p>
				770
				771	</div>
				772
				773
				774	<!-- ======================================================================= -->
				775	<div class="doc_subsection">
				776	<a name="mcstreamer">The <tt>MCStreamer</tt> API</a>
				777	</div>
				778
				779	<div class="doc_text">
				780
				781	<p>
				782	MCStreamer is best thought of as an assembler API. It is an abstract API which
				783	is <em>implemented</em> in different ways (e.g. to output a .s file, output an
				784	ELF .o file, etc) but whose API corresponds directly to what you see in a .s
				785	file. MCStreamer has one method per directive, such as EmitLabel,
				786	EmitSymbolAttribute, SwitchSection, EmitValue (for .byte, .word), etc, which
				787	directly correspond to assembly level directives. It also has an
				788	EmitInstruction method, which is used to output an MCInst to the streamer.
				789	</p>
				790
				791	<p>
				792	This API is most important for two clients: the llvm-mc stand-alone assembler is
				793	effectively a parser that parses a line, then invokes a method on MCStreamer. In
				794	the code generator, the <a href="#codeemit">Code Emission</a> phase lowers
				795	higher level LLVM IR and Machine* constructs down to the MC
				796	layer, emitting directives through MCStreamer.</p>
				797
				798	<p>
				799	On the implementation side of MCStreamer, there are two major implementations:
				800	one for writing out a .s file (MCAsmStreamer), and one for writing out a .o
				801	file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation
				802	that prints out a directive for each method (e.g. EmitValue -> .byte), but
				803	MCObjectStreamer implements a full assembler.
				804	</p>
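<p>The "one abstract API, several implementations" structure can be sketched
as follows. Everything here is a toy: <tt>ToyStreamer</tt>,
<tt>ToyAsmStreamer</tt> and <tt>ToyObjectStreamer</tt> are hypothetical
classes with two directives only, and the object-side class merely records
bytes and label offsets rather than implementing an assembler.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical abstract API: one method per directive.
class ToyStreamer {
public:
  virtual ~ToyStreamer() {}
  virtual void emitLabel(const std::string &Name) = 0;
  virtual void emitByte(std::uint8_t Value) = 0;
};

// Textual implementation: prints one directive per call.
class ToyAsmStreamer : public ToyStreamer {
  std::ostringstream OS;

public:
  void emitLabel(const std::string &Name) override { OS << Name << ":\n"; }
  void emitByte(std::uint8_t Value) override {
    OS << ".byte " << static_cast<unsigned>(Value) << "\n";
  }
  std::string str() const { return OS.str(); }
};

// Binary implementation: records raw bytes and the offset of each label.
class ToyObjectStreamer : public ToyStreamer {
  std::vector<std::uint8_t> Bytes;
  std::map<std::string, std::size_t> Labels;

public:
  void emitLabel(const std::string &Name) override {
    Labels[Name] = Bytes.size();
  }
  void emitByte(std::uint8_t Value) override { Bytes.push_back(Value); }
  std::size_t labelOffset(const std::string &Name) const {
    std::map<std::string, std::size_t>::const_iterator It = Labels.find(Name);
    assert(It != Labels.end() && "label was never emitted");
    return It->second;
  }
  const std::vector<std::uint8_t> &bytes() const { return Bytes; }
};

// A client written against the abstract API works with either implementation.
inline void emitExample(ToyStreamer &S) {
  S.emitLabel("foo");
  S.emitLabel("bar");
  S.emitByte(4);
}
```

<p>The client function does not know which kind of output it is producing,
which is the property that lets the code generator and a stand-alone
assembler share the same back half.</p>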
				805
				806	</div>
				807
				808	<!-- ======================================================================= -->
				809	<div class="doc_subsection">
				810	<a name="mccontext">The <tt>MCContext</tt> class</a>
				811	</div>
				812
				813	<div class="doc_text">
				814
				815	<p>
				816	The MCContext class is the owner of a variety of uniqued data structures at the
				817	MC layer, including symbols, sections, etc. As such, this is the class that you
				818	interact with to create symbols and sections. This class cannot be subclassed.
				819	</p>
				820
				821	</div>
				822
				823	<!-- ======================================================================= -->
				824	<div class="doc_subsection">
				825	<a name="mcsymbol">The <tt>MCSymbol</tt> class</a>
				826	</div>
				827
				828	<div class="doc_text">
				829
				830	<p>
				831	The MCSymbol class represents a symbol (aka label) in the assembly file. There
				832	are two interesting kinds of symbols: assembler temporary symbols, and normal
				833	symbols. Assembler temporary symbols are used and processed by the assembler
				834	but are discarded when the object file is produced. The distinction is usually
				835	represented by adding a prefix to the label, for example "L" labels are
				836	assembler temporary labels in MachO.
				837	</p>
				838
				839	<p>MCSymbols are created by MCContext and uniqued there. This means that
				840	MCSymbols can be compared for pointer equivalence to find out if they are the
				841	same symbol. Note that pointer inequality does not guarantee the labels will
				842	end up at different addresses though. It's perfectly legal to output something
				843	like this to the .s file:</p>
				844
				845	<pre>
				846	foo:
				847	bar:
				848	.byte 4
				849	</pre>
				850
				851	<p>In this case, both the foo and bar symbols will have the same address.</p>
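<p>The uniquing behaviour described above can be illustrated with a small
model. <tt>ToyContext</tt>, <tt>ToySymbol</tt> and
<tt>getOrCreateSymbol</tt> are hypothetical names for this sketch and make no
claim about how MCContext is actually implemented.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct ToySymbol {
  std::string Name;
};

// Hypothetical owner of uniqued symbols: one object per distinct name.
class ToyContext {
  std::map<std::string, std::unique_ptr<ToySymbol>> Symbols;

public:
  // Returns the existing symbol for Name, creating it on first request.
  ToySymbol *getOrCreateSymbol(const std::string &Name) {
    assert(!Name.empty() && "symbols need a name");
    std::unique_ptr<ToySymbol> &Slot = Symbols[Name];
    if (!Slot)
      Slot.reset(new ToySymbol{Name});
    return Slot.get();
  }
};
```

<p>Because the context hands back the same object for the same name, two
lookups of "foo" compare equal as pointers, while "foo" and "bar" do not, even
though (as in the listing above) they may end up at the same address.</p>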
				852
				853	</div>
				854
				855	<!-- ======================================================================= -->
				856	<div class="doc_subsection">
				857	<a name="mcsection">The <tt>MCSection</tt> class</a>
				858	</div>
				859
				860	<div class="doc_text">
				861
				862	<p>
				863	The MCSection class represents an object-file specific section. It is subclassed
				864	by object file specific implementations (e.g. <tt>MCSectionMachO</tt>,
				865	<tt>MCSectionCOFF</tt>, <tt>MCSectionELF</tt>) and these are created and uniqued
				866	by MCContext. The MCStreamer has a notion of the current section, which can be
				867	changed with the SwitchSection method (which corresponds to a ".section"
				868	directive in a .s file).
				869	</p>
				870
				871	</div>
				872
				873	<!-- ======================================================================= -->
				874	<div class="doc_subsection">
				875	<a name="mcinst">The <tt>MCInst</tt> class</a>
				876	</div>
				877
				878	<div class="doc_text">
				879
				880	<p>
				881	The MCInst class is a target-independent representation of an instruction. It
				882	is a simple class (much more so than <a href="#machineinstr">MachineInstr</a>)
				883	that holds a target-specific opcode and a vector of MCOperands. MCOperand, in
				884	turn, is a simple discriminated union of three cases: 1) a simple immediate,
				885	2) a target register ID, 3) a symbolic expression (e.g. "Lfoo-Lbar+42") as an
				886	MCExpr.
				887	</p>
				888
				889	<p>MCInst is the common currency used to represent machine instructions at the
				890	MC layer. It is the type used by the instruction encoder, the instruction
				891	printer, and the type generated by the assembly parser and disassembler.
				892	</p>
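<p>A three-case operand plus an opcode-and-vector instruction can be sketched
like this. The <tt>ToyMCOperand</tt> and <tt>ToyMCInst</tt> types below are
hypothetical; in particular, the symbolic expression is held as plain text
here, whereas the text above says the real class refers to an MCExpr.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tagged operand with the three cases named in the text.
class ToyMCOperand {
public:
  enum Kind { Immediate, Register, Expression };

private:
  Kind K;
  std::int64_t Imm;  // meaningful only when K == Immediate
  unsigned Reg;      // meaningful only when K == Register
  std::string Expr;  // meaningful only when K == Expression

  explicit ToyMCOperand(Kind TheKind) : K(TheKind), Imm(0), Reg(0) {}

public:
  static ToyMCOperand createImm(std::int64_t V) {
    ToyMCOperand Op(Immediate);
    Op.Imm = V;
    return Op;
  }
  static ToyMCOperand createReg(unsigned R) {
    ToyMCOperand Op(Register);
    Op.Reg = R;
    return Op;
  }
  static ToyMCOperand createExpr(const std::string &E) {
    ToyMCOperand Op(Expression);
    Op.Expr = E;
    return Op;
  }

  bool isImm() const { return K == Immediate; }
  bool isReg() const { return K == Register; }
  bool isExpr() const { return K == Expression; }

  // Accessors check the tag before handing out the payload.
  std::int64_t getImm() const { assert(isImm()); return Imm; }
  unsigned getReg() const { assert(isReg()); return Reg; }
  const std::string &getExpr() const { assert(isExpr()); return Expr; }
};

// Hypothetical instruction: a target-specific opcode and its operands.
struct ToyMCInst {
  unsigned Opcode;
  std::vector<ToyMCOperand> Operands;
};
```

<p>An encoder, a printer, and a parser can all exchange values of this one
type, which is what makes it a "common currency".</p>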
				893
				894	</div>
				895
				896
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	897	<!-- *********************************************************************** -->
				898	<div class="doc_section">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	899	<a name="codegenalgs">Target-independent code generation algorithms</a>
				900	</div>
				901	<!-- *********************************************************************** -->
				902
				903	<div class="doc_text">
				904
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	905	<p>This section documents the phases described in the
				906	<a href="#high-level-design">high-level design of the code generator</a>.
				907	It explains how they work and some of the rationale behind their design.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	908
				909	</div>
				910
				911	<!-- ======================================================================= -->
				912	<div class="doc_subsection">
				913	<a name="instselect">Instruction Selection</a>
				914	</div>
				915
				916	<div class="doc_text">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	917
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	918	<p>Instruction Selection is the process of translating LLVM code presented to
				919	the code generator into target-specific machine instructions. There are
				920	several well-known ways to do this in the literature. LLVM uses a
				921	SelectionDAG based instruction selector.</p>
				922
				923	<p>Portions of the DAG instruction selector are generated from the target
				924	description (<tt>*.td</tt>) files. Our goal is for the entire instruction
				925	selector to be generated from these <tt>.td</tt> files, though currently
				926	there are still things that require custom C++ code.</p>
				927
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	928	</div>
				929
				930	<!-- _______________________________________________________________________ -->
				931	<div class="doc_subsubsection">
				932	<a name="selectiondag_intro">Introduction to SelectionDAGs</a>
				933	</div>
				934
				935	<div class="doc_text">
				936
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	937	<p>The SelectionDAG provides an abstraction for code representation in a way
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	938	that is amenable to instruction selection using automatic techniques
				939	(e.g. dynamic-programming based optimal pattern matching selectors). It is
				940	also well-suited to other phases of code generation; in particular,
				941	instruction scheduling (SelectionDAGs are very close to scheduling DAGs
				942	post-selection). Additionally, the SelectionDAG provides a host
				943	representation where a large variety of very-low-level (but
				944	target-independent) <a href="#selectiondag_optimize">optimizations</a> may be
				945	performed; ones which require extensive information about the instructions
				946	efficiently supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	947
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	948	<p>The SelectionDAG is a Directed-Acyclic-Graph whose nodes are instances of the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	949	<tt>SDNode</tt> class. The primary payload of the <tt>SDNode</tt> is its
				950	operation code (Opcode) that indicates what operation the node performs and
				951	the operands to the operation. The various operation node types are
				952	described at the top of the <tt>include/llvm/CodeGen/SelectionDAGNodes.h</tt>
				953	file.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	954
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	955	<p>Although most operations define a single value, each node in the graph may
				956	define multiple values. For example, a combined div/rem operation will
				957	define both the quotient and the remainder. Many other situations require
				958	multiple values as well. Each node also has some number of operands, which
				959	are edges to the node defining the used value. Because nodes may define
				960	multiple values, edges are represented by instances of the <tt>SDValue</tt>
				961	class, which is a <tt><SDNode, unsigned></tt> pair, indicating the node
				962	and result value being used, respectively. Each value produced by
				963	an <tt>SDNode</tt> has an associated <tt>MVT</tt> (Machine Value Type)
				964	indicating what the type of the value is.</p>
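<p>The "edge = (node, result number)" idea can be sketched with two small
structs. <tt>ToySDNode</tt> and <tt>ToySDValue</tt> are hypothetical
simplifications: opcodes and value types are plain strings here, and the
"divrem" node is an invented example of a two-result operation.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToySDNode;

// An edge: which node is used, and which of its results.
struct ToySDValue {
  const ToySDNode *Node;
  unsigned ResNo;
};

// A node: an opcode, one value type per result, and operand edges.
struct ToySDNode {
  std::string Opcode;
  std::vector<std::string> ValueTypes;
  std::vector<ToySDValue> Operands;
};

// The type of an edge is the type of the result it selects.
inline const std::string &getValueType(const ToySDValue &V) {
  assert(V.Node && V.ResNo < V.Node->ValueTypes.size());
  return V.Node->ValueTypes[V.ResNo];
}
```

<p>With this representation, a consumer of a two-result node names result 0
or result 1 explicitly, so two edges can point at the same node and still
refer to different values.</p>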
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	965
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	966	<p>SelectionDAGs contain two different kinds of values: those that represent
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	967	data flow and those that represent control flow dependencies. Data values
				968	are simple edges with an integer or floating point value type. Control edges
				969	are represented as "chain" edges which are of type <tt>MVT::Other</tt>.
				970	These edges provide an ordering between nodes that have side effects (such as
				971	loads, stores, calls, returns, etc). All nodes that have side effects should
				972	take a token chain as input and produce a new one as output. By convention,
				973	token chain inputs are always operand #0, and chain results are always the
				974	last value produced by an operation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	975
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	976	<p>A SelectionDAG has designated "Entry" and "Root" nodes. The Entry node is
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	977	always a marker node with an Opcode of <tt>ISD::EntryToken</tt>. The Root
				978	node is the final side-effecting node in the token chain. For example, in a
				979	single basic block function it would be the return node.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	980
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	981	<p>One important concept for SelectionDAGs is the notion of a "legal" vs.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	982	"illegal" DAG. A legal DAG for a target is one that only uses supported
				983	operations and supported types. On a 32-bit PowerPC, for example, a DAG with
				984	a value of type i1, i8, i16, or i64 would be illegal, as would a DAG that
				985	uses a SREM or UREM operation. The
<a href="#selectiondag_legalize_types">legalize types</a> and
				987	<a href="#selectiondag_legalize">legalize operations</a> phases are
				988	responsible for turning an illegal DAG into a legal DAG.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	989
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	990	</div>
				991
				992	<!-- _______________________________________________________________________ -->
				993	<div class="doc_subsubsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	994	<a name="selectiondag_process">SelectionDAG Instruction Selection Process</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	995	</div>
				996
				997	<div class="doc_text">
				998
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	999	<p>SelectionDAG-based instruction selection consists of the following steps:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1000
				1001	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1002	<li><a href="#selectiondag_build">Build initial DAG</a> — This stage
				1003	performs a simple translation from the input LLVM code to an illegal
				1004	SelectionDAG.</li>
				1005
				1006	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — This
				1007	stage performs simple optimizations on the SelectionDAG to simplify it,
				1008	and recognize meta instructions (like rotates
				1009	and <tt>div</tt>/<tt>rem</tt> pairs) for targets that support these meta
				1010	operations. This makes the resultant code more efficient and
				1011	the <a href="#selectiondag_select">select instructions from DAG</a> phase
				1012	(below) simpler.</li>
				1013
				1014	<li><a href="#selectiondag_legalize_types">Legalize SelectionDAG Types</a>
				1015	— This stage transforms SelectionDAG nodes to eliminate any types
				1016	that are unsupported on the target.</li>
				1017
				1018	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1019	SelectionDAG optimizer is run to clean up redundancies exposed by type
				1020	legalization.</li>
				1021
<li><a href="#selectiondag_legalize">Legalize SelectionDAG Operations</a> &mdash;
This stage transforms SelectionDAG nodes to eliminate any operations that
are unsupported on the target.</li>
				1025
				1026	<li><a href="#selectiondag_optimize">Optimize SelectionDAG</a> — The
				1027	SelectionDAG optimizer is run to eliminate inefficiencies introduced by
				1028	operation legalization.</li>
				1029
				1030	<li><a href="#selectiondag_select">Select instructions from DAG</a> —
				1031	Finally, the target instruction selector matches the DAG operations to
				1032	target instructions. This process translates the target-independent input
				1033	DAG into another DAG of target instructions.</li>
				1034
				1035	<li><a href="#selectiondag_sched">SelectionDAG Scheduling and Formation</a>
				1036	— The last phase assigns a linear order to the instructions in the
				1037	target-instruction DAG and emits them into the MachineFunction being
				1038	compiled. This step uses traditional prepass scheduling techniques.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1039	</ol>
				1040
				1041	<p>After all of these steps are complete, the SelectionDAG is destroyed and the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1042	rest of the code generation passes are run.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1043
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1044	<p>One great way to visualize what is going on here is to take advantage of a
few <tt>llc</tt> command line options. The following options pop up a window
				1046	displaying the SelectionDAG at specific times (if you only get errors printed
				1047	to the console while using this, you probably
				1048	<a href="ProgrammersManual.html#ViewGraph">need to configure your system</a>
				1049	to add support for it).</p>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1050
				1051	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1052	<li><tt>-view-dag-combine1-dags</tt> displays the DAG after being built,
				1053	before the first optimization pass.</li>
				1054
				1055	<li><tt>-view-legalize-dags</tt> displays the DAG before Legalization.</li>
				1056
				1057	<li><tt>-view-dag-combine2-dags</tt> displays the DAG before the second
				1058	optimization pass.</li>
				1059
				1060	<li><tt>-view-isel-dags</tt> displays the DAG before the Select phase.</li>
				1061
				1062	<li><tt>-view-sched-dags</tt> displays the DAG before Scheduling.</li>
Dan Gohman	8c9c55f	2008-09-10 22:23:41 +0000	[diff] [blame]	1063	</ul>
				1064
<p>The <tt>-view-sunit-dags</tt> option displays the Scheduler's dependency graph.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1066	This graph is based on the final SelectionDAG, with nodes that must be
				1067	scheduled together bundled into a single scheduling-unit node, and with
				1068	immediate operands and other nodes that aren't relevant for scheduling
				1069	omitted.</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1070
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1071	</div>
				1072
				1073	<!-- _______________________________________________________________________ -->
				1074	<div class="doc_subsubsection">
				1075	<a name="selectiondag_build">Initial SelectionDAG Construction</a>
				1076	</div>
				1077
				1078	<div class="doc_text">
				1079
Bill Wendling	1644877	2006-08-28 03:04:05 +0000	[diff] [blame]	1080	<p>The initial SelectionDAG is naïvely peephole expanded from the LLVM
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1081	input by the <tt>SelectionDAGLowering</tt> class in the
				1082	<tt>lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp</tt> file. The intent of
this pass is to expose as many low-level, target-specific details to the
				1084	SelectionDAG as possible. This pass is mostly hard-coded (e.g. an
				1085	LLVM <tt>add</tt> turns into an <tt>SDNode add</tt> while a
				1086	<tt>getelementptr</tt> is expanded into the obvious arithmetic). This pass
				1087	requires target-specific hooks to lower calls, returns, varargs, etc. For
				1088	these features, the <tt><a href="#targetlowering">TargetLowering</a></tt>
				1089	interface is used.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1090
				1091	</div>
				1092
				1093	<!-- _______________________________________________________________________ -->
				1094	<div class="doc_subsubsection">
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1095	<a name="selectiondag_legalize_types">SelectionDAG LegalizeTypes Phase</a>
				1096	</div>
				1097
				1098	<div class="doc_text">
				1099
<p>The Legalize Types phase is in charge of converting a DAG to only use the types
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1101	that are natively supported by the target.</p>
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1102
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1103	<p>There are two main ways of converting values of unsupported scalar types to
				1104	values of supported types: converting small types to larger types
				1105	("promoting"), and breaking up large integer types into smaller ones
				1106	("expanding"). For example, a target might require that all f32 values are
				1107	promoted to f64 and that all i1/i8/i16 values are promoted to i32. The same
				1108	target might require that all i64 values be expanded into pairs of i32
				1109	values. These changes can insert sign and zero extensions as needed to make
				1110	sure that the final code has the same behavior as the input.</p>
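<p>To make promotion and expansion concrete, here is a small standalone C++
sketch of the arithmetic involved. It is only an illustration under the
assumption of a target with 32-bit integer registers; the helper names
(<tt>Halves</tt>, <tt>addExpanded</tt>, <tt>addPromoted</tt>) are hypothetical
and do not correspond to legalizer code.</p>

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two 32-bit halves, as after "expanding" an i64.
struct Halves {
  uint32_t lo;
  uint32_t hi;
};

inline Halves split(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

inline uint64_t join(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}

// 64-bit add using only 32-bit operations: add the low halves, detect the
// carry out of that add, then fold the carry into the sum of the high halves.
inline Halves addExpanded(Halves a, Halves b) {
  uint32_t lo = a.lo + b.lo;            // wraps modulo 2^32
  uint32_t carry = lo < a.lo ? 1u : 0u; // wrapped around => carry out
  uint32_t hi = a.hi + b.hi + carry;
  return {lo, hi};
}

// 8-bit add "promoted" to 32 bits: zero-extend the inputs, add in the wide
// type, and truncate so the result behaves like the original 8-bit add.
inline uint8_t addPromoted(uint8_t a, uint8_t b) {
  uint32_t wide = static_cast<uint32_t>(a) + static_cast<uint32_t>(b);
  return static_cast<uint8_t>(wide);
}
```

<p>The truncation at the end of <tt>addPromoted</tt> plays the role of the
extensions mentioned above: it keeps the promoted computation
indistinguishable from the narrow one.</p>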
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1111
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1112	<p>There are two main ways of converting values of unsupported vector types to
values of supported types: splitting vector types, multiple times if
				1114	necessary, until a legal type is found, and extending vector types by adding
				1115	elements to the end to round them out to legal types ("widening"). If a
				1116	vector gets split all the way down to single-element parts with no supported
				1117	vector type being found, the elements are converted to scalars
				1118	("scalarizing").</p>
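<p>The splitting and widening strategies can be sketched on element counts
alone. The following standalone C++ is a simplification with hypothetical
names (<tt>splitUntilLegal</tt>, <tt>widenTo</tt>); it assumes a power-of-two
element count for splitting and does not model how a real target chooses
between the two strategies.</p>

```cpp
#include <cassert>
#include <set>
#include <vector>

// Halve the vector until a legal element count is reached. Returns the
// element counts of the resulting parts; parts of size 1 that are not in
// `legal` stand for scalarized elements.
inline std::vector<unsigned> splitUntilLegal(unsigned numElts,
                                             const std::set<unsigned> &legal) {
  assert(numElts > 0 && (numElts & (numElts - 1)) == 0 &&
         "power-of-two element count expected");
  if (numElts == 1 || legal.count(numElts))
    return {numElts};
  std::vector<unsigned> half = splitUntilLegal(numElts / 2, legal);
  std::vector<unsigned> parts = half;
  parts.insert(parts.end(), half.begin(), half.end());
  return parts;
}

// Widen: the smallest legal element count that is >= numElts, or 0 if no
// legal type is large enough.
inline unsigned widenTo(unsigned numElts, const std::set<unsigned> &legal) {
  auto it = legal.lower_bound(numElts);
  return it == legal.end() ? 0 : *it;
}
```

<p>For example, with only 4-element vectors legal, a 16-element vector splits
into four 4-element parts, while a 3-element vector on a target with 2-, 4-,
and 8-element vectors would widen to 4 elements.</p>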
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1119
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1120	<p>A target implementation tells the legalizer which types are supported (and
				1121	which register class to use for them) by calling the
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1122	<tt>addRegisterClass</tt> method in its TargetLowering constructor.</p>
				1123
				1124	</div>
				1125
				1126	<!-- _______________________________________________________________________ -->
				1127	<div class="doc_subsubsection">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1128	<a name="selectiondag_legalize">SelectionDAG Legalize Phase</a>
				1129	</div>
				1130
				1131	<div class="doc_text">
				1132
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1133	<p>The Legalize phase is in charge of converting a DAG to only use the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1134	operations that are natively supported by the target.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1135
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1136	<p>Targets often have weird constraints, such as not supporting every operation
				1137	on every supported datatype (e.g. X86 does not support byte conditional moves
and PowerPC does not support sign-extending loads from an 8-bit memory
				1139	location). Legalize takes care of this by open-coding another sequence of
				1140	operations to emulate the operation ("expansion"), by promoting one type to a
				1141	larger type that supports the operation ("promotion"), or by using a
				1142	target-specific hook to implement the legalization ("custom").</p>
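<p>As one illustration of "expansion", a signed remainder can be open-coded
from a divide, a multiply, and a subtract. The standalone C++ below is a
sketch of that identity only; <tt>sremExpanded</tt> is a hypothetical name,
and whether a given target legalizes <tt>SREM</tt> this way is an assumption
made for the example.</p>

```cpp
#include <cassert>
#include <cstdint>

// Remainder rebuilt from operations a target is assumed to support:
//   a rem b == a - (a / b) * b   (division truncating toward zero)
inline int32_t sremExpanded(int32_t a, int32_t b) {
  assert(b != 0 && "division by zero");
  assert(!(a == INT32_MIN && b == -1) && "quotient would overflow");
  int32_t q = a / b;
  return a - q * b;
}
```

<p>Because the expanded form is ordinary arithmetic, later optimization of
the DAG can still simplify it, for instance by sharing the divide with a
nearby quotient computation.</p>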
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1143
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	1144	<p>A target implementation tells the legalizer which operations are not
				1145	supported (and which of the above three actions to take) by calling the
				1146	<tt>setOperationAction</tt> method in its <tt>TargetLowering</tt>
				1147	constructor.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1148
<p>Prior to the existence of the Legalize passes, we required that every target
<a href="#selectiondag_select">selector</a> support and handle every
operator and type even if they were not natively supported. The introduction
				1152	of the Legalize phases allows all of the canonicalization patterns to be
				1153	shared across targets, and makes it very easy to optimize the canonicalized
				1154	code because it is still in the form of a DAG.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1155
				1156	</div>
				1157
				1158	<!-- _______________________________________________________________________ -->
				1159	<div class="doc_subsubsection">
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1160	<a name="selectiondag_optimize">SelectionDAG Optimization Phase: the DAG
				1161	Combiner</a>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1162	</div>
				1163
				1164	<div class="doc_text">
				1165
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1166	<p>The SelectionDAG optimization phase is run multiple times for code
				1167	generation, immediately after the DAG is built and once after each
				1168	legalization. The first run of the pass allows the initial code to be
				1169	cleaned up (e.g. performing optimizations that depend on knowing that the
				1170	operators have restricted type inputs). Subsequent runs of the pass clean up
				1171	the messy code generated by the Legalize passes, which allows Legalize to be
				1172	very simple (it can focus on making code legal instead of focusing on
				1173	generating <em>good</em> and legal code).</p>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1174
				1175	<p>One important class of optimizations performed is optimizing inserted sign
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1176	and zero extension instructions. We currently use ad-hoc techniques, but
				1177	could move to more rigorous techniques in the future. Here are some good
				1178	papers on the subject:</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1179
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1180	<p>"<a href="http://www.eecs.harvard.edu/~nr/pubs/widen-abstract.html">Widening
				1181	integer arithmetic</a>"<br>
				1182	Kevin Redwine and Norman Ramsey<br>
				1183	International Conference on Compiler Construction (CC) 2004</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1184
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1185	<p>"<a href="http://portal.acm.org/citation.cfm?doid=512529.512552">Effective
				1186	sign extension elimination</a>"<br>
				1187	Motohiro Kawahito, Hideaki Komatsu, and Toshio Nakatani<br>
				1188	Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design
				1189	and Implementation.</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1190
				1191	</div>
				1192
				1193	<!-- _______________________________________________________________________ -->
				1194	<div class="doc_subsubsection">
				1195	<a name="selectiondag_select">SelectionDAG Select Phase</a>
				1196	</div>
				1197
				1198	<div class="doc_text">
				1199
				1200	<p>The Select phase is the bulk of the target-specific code for instruction
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1201	selection. This phase takes a legal SelectionDAG as input, pattern matches
				1202	the instructions supported by the target to this DAG, and produces a new DAG
				1203	of target code. For example, consider the following LLVM fragment:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1204
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1205	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1206	<pre>
Dan Gohman	a9445e1	2010-03-02 01:11:08 +0000	[diff] [blame]	1207	%t1 = fadd float %W, %X
				1208	%t2 = fmul float %t1, %Y
				1209	%t3 = fadd float %t2, %Z
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1210	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1211	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1212
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1213	<p>This LLVM code corresponds to a SelectionDAG that looks basically like
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1214	this:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1215
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1216	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1217	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1218	(fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1219	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1220	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1221
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1222	<p>If a target supports floating point multiply-and-add (FMA) operations, one of
				1223	the adds can be merged with the multiply. On the PowerPC, for example, the
				1224	output of the instruction selector might look like this DAG:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1225
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1226	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1227	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1228	(FMADDS (FADDS W, X), Y, Z)
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1229	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1230	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1231
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1232	<p>The <tt>FMADDS</tt> instruction is a ternary instruction that multiplies its
				1233	first two operands and adds the third (as single-precision floating-point
				1234	numbers). The <tt>FADDS</tt> instruction is a simple binary single-precision
				1235	add instruction. To perform this pattern match, the PowerPC backend includes
				1236	the following instruction definitions:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1237
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1238	<div class="doc_code">
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1239	<pre>
def FMADDS : AForm_1&lt;59, 29,
                    (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
                    "fmadds $FRT, $FRA, $FRC, $FRB",
                    [<b>(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC),
                                           F4RC:$FRB))</b>]&gt;;
def FADDS : AForm_2&lt;59, 21,
                   (ops F4RC:$FRT, F4RC:$FRA, F4RC:$FRB),
                   "fadds $FRT, $FRA, $FRB",
                   [<b>(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))</b>]&gt;;
				1249	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1250	</div>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1251
				1252	<p>The portion of the instruction definition in bold indicates the pattern used
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1253	to match the instruction. The DAG operators
				1254	(like <tt>fmul</tt>/<tt>fadd</tt>) are defined in
the <tt>include/llvm/Target/TargetSelectionDAG.td</tt> file.
"<tt>F4RC</tt>" is the register class of the input and result values.</p>
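<p>To give a feel for what matching such a pattern involves, here is a
hand-written standalone C++ sketch that recognizes the
<tt>(fadd (fmul a, b), c)</tt> shape on a toy expression tree, trying both
operand orders of the <tt>fadd</tt>. The types and functions (<tt>Expr</tt>,
<tt>matchFMA</tt>, and so on) are hypothetical; this is not the code that
TableGen generates.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree; leaves have null operands and use `op` as a name.
struct Expr {
  std::string op; // "fadd", "fmul", or a leaf name such as "W"
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

inline ExprPtr leaf(const std::string &name) {
  return std::make_shared<Expr>(Expr{name, nullptr, nullptr});
}

inline ExprPtr node(const std::string &op, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{op, std::move(l), std::move(r)});
}

// Operands of a matched multiply-add, computing a * b + c.
struct FMAMatch {
  bool matched;
  ExprPtr a, b, c;
};

inline bool isBinary(const ExprPtr &e, const std::string &op) {
  return e && e->op == op && e->lhs && e->rhs;
}

// Match (fadd (fmul a, b), c), and also (fadd c, (fmul a, b)) because the
// add is treated as commutative.
inline FMAMatch matchFMA(const ExprPtr &e) {
  if (!isBinary(e, "fadd"))
    return {false, nullptr, nullptr, nullptr};
  if (isBinary(e->lhs, "fmul"))
    return {true, e->lhs->lhs, e->lhs->rhs, e->rhs};
  if (isBinary(e->rhs, "fmul"))
    return {true, e->rhs->lhs, e->rhs->rhs, e->lhs};
  return {false, nullptr, nullptr, nullptr};
}
```

<p>Applied to the tree for <tt>(fadd (fmul (fadd W, X), Y), Z)</tt>, the
matcher binds <tt>a</tt> to the inner add, <tt>b</tt> to <tt>Y</tt>, and
<tt>c</tt> to <tt>Z</tt>, which mirrors the
<tt>(FMADDS (FADDS W, X), Y, Z)</tt> result shown earlier.</p>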
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1257
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1258	<p>The TableGen DAG instruction selector generator reads the instruction
				1259	patterns in the <tt>.td</tt> file and automatically builds parts of the
				1260	pattern matching code for your target. It has the following strengths:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1261
				1262	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1263	<li>At compiler-compiler time, it analyzes your instruction patterns and tells
				1264	you if your patterns make sense or not.</li>
				1265
				1266	<li>It can handle arbitrary constraints on operands for the pattern match. In
particular, it is straightforward to say things like "match any immediate
				1268	that is a 13-bit sign-extended value". For examples, see the
				1269	<tt>immSExt16</tt> and related <tt>tblgen</tt> classes in the PowerPC
				1270	backend.</li>
				1271
				1272	<li>It knows several important identities for the patterns defined. For
				1273	example, it knows that addition is commutative, so it allows the
				1274	<tt>FMADDS</tt> pattern above to match "<tt>(fadd X, (fmul Y, Z))</tt>" as
				1275	well as "<tt>(fadd (fmul X, Y), Z)</tt>", without the target author having
				1276	to specially handle this case.</li>
				1277
				1278	<li>It has a full-featured type-inferencing system. In particular, you should
				1279	rarely have to explicitly tell the system what type parts of your patterns
				1280	are. In the <tt>FMADDS</tt> case above, we didn't have to tell
				1281	<tt>tblgen</tt> that all of the nodes in the pattern are of type 'f32'.
				1282	It was able to infer and propagate this knowledge from the fact that
				1283	<tt>F4RC</tt> has type 'f32'.</li>
				1284
				1285	<li>Targets can define their own (and rely on built-in) "pattern fragments".
				1286	Pattern fragments are chunks of reusable patterns that get inlined into
				1287	your patterns during compiler-compiler time. For example, the integer
				1288	"<tt>(not x)</tt>" operation is actually defined as a pattern fragment
				1289	that expands as "<tt>(xor x, -1)</tt>", since the SelectionDAG does not
				1290	have a native '<tt>not</tt>' operation. Targets can define their own
				1291	short-hand fragments as they see fit. See the definition of
				1292	'<tt>not</tt>' and '<tt>ineg</tt>' for examples.</li>
				1293
				1294	<li>In addition to instructions, targets can specify arbitrary patterns that
				1295	map to one or more instructions using the 'Pat' class. For example, the
				1296	PowerPC has no way to load an arbitrary integer immediate into a register
				1297	in one instruction. To tell tblgen how to do this, it defines:
				1298	<br>
				1299	<br>
				1300	<div class="doc_code">
				1301	<pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	1302	// Arbitrary immediate support. Implement in terms of LIS/ORI.
def : Pat&lt;(i32 imm:$imm),
          (ORI (LIS (HI16 imm:$imm)), (LO16 imm:$imm))&gt;;
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1305	</pre>
				1306	</div>
				1307	<br>
				1308	If none of the single-instruction patterns for loading an immediate into a
				1309	register match, this will be used. This rule says "match an arbitrary i32
				1310	immediate, turning it into an <tt>ORI</tt> ('or a 16-bit immediate') and
				1311	an <tt>LIS</tt> ('load 16-bit immediate, where the immediate is shifted to
				1312	the left 16 bits') instruction". To make this work, the
				1313	<tt>LO16</tt>/<tt>HI16</tt> node transformations are used to manipulate
				1314	the input immediate (in this case, take the high or low 16-bits of the
				1315	immediate).</li>
				1316
				1317	<li>While the system does automate a lot, it still allows you to write custom
				1318	C++ code to match special cases if there is something that is hard to
				1319	express.</li>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1320	</ul>
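<p>The arithmetic behind the <tt>LIS</tt>/<tt>ORI</tt> pattern in the list
above can be checked with a few lines of standalone C++. The functions below
are hypothetical models of the two instructions and of the
<tt>HI16</tt>/<tt>LO16</tt> transformations on a 32-bit register; they are a
sketch and are not taken from the PowerPC backend.</p>

```cpp
#include <cassert>
#include <cstdint>

// Node transformations: pick the high or low 16 bits of the immediate.
inline uint32_t hi16(uint32_t imm) { return imm >> 16; }
inline uint32_t lo16(uint32_t imm) { return imm & 0xFFFFu; }

// Modeled effect of LIS: load a 16-bit immediate shifted left by 16 bits.
inline uint32_t lis(uint32_t imm16) { return (imm16 & 0xFFFFu) << 16; }

// Modeled effect of ORI: or a 16-bit immediate into a register value.
inline uint32_t ori(uint32_t reg, uint32_t imm16) {
  return reg | (imm16 & 0xFFFFu);
}

// The two-instruction sequence described by the pattern.
inline uint32_t materialize(uint32_t imm) {
  return ori(lis(hi16(imm)), lo16(imm));
}
```

<p>Since <tt>LIS</tt> leaves the low 16 bits clear in this model, the
following <tt>ORI</tt> fills them in without disturbing the high half, so the
sequence reproduces any 32-bit immediate.</p>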
				1321
<p>While it has many strengths, the system currently has some limitations,
primarily because it is still a work in progress:</p>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1324
				1325	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1326	<li>Overall, there is no way to define or match SelectionDAG nodes that define
Dan Gohman	e370c80	2009-04-22 15:55:31 +0000	[diff] [blame]	1327	multiple values (e.g. <tt>SMUL_LOHI</tt>, <tt>LOAD</tt>, <tt>CALL</tt>,
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1328	etc). This is the biggest reason that you currently still <em>have
				1329	to</em> write custom C++ code for your instruction selector.</li>
				1330
				1331	<li>There is no great way to support matching complex addressing modes yet.
				1332	In the future, we will extend pattern fragments to allow them to define
				1333	multiple values (e.g. the four operands of the <a href="#x86_memory">X86
				1334	addressing mode</a>, which are currently matched with custom C++ code).
				1335	In addition, we'll extend fragments so that a fragment can match multiple
				1336	different patterns.</li>
				1337
				1338	<li>We don't automatically infer flags like isStore/isLoad yet.</li>
				1339
				1340	<li>We don't automatically generate the set of supported registers and
				1341	operations for the <a href="#selectiondag_legalize">Legalizer</a>
				1342	yet.</li>
				1343
				1344	<li>We don't have a way of tying in custom legalized nodes yet.</li>
Chris Lattner	7d6915c	2005-10-17 04:18:41 +0000	[diff] [blame]	1345	</ul>
Chris Lattner	7a025c8	2005-10-16 20:02:19 +0000	[diff] [blame]	1346
				1347	<p>Despite these limitations, the instruction selector generator is still quite
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1348	useful for most of the binary and logical operations in typical instruction
				1349	sets. If you run into any problems or can't figure out how to do something,
				1350	please let Chris know!</p>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1351
				1352	</div>
				1353
				1354	<!-- _______________________________________________________________________ -->
				1355	<div class="doc_subsubsection">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1356	<a name="selectiondag_sched">SelectionDAG Scheduling and Formation Phase</a>
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1357	</div>
				1358
				1359	<div class="doc_text">
				1360
				1361	<p>The scheduling phase takes the DAG of target instructions from the selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1362	phase and assigns an order. The scheduler can pick an order depending on
various constraints of the machines (e.g. order for minimal register pressure
				1364	or try to cover instruction latencies). Once an order is established, the
				1365	DAG is converted to a list
				1366	of <tt><a href="#machineinstr">MachineInstr</a></tt>s and the SelectionDAG is
				1367	destroyed.</p>
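<p>The core requirement on any order the scheduler picks is that a node is
emitted only after the nodes defining its operands. The standalone C++ below
is a minimal sketch of producing one such order with a worklist of ready
nodes. The function name <tt>linearize</tt> and the index-based graph
encoding are assumptions for the example, and the sketch makes no attempt to
model register pressure or latencies.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// operands[i] lists the nodes whose results node i uses.
// Returns node indices ordered so that every operand precedes its users.
inline std::vector<std::size_t>
linearize(const std::vector<std::vector<std::size_t>> &operands) {
  const std::size_t n = operands.size();
  std::vector<std::size_t> pending(n); // operands not yet emitted, per node
  std::vector<std::vector<std::size_t>> users(n);
  for (std::size_t i = 0; i < n; ++i) {
    pending[i] = operands[i].size();
    for (std::size_t op : operands[i]) {
      assert(op < n && "operand index out of range");
      users[op].push_back(i);
    }
  }

  std::vector<std::size_t> ready, order;
  for (std::size_t i = 0; i < n; ++i)
    if (pending[i] == 0)
      ready.push_back(i);

  // Emit any ready node; a real scheduler would choose among the ready
  // nodes using a heuristic instead of simply taking the last one.
  while (!ready.empty()) {
    std::size_t cur = ready.back();
    ready.pop_back();
    order.push_back(cur);
    for (std::size_t u : users[cur])
      if (--pending[u] == 0)
        ready.push_back(u);
  }
  assert(order.size() == n && "graph contains a cycle");
  return order;
}
```

<p>The point at which a node is taken from <tt>ready</tt> is where a
heuristic would plug in: different choices there yield different but equally
valid linear orders.</p>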
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1368
Jeff Cohen	0b81cda	2005-10-24 16:54:55 +0000	[diff] [blame]	1369	<p>Note that this phase is logically separate from the instruction selection
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1370	phase, but is tied to it closely in the code because it operates on
				1371	SelectionDAGs.</p>
Chris Lattner	c38959f	2005-10-17 03:09:31 +0000	[diff] [blame]	1372
Chris Lattner	e35d3bb	2005-10-16 00:36:38 +0000	[diff] [blame]	1373	</div>
				1374
				1375	<!-- _______________________________________________________________________ -->
				1376	<div class="doc_subsubsection">
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1377	<a name="selectiondag_future">Future directions for the SelectionDAG</a>
				1378	</div>
				1379
				1380	<div class="doc_text">
				1381
				1382	<ol>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1383	<li>Optional function-at-a-time selection.</li>
				1384
				1385	<li>Auto-generate entire selector from <tt>.td</tt> file.</li>
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1386	</ol>
				1387
				1388	</div>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1389
				1390	<!-- ======================================================================= -->
				1391	<div class="doc_subsection">
				1392	<a name="ssamco">SSA-based Machine Code Optimizations</a>
				1393	</div>
				1394	<div class="doc_text"><p>To Be Written</p></div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1395
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1396	<!-- ======================================================================= -->
				1397	<div class="doc_subsection">
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1398	<a name="liveintervals">Live Intervals</a>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1399	</div>
				1400
				1401	<div class="doc_text">
				1402
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1403	<p>Live Intervals are the ranges (intervals) where a variable is <i>live</i>.
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1404	They are used by some <a href="#regalloc">register allocator</a> passes to
				1405	determine if two or more virtual registers which require the same physical
				1406	register are live at the same point in the program (i.e., they conflict).
				1407	When this situation occurs, one virtual register must be <i>spilled</i>.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1408
				1409	</div>
				1410
				1411	<!-- _______________________________________________________________________ -->
				1412	<div class="doc_subsubsection">
				1413	<a name="livevariable_analysis">Live Variable Analysis</a>
				1414	</div>
				1415
				1416	<div class="doc_text">
				1417
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1418	<p>The first step in determining the live intervals of variables is to calculate
				1419	the set of registers that are immediately dead after the instruction (i.e.,
				1420	the instruction calculates the value, but it is never used) and the set of
				1421	registers that are used by the instruction, but are never used after the
				1422	instruction (i.e., they are killed). Live variable information is computed
				1423	for each <i>virtual</i> register and <i>register allocatable</i> physical
				1424	register in the function. This is done in a very efficient manner because it
				1425	uses SSA to sparsely compute lifetime information for virtual registers
				1426	(which are in SSA form) and only has to track physical registers within a
				1427	block. Before register allocation, LLVM can assume that physical registers
				1428	are only live within a single basic block. This allows it to do a single,
				1429	local analysis to resolve physical register lifetimes within each basic
				1430	block. If a physical register is not register allocatable (e.g., a stack
				1431	pointer or condition codes), it is not tracked.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1432
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1433	<p>Physical registers may be live in to or out of a function. Live in values are
				1434	typically arguments in registers. Live out values are typically return values
				1435	in registers. Live in values are marked as such, and are given a dummy
				1436	"defining" instruction during live intervals analysis. If the last basic
				1437	block of a function is a <tt>return</tt>, then it's marked as using all live
				1438	out values in the function.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1439
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1440	<p><tt>PHI</tt> nodes need to be handled specially, because the calculation of
				1441	the live variable information from a depth first traversal of the CFG of the
				1442	function won't guarantee that a virtual register used by the <tt>PHI</tt>
				1443	node is defined before it's used. When a <tt>PHI</tt> node is encountered,
				1444	only the definition is handled, because the uses will be handled in other
				1445	basic blocks.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1446
				1447	<p>For each <tt>PHI</tt> node of the current basic block, we simulate an
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1448	assignment at the end of the current basic block and traverse the successor
				1449	basic blocks. If a successor basic block has a <tt>PHI</tt> node and one of
				1450	the <tt>PHI</tt> node's operands is coming from the current basic block, then
				1451	the variable is marked as <i>alive</i> within the current basic block and all
				1452	of its predecessor basic blocks, until the basic block with the defining
				1453	instruction is encountered.</p>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1454
				1455	</div>
				1456
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1457	<!-- _______________________________________________________________________ -->
				1458	<div class="doc_subsubsection">
				1459	<a name="liveintervals_analysis">Live Intervals Analysis</a>
				1460	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1461
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1462	<div class="doc_text">
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1463
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1464	<p>We now have the information available to perform the live intervals analysis
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1465	and build the live intervals themselves. We start off by numbering the basic
				1466	blocks and machine instructions. We then handle the "live-in" values. These
				1467	are in physical registers, so the physical register is assumed to be killed
				1468	by the end of the basic block. Live intervals for virtual registers are
				1469	computed for some ordering of the machine instructions <tt>[1, N]</tt>. A
live interval is an interval <tt>[i, j)</tt>, where <tt>1 &lt;= i &lt;= j
&lt; N</tt>, for which a variable is live.</p>
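
<p>As a minimal sketch of the half-open interval convention described above,
the following standalone C++ models a live interval and two queries on it. The
names <tt>ToyInterval</tt>, <tt>makeToyInterval</tt>, <tt>isLiveAt</tt>
and <tt>overlaps</tt> are hypothetical and are not part of LLVM's live interval
classes.</p>

```cpp
#include <cassert>

// Hypothetical model of a half-open live interval [start, end) over
// instruction numbers. This is an illustration, not LLVM's data structure.
struct ToyInterval {
  unsigned start; // first instruction number at which the value is live
  unsigned end;   // one past the last instruction number at which it is live
};

// Builds an interval, checking the i <= j requirement from the text.
inline ToyInterval makeToyInterval(unsigned start, unsigned end) {
  assert(start <= end && "an interval cannot end before it starts");
  ToyInterval iv = {start, end};
  return iv;
}

// True if instruction number idx falls inside the interval.
inline bool isLiveAt(const ToyInterval &iv, unsigned idx) {
  return iv.start <= idx && idx < iv.end;
}

// Two half-open intervals overlap when each starts before the other ends.
inline bool overlaps(const ToyInterval &a, const ToyInterval &b) {
  return a.start < b.end && b.start < a.end;
}
```

<p>Under this convention an interval that ends at instruction 4 does not
conflict with one that starts at instruction 4, so in this model the two values
could share a register.</p>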
Bill Wendling	3cd5ca6	2006-10-11 06:30:10 +0000	[diff] [blame]	1472
Bill Wendling	82e2eea	2006-10-11 18:00:22 +0000	[diff] [blame]	1473	<p><i><b>More to come...</b></i></p>
				1474
Bill Wendling	3fc488d	2006-09-06 18:42:41 +0000	[diff] [blame]	1475	</div>
Bill Wendling	2f87a88	2006-09-04 23:35:52 +0000	[diff] [blame]	1476
				1477	<!-- ======================================================================= -->
				1478	<div class="doc_subsection">
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1479	<a name="regalloc">Register Allocation</a>
				1480	</div>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1481
				1482	<div class="doc_text">
				1483
<p>The <i>Register Allocation problem</i> consists of mapping a program
<i>P<sub>v</sub></i>, which can use an unbounded number of virtual registers,
				1486	to a program <i>P<sub>p</sub></i> that contains a finite (possibly small)
				1487	number of physical registers. Each target architecture has a different number
				1488	of physical registers. If the number of physical registers is not enough to
				1489	accommodate all the virtual registers, some of them will have to be mapped
				1490	into memory. These virtuals are called <i>spilled virtuals</i>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1491
				1492	</div>
				1493
				1494	<!-- _______________________________________________________________________ -->
				1495
				1496	<div class="doc_subsubsection">
				1497	<a name="regAlloc_represent">How registers are represented in LLVM</a>
				1498	</div>
				1499
				1500	<div class="doc_text">
				1501
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1502	<p>In LLVM, physical registers are denoted by integer numbers that normally
				1503	range from 1 to 1023. To see how this numbering is defined for a particular
				1504	architecture, you can read the <tt>GenRegisterNames.inc</tt> file for that
				1505	architecture. For instance, by
				1506	inspecting <tt>lib/Target/X86/X86GenRegisterNames.inc</tt> we see that the
				1507	32-bit register <tt>EAX</tt> is denoted by 15, and the MMX register
				1508	<tt>MM0</tt> is mapped to 48.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1509
<p>Some architectures contain registers that share the same physical location. A
notable example is the X86 platform. For instance, in the X86 architecture,
the registers <tt>EAX</tt>, <tt>AX</tt> and <tt>AL</tt> share the low eight
bits. These physical registers are marked as <i>aliased</i> in LLVM. Given a
				1514	particular architecture, you can check which registers are aliased by
				1515	inspecting its <tt>RegisterInfo.td</tt> file. Moreover, the method
				1516	<tt>TargetRegisterInfo::getAliasSet(p_reg)</tt> returns an array containing
				1517	all the physical registers aliased to the register <tt>p_reg</tt>.</p>
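
<p>To illustrate what an alias relation has to capture, here is a small
standalone sketch. The <tt>AliasTable</tt> type and the helper functions are
hypothetical; in LLVM the real information comes from the target's
<tt>RegisterInfo.td</tt> file and is queried
through <tt>TargetRegisterInfo</tt>.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical alias table keyed by register name (illustration only).
typedef std::map<std::string, std::set<std::string> > AliasTable;

// Records that registers a and b overlap. Aliasing is symmetric.
inline void addAlias(AliasTable &table, const std::string &a,
                     const std::string &b) {
  assert(a != b && "a register is not listed as its own alias");
  table[a].insert(b);
  table[b].insert(a);
}

// Returns true if writing to reg would clobber all or part of other.
inline bool clobbers(const AliasTable &table, const std::string &reg,
                     const std::string &other) {
  if (reg == other)
    return true;
  AliasTable::const_iterator it = table.find(reg);
  return it != table.end() && it->second.count(other) != 0;
}

// Builds the relation for the three X86 registers named in the text.
inline AliasTable makeX86AccumulatorAliases() {
  AliasTable table;
  addAlias(table, "EAX", "AX");
  addAlias(table, "EAX", "AL");
  addAlias(table, "AX", "AL");
  return table;
}
```

<p>A register allocator needs this kind of query because assigning a value
to <tt>AL</tt> makes <tt>EAX</tt> unavailable for any other live value, and
vice versa.</p>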
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1518
				1519	<p>Physical registers, in LLVM, are grouped in <i>Register Classes</i>.
Elements in the same register class are functionally equivalent, and can be
used interchangeably. Each virtual register can only be mapped to physical
registers of a particular class. For instance, in the X86 architecture, some
virtuals can only be allocated to 8-bit registers. A register class is
described by <tt>TargetRegisterClass</tt> objects. To discover if a virtual
register is compatible with a given physical register, this code can be used:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1526
				1527	<div class="doc_code">
				1528	<pre>
bool RegMapping_Fer::compatible_class(MachineFunction &amp;mf,
                                      unsigned v_reg,
                                      unsigned p_reg) {
  assert(TargetRegisterInfo::isPhysicalRegister(p_reg) &amp;&amp;
         "Target register must be physical");
  const TargetRegisterClass *trc = mf.getRegInfo().getRegClass(v_reg);
  return trc-&gt;contains(p_reg);
}
				1537	</pre>
				1538	</div>
				1539
<p>Sometimes, mostly for debugging purposes, it is useful to change the number
of physical registers available in the target architecture. This must be done
statically, inside the <tt>TargetRegisterInfo.td</tt> file. Just <tt>grep</tt>
for <tt>RegisterClass</tt>, the last parameter of which is a list of
registers. Commenting some out is one simple way to avoid them being
used. A more polite way is to explicitly exclude some registers from
Dan Gohman	d2cb3d2	2009-07-24 00:30:09 +0000	[diff] [blame]	1546	the <i>allocation order</i>. See the definition of the <tt>GR8</tt> register
				1547	class in <tt>lib/Target/X86/X86RegisterInfo.td</tt> for an example of this.
				1548	</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1549
<p>Virtual registers are also denoted by integer numbers. Unlike physical
registers, different virtual registers never share the same number. The
				1552	smallest virtual register is normally assigned the number 1024. This may
				1553	change, so, in order to know which is the first virtual register, you should
				1554	access <tt>TargetRegisterInfo::FirstVirtualRegister</tt>. Any register whose
				1555	number is greater than or equal
				1556	to <tt>TargetRegisterInfo::FirstVirtualRegister</tt> is considered a virtual
				1557	register. Whereas physical registers are statically defined in
				1558	a <tt>TargetRegisterInfo.td</tt> file and cannot be created by the
				1559	application developer, that is not the case with virtual registers. In order
				1560	to create new virtual registers, use the
method <tt>MachineRegisterInfo::createVirtualRegister()</tt>. This method
will return a new virtual register, numbered higher than any existing one.</p>
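
<p>The numbering scheme can be sketched with a few lines of standalone C++.
Everything here is a hypothetical stand-in: <tt>ToyFirstVirtualRegister</tt>
plays the role of <tt>TargetRegisterInfo::FirstVirtualRegister</tt> (assumed
to be 1024, the usual value given above), and <tt>ToyRegInfo</tt> imitates
only the counter behind <tt>createVirtualRegister()</tt>.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for TargetRegisterInfo::FirstVirtualRegister.
// The text says the value is normally 1024 but may change.
const unsigned ToyFirstVirtualRegister = 1024;

// Any number at or above the boundary names a virtual register.
inline bool isToyVirtualRegister(unsigned reg) {
  return reg >= ToyFirstVirtualRegister;
}

// The text gives 1 up to the boundary minus one as the usual physical range.
inline bool isToyPhysicalRegister(unsigned reg) {
  return reg >= 1 && reg < ToyFirstVirtualRegister;
}

// Toy counterpart of the register bookkeeping: hands out fresh numbers.
class ToyRegInfo {
  unsigned nextVirtReg;

public:
  ToyRegInfo() : nextVirtReg(ToyFirstVirtualRegister) {}

  // Returns a number higher than every virtual register created so far.
  unsigned createVirtualRegister() {
    unsigned reg = nextVirtReg++;
    assert(isToyVirtualRegister(reg) && "counter left the virtual range");
    return reg;
  }
};
```

<p>Code that needs to tell the two kinds of register apart should compare
against the named constant rather than a literal 1024, since the boundary may
change.</p>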
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1563
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1564	<p>Before register allocation, the operands of an instruction are mostly virtual
				1565	registers, although physical registers may also be used. In order to check if
				1566	a given machine operand is a register, use the boolean
				1567	function <tt>MachineOperand::isRegister()</tt>. To obtain the integer code of
				1568	a register, use <tt>MachineOperand::getReg()</tt>. An instruction may define
or use a register. For instance, <tt>ADD reg:1026 := reg:1025 reg:1024</tt>
defines the register 1026, and uses registers 1025 and 1024. Given a
register operand, the method <tt>MachineOperand::isUse()</tt> informs if that
register is being used by the instruction. The
method <tt>MachineOperand::isDef()</tt> informs if that register is being
defined.</p>
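
<p>The following standalone sketch walks the operands of
the <tt>ADD</tt> example, taking the operand written to the left
of <tt>:=</tt> as the definition. <tt>ToyOperand</tt>
and <tt>ToyInstr</tt> are hypothetical types that borrow the method
names <tt>isDef()</tt>, <tt>isUse()</tt> and <tt>getReg()</tt> from the
text; they are not the LLVM classes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register operand: models only the number and the def flag.
struct ToyOperand {
  unsigned reg;
  bool def; // true if the instruction writes the register

  bool isDef() const { return def; }
  bool isUse() const { return !def; }
  unsigned getReg() const { return reg; }
};

struct ToyInstr {
  std::vector<ToyOperand> operands;
};

// Collects the registers written by the instruction, in operand order.
inline std::vector<unsigned> definedRegs(const ToyInstr &mi) {
  std::vector<unsigned> regs;
  for (std::size_t i = 0; i < mi.operands.size(); ++i)
    if (mi.operands[i].isDef())
      regs.push_back(mi.operands[i].getReg());
  return regs;
}

// Collects the registers read by the instruction, in operand order.
inline std::vector<unsigned> usedRegs(const ToyInstr &mi) {
  std::vector<unsigned> regs;
  for (std::size_t i = 0; i < mi.operands.size(); ++i)
    if (mi.operands[i].isUse())
      regs.push_back(mi.operands[i].getReg());
  return regs;
}

// Builds ADD reg:1026 := reg:1025 reg:1024.
inline ToyInstr makeExampleAdd() {
  ToyInstr mi;
  ToyOperand dst = {1026, true};
  ToyOperand lhs = {1025, false};
  ToyOperand rhs = {1024, false};
  assert(dst.isDef() && lhs.isUse() && rhs.isUse());
  mi.operands.push_back(dst);
  mi.operands.push_back(lhs);
  mi.operands.push_back(rhs);
  return mi;
}
```

<p>A real operand can be both used and defined by the same instruction, as the
two address discussion later in this section shows; this sketch leaves that
case out to keep the flag a single boolean.</p>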
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1575
<p>We will call physical registers present in the LLVM machine code before
register allocation <i>pre-colored registers</i>. Pre-colored registers are
used in many different situations, for instance, to pass parameters of
function calls, and to store results of particular instructions. There are two
types of pre-colored registers: the ones <i>implicitly</i> defined, and
those <i>explicitly</i> defined. Explicitly defined registers are normal
operands, and can be accessed
with <tt>MachineInstr::getOperand(int).getReg()</tt>. In order to check
which registers are implicitly defined by an instruction, use
<tt>TargetInstrInfo::get(opcode).ImplicitDefs</tt>,
where <tt>opcode</tt> is the opcode of the target instruction. One important
difference between explicit and implicit physical registers is that the
latter are defined statically for each instruction, whereas the former may
vary depending on the program being compiled. For example, an instruction
that represents a function call will always implicitly define or use the same
set of physical registers. To read the registers implicitly used by an
instruction,
use <tt>TargetInstrInfo::get(opcode).ImplicitUses</tt>. Pre-colored
				1594	registers impose constraints on any register allocation algorithm. The
Bob Wilson	0473868	2010-04-09 18:39:54 +0000	[diff] [blame]	1595	register allocator must make sure that none of them are overwritten by
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1596	the values of virtual registers while still alive.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1597
				1598	</div>
				1599
				1600	<!-- _______________________________________________________________________ -->
				1601
				1602	<div class="doc_subsubsection">
				1603	<a name="regAlloc_howTo">Mapping virtual registers to physical registers</a>
				1604	</div>
				1605
				1606	<div class="doc_text">
				1607
				1608	<p>There are two ways to map virtual registers to physical registers (or to
memory slots). The first way, which we will call <i>direct mapping</i>, is
based on the use of methods of the classes <tt>TargetRegisterInfo</tt>
and <tt>MachineOperand</tt>. The second way, which we will call <i>indirect
mapping</i>, relies on the <tt>VirtRegMap</tt> class in order to insert loads
and stores sending and getting values to and from memory.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1614
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1615	<p>The direct mapping provides more flexibility to the developer of the register
				1616	allocator; however, it is more error prone, and demands more implementation
				1617	work. Basically, the programmer will have to specify where load and store
				1618	instructions should be inserted in the target function being compiled in
				1619	order to get and store values in memory. To assign a physical register to a
				1620	virtual register present in a given operand,
				1621	use <tt>MachineOperand::setReg(p_reg)</tt>. To insert a store instruction,
Jakob Stoklund Olesen	297907f	2010-08-31 22:01:07 +0000	[diff] [blame]	1622	use <tt>TargetInstrInfo::storeRegToStackSlot(...)</tt>, and to insert a
load instruction, use <tt>TargetInstrInfo::loadRegFromStackSlot(...)</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1624
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1625	<p>The indirect mapping shields the application developer from the complexities
				1626	of inserting load and store instructions. In order to map a virtual register
				1627	to a physical one, use <tt>VirtRegMap::assignVirt2Phys(vreg, preg)</tt>. In
				1628	order to map a certain virtual register to memory,
				1629	use <tt>VirtRegMap::assignVirt2StackSlot(vreg)</tt>. This method will return
				1630	the stack slot where <tt>vreg</tt>'s value will be located. If it is
				1631	necessary to map another virtual register to the same stack slot,
				1632	use <tt>VirtRegMap::assignVirt2StackSlot(vreg, stack_location)</tt>. One
important point to consider when using the indirect mapping is that even if
				1634	a virtual register is mapped to memory, it still needs to be mapped to a
				1635	physical register. This physical register is the location where the virtual
				1636	register is supposed to be found before being stored or after being
				1637	reloaded.</p>
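
<p>The bookkeeping behind the indirect mapping can be pictured with a toy
class. <tt>ToyVirtRegMap</tt> below is hypothetical: it reuses the method
names mentioned above, but its signatures, its slot numbering and its query
helpers are assumptions made for this sketch rather than the
real <tt>VirtRegMap</tt> interface.</p>

```cpp
#include <cassert>
#include <map>

// Hypothetical, much-reduced imitation of a virtual register map.
class ToyVirtRegMap {
  std::map<unsigned, unsigned> virt2Phys; // vreg -> physical register
  std::map<unsigned, int> virt2Slot;      // vreg -> stack slot index
  int nextSlot;

public:
  ToyVirtRegMap() : nextSlot(0) {}

  bool hasPhys(unsigned vreg) const { return virt2Phys.count(vreg) != 0; }
  bool hasStackSlot(unsigned vreg) const { return virt2Slot.count(vreg) != 0; }

  void assignVirt2Phys(unsigned vreg, unsigned preg) {
    assert(!hasPhys(vreg) && "vreg already has a physical register");
    virt2Phys[vreg] = preg;
  }

  // Allocates a fresh stack slot for vreg and returns it.
  int assignVirt2StackSlot(unsigned vreg) {
    assert(!hasStackSlot(vreg) && "vreg already has a stack slot");
    int slot = nextSlot++;
    virt2Slot[vreg] = slot;
    return slot;
  }

  // Maps vreg to a slot that was handed out earlier.
  void assignVirt2StackSlot(unsigned vreg, int slot) {
    assert(!hasStackSlot(vreg) && "vreg already has a stack slot");
    assert(slot >= 0 && slot < nextSlot && "slot was never allocated");
    virt2Slot[vreg] = slot;
  }

  unsigned getPhys(unsigned vreg) const { return virt2Phys.at(vreg); }
  int getStackSlot(unsigned vreg) const { return virt2Slot.at(vreg); }
};
```

<p>Note that a spilled virtual register ends up in both maps: the stack slot
says where the value lives in memory, and the physical register says where it
sits immediately before a store or after a reload.</p>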
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1638
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1639	<p>If the indirect strategy is used, after all the virtual registers have been
				1640	mapped to physical registers or stack slots, it is necessary to use a spiller
				1641	object to place load and store instructions in the code. Every virtual that
has been mapped to a stack slot will be stored to memory after being defined
				1643	and will be loaded before being used. The implementation of the spiller tries
				1644	to recycle load/store instructions, avoiding unnecessary instructions. For an
				1645	example of how to invoke the spiller,
				1646	see <tt>RegAllocLinearScan::runOnMachineFunction</tt>
				1647	in <tt>lib/CodeGen/RegAllocLinearScan.cpp</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1648
				1649	</div>
				1650
				1651	<!-- _______________________________________________________________________ -->
				1652	<div class="doc_subsubsection">
				1653	<a name="regAlloc_twoAddr">Handling two address instructions</a>
				1654	</div>
				1655
				1656	<div class="doc_text">
				1657
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1658	<p>With very rare exceptions (e.g., function calls), the LLVM machine code
				1659	instructions are three address instructions. That is, each instruction is
				1660	expected to define at most one register, and to use at most two registers.
				1661	However, some architectures use two address instructions. In this case, the
defined register is also one of the used registers. For instance, an
				1663	instruction such as <tt>ADD %EAX, %EBX</tt>, in X86 is actually equivalent
				1664	to <tt>%EAX = %EAX + %EBX</tt>.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1665
				1666	<p>In order to produce correct code, LLVM must convert three address
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1667	instructions that represent two address instructions into true two address
				1668	instructions. LLVM provides the pass <tt>TwoAddressInstructionPass</tt> for
				1669	this specific purpose. It must be run before register allocation takes
				1670	place. After its execution, the resulting code may no longer be in SSA
				1671	form. This happens, for instance, in situations where an instruction such
				1672	as <tt>%a = ADD %b %c</tt> is converted to two instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1673
				1674	<div class="doc_code">
				1675	<pre>
				1676	%a = MOVE %b
Dan Gohman	03e5857	2008-06-13 17:55:57 +0000	[diff] [blame]	1677	%a = ADD %a %c
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1678	</pre>
				1679	</div>
				1680
				1681	<p>Notice that, internally, the second instruction is represented as
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1682	<tt>ADD %a[def/use] %c</tt>. I.e., the register operand <tt>%a</tt> is both
				1683	used and defined by the instruction.</p>
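
<p>The rewrite shown above can be sketched as a small standalone function.
This is only an illustration of the transformation on the example, assuming
string register names and a hypothetical <tt>ThreeAddrInstr</tt> record; it is
not how <tt>TwoAddressInstructionPass</tt> is implemented.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction record: dst = opcode src1 src2.
// A MOVE reads only src1 and leaves src2 empty.
struct ThreeAddrInstr {
  std::string opcode;
  std::string dst;
  std::string src1;
  std::string src2;
};

// Rewrites "dst = OP src1 src2" so that the first source is also the
// destination, inserting a MOVE when the two differ.
inline std::vector<ThreeAddrInstr> toTwoAddress(const ThreeAddrInstr &mi) {
  // If dst were also the second source, the MOVE below would overwrite it
  // before it is read. This sketch does not handle that case.
  assert(mi.dst != mi.src2 && "destination must differ from second source");

  std::vector<ThreeAddrInstr> out;
  if (mi.dst != mi.src1) {
    ThreeAddrInstr move = {"MOVE", mi.dst, mi.src1, ""};
    out.push_back(move);
  }
  ThreeAddrInstr tied = {mi.opcode, mi.dst, mi.dst, mi.src2};
  out.push_back(tied);
  return out;
}
```

<p>After the rewrite the destination register is assigned twice, which is why
the resulting code is no longer in SSA form.</p>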
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1684
				1685	</div>
				1686
				1687	<!-- _______________________________________________________________________ -->
				1688	<div class="doc_subsubsection">
				1689	<a name="regAlloc_ssaDecon">The SSA deconstruction phase</a>
				1690	</div>
				1691
				1692	<div class="doc_text">
				1693
				1694	<p>An important transformation that happens during register allocation is called
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1695	the <i>SSA Deconstruction Phase</i>. The SSA form simplifies many analyses
				1696	that are performed on the control flow graph of programs. However,
				1697	traditional instruction sets do not implement PHI instructions. Thus, in
				1698	order to generate executable code, compilers must replace PHI instructions
				1699	with other instructions that preserve their semantics.</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1700
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1701	<p>There are many ways in which PHI instructions can safely be removed from the
				1702	target code. The most traditional PHI deconstruction algorithm replaces PHI
				1703	instructions with copy instructions. That is the strategy adopted by
				1704	LLVM. The SSA deconstruction algorithm is implemented
				1705	in <tt>lib/CodeGen/PHIElimination.cpp</tt>. In order to invoke this pass, the
				1706	identifier <tt>PHIEliminationID</tt> must be marked as required in the code
				1707	of the register allocator.</p>
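
<p>The idea of replacing PHI instructions with copies can be sketched as
follows. This is a deliberately simplified illustration with hypothetical
types (<tt>ToyPhi</tt>, <tt>ToyCopy</tt>): it places one copy per incoming
value at the end of the corresponding predecessor block, and it ignores the
ordering problems that arise when several PHI instructions in one block read
each other's results. It is not the algorithm
in <tt>lib/CodeGen/PHIElimination.cpp</tt>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical PHI: dst receives 'second' when control arrives from the
// predecessor block named 'first'.
struct ToyPhi {
  std::string dst;
  std::vector<std::pair<std::string, std::string> > incoming;
};

// A register-to-register copy: dst = COPY src.
struct ToyCopy {
  std::string dst;
  std::string src;
};

// For every PHI, emits one copy per predecessor. The result maps each
// predecessor block name to the copies to append to that block.
inline std::map<std::string, std::vector<ToyCopy> >
lowerPhis(const std::vector<ToyPhi> &phis) {
  std::map<std::string, std::vector<ToyCopy> > copies;
  for (std::size_t p = 0; p < phis.size(); ++p) {
    const ToyPhi &phi = phis[p];
    assert(!phi.incoming.empty() && "a PHI needs at least one incoming value");
    for (std::size_t i = 0; i < phi.incoming.size(); ++i) {
      ToyCopy copy = {phi.dst, phi.incoming[i].second};
      copies[phi.incoming[i].first].push_back(copy);
    }
  }
  return copies;
}
```

<p>For a PHI that selects <tt>%a</tt> from block <tt>bb1</tt>
and <tt>%b</tt> from block <tt>bb2</tt>, the sketch produces one copy at the
end of each of those two blocks, after which the PHI itself can be
deleted.</p>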
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1708
				1709	</div>
				1710
				1711	<!-- _______________________________________________________________________ -->
				1712	<div class="doc_subsubsection">
				1713	<a name="regAlloc_fold">Instruction folding</a>
				1714	</div>
				1715
				1716	<div class="doc_text">
				1717
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1718	<p><i>Instruction folding</i> is an optimization performed during register
				1719	allocation that removes unnecessary copy instructions. For instance, a
				1720	sequence of instructions such as:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1721
				1722	<div class="doc_code">
				1723	<pre>
				1724	%EBX = LOAD %mem_address
				1725	%EAX = COPY %EBX
				1726	</pre>
				1727	</div>
				1728
Dan Gohman	a7ab2bf	2008-11-24 16:35:31 +0000	[diff] [blame]	1729	<p>can be safely substituted by the single instruction:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1730
				1731	<div class="doc_code">
				1732	<pre>
				1733	%EAX = LOAD %mem_address
				1734	</pre>
				1735	</div>
				1736
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1737	<p>Instructions can be folded with
the <tt>TargetInstrInfo::foldMemoryOperand(...)</tt> method. Care must be
				1739	taken when folding instructions; a folded instruction can be quite different
				1740	from the original
				1741	instruction. See <tt>LiveIntervals::addIntervalsForSpills</tt>
				1742	in <tt>lib/CodeGen/LiveIntervalAnalysis.cpp</tt> for an example of its
				1743	use.</p>
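
<p>The load and copy example above can be expressed as a small peephole over
a toy instruction list. The sketch below is hypothetical and models
only <tt>LOAD</tt> and <tt>COPY</tt>; it assumes the intermediate register
may be dropped only when no later instruction reads it, and it says nothing
about how <tt>foldMemoryOperand</tt> works internally.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction record for this sketch.
struct FoldInstr {
  std::string opcode; // "LOAD" or "COPY"
  std::string dst;    // register written
  std::string src;    // memory address for LOAD, source register for COPY
};

// Returns true if reg is read by any instruction at index >= from.
// Only COPY reads a register in this model.
inline bool isReadFrom(const std::vector<FoldInstr> &code, std::size_t from,
                       const std::string &reg) {
  for (std::size_t i = from; i < code.size(); ++i)
    if (code[i].opcode == "COPY" && code[i].src == reg)
      return true;
  return false;
}

// Replaces "r = LOAD addr; d = COPY r" with "d = LOAD addr" when r is not
// read again later in the list.
inline std::vector<FoldInstr> foldLoadCopy(const std::vector<FoldInstr> &code) {
  std::vector<FoldInstr> out;
  for (std::size_t i = 0; i < code.size(); ++i) {
    const FoldInstr &cur = code[i];
    assert((cur.opcode == "LOAD" || cur.opcode == "COPY") &&
           "this sketch models only LOAD and COPY");
    if (cur.opcode == "LOAD" && i + 1 < code.size()) {
      const FoldInstr &next = code[i + 1];
      if (next.opcode == "COPY" && next.src == cur.dst &&
          !isReadFrom(code, i + 2, cur.dst)) {
        FoldInstr folded = {"LOAD", next.dst, cur.src};
        out.push_back(folded);
        ++i; // skip the COPY that was folded away
        continue;
      }
    }
    out.push_back(cur);
  }
  return out;
}
```

<p>The liveness check is the part that needs care: if <tt>%EBX</tt> were read
again after the copy, removing its definition would change the meaning of the
program.</p>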
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1744
				1745	</div>
				1746
				1747	<!-- _______________________________________________________________________ -->
				1748
				1749	<div class="doc_subsubsection">
				1750	<a name="regAlloc_builtIn">Built in register allocators</a>
				1751	</div>
				1752
				1753	<div class="doc_text">
				1754
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1755	<p>The LLVM infrastructure provides the application developer with three
				1756	different register allocators:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1757
				1758	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1759	<li><i>Linear Scan</i> — <i>The default allocator</i>. This is the
well-known linear scan register allocator. Unlike allocators that use a
direct mapping implementation technique, the <i>Linear Scan</i>
implementation uses a spiller in order to place loads and stores.</li>
Jakob Stoklund Olesen	8a3eab9	2010-06-15 21:58:33 +0000	[diff] [blame]	1764
				1765	<li><i>Fast</i> — This register allocator is the default for debug
				1766	builds. It allocates registers on a basic block level, attempting to keep
				1767	values in registers and reusing registers as appropriate.</li>
				1768
				1769	<li><i>PBQP</i> — A Partitioned Boolean Quadratic Programming (PBQP)
				1770	based register allocator. This allocator works by constructing a PBQP
				1771	problem representing the register allocation problem under consideration,
				1772	solving this using a PBQP solver, and mapping the solution back to a
				1773	register assignment.</li>
				1774
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1775	</ul>
				1776
				1777	<p>The type of register allocator used in <tt>llc</tt> can be chosen with the
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1778	command line option <tt>-regalloc=...</tt>:</p>
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1779
				1780	<div class="doc_code">
				1781	<pre>
$ llc -regalloc=linearscan file.bc -o ln.s
$ llc -regalloc=fast file.bc -o fa.s
$ llc -regalloc=pbqp file.bc -o pbqp.s
Bill Wendling	a396ee8	2006-09-01 21:46:00 +0000	[diff] [blame]	1785	</pre>
				1786	</div>
				1787
				1788	</div>
				1789
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1790	<!-- ======================================================================= -->
				1791	<div class="doc_subsection">
				1792	<a name="proepicode">Prolog/Epilog Code Insertion</a>
				1793	</div>
				1794	<div class="doc_text"><p>To Be Written</p></div>
				1795	<!-- ======================================================================= -->
				1796	<div class="doc_subsection">
				1797	<a name="latemco">Late Machine Code Optimizations</a>
				1798	</div>
				1799	<div class="doc_text"><p>To Be Written</p></div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1800
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1801	<!-- ======================================================================= -->
				1802	<div class="doc_subsection">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1803	<a name="codeemit">Code Emission</a>
Reid Spencer	ad1f0cd	2005-04-24 20:56:18 +0000	[diff] [blame]	1804	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1805
				1806	<div class="doc_text">
				1807
				1808	<p>The code emission step of code generation is responsible for lowering from
				1809	the code generator abstractions (like <a
				1810	href="#machinefunction">MachineFunction</a>, <a
				1811	href="#machineinstr">MachineInstr</a>, etc) down
				1812	to the abstractions used by the MC layer (<a href="#mcinst">MCInst</a>,
				1813	<a href="#mcstreamer">MCStreamer</a>, etc). This is
				1814	done with a combination of several different classes: the (misnamed)
				1815	target-independent AsmPrinter class, target-specific subclasses of AsmPrinter
				1816	(such as SparcAsmPrinter), and the TargetLoweringObjectFile class.</p>
				1817
				1818	<p>Since the MC layer works at the level of abstraction of object files, it
				1819	doesn't have a notion of functions, global variables etc. Instead, it thinks
				1820	about labels, directives, and instructions. A key class used at this time is
				1821	the MCStreamer class. This is an abstract API that is implemented in different
				1822	ways (e.g. to output a .s file, output an ELF .o file, etc) that is effectively
				1823	an "assembler API". MCStreamer has one method per directive, such as EmitLabel,
				1824	EmitSymbolAttribute, SwitchSection, etc, which directly correspond to assembly
				1825	level directives.
				1826	</p>
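
<p>The "assembler API" idea can be illustrated with a tiny standalone
sketch. <tt>ToyStreamer</tt> and <tt>ToyAsmStreamer</tt> are hypothetical
classes that borrow a few method names from the text; the signatures, the
string-based arguments and the <tt>EmitInstruction</tt> helper are assumptions
made here, not the real <tt>MCStreamer</tt> interface.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical, much-reduced streamer interface: one method per directive.
class ToyStreamer {
public:
  virtual ~ToyStreamer() {}
  virtual void SwitchSection(const std::string &name) = 0;
  virtual void EmitLabel(const std::string &symbol) = 0;
  virtual void EmitInstruction(const std::string &text) = 0;
};

// One possible implementation: render everything as assembly text.
// An object file writer would implement the same interface differently.
class ToyAsmStreamer : public ToyStreamer {
  std::string out;

public:
  void SwitchSection(const std::string &name) {
    out += "\t.section\t" + name + "\n";
  }
  void EmitLabel(const std::string &symbol) {
    assert(!symbol.empty() && "a label needs a name");
    out += symbol + ":\n";
  }
  void EmitInstruction(const std::string &text) { out += "\t" + text + "\n"; }

  const std::string &str() const { return out; }
};

// Client code talks only to the abstract interface, so it does not know
// whether text or an object file is being produced.
inline void emitToyFunction(ToyStreamer &streamer) {
  streamer.SwitchSection(".text");
  streamer.EmitLabel("foo");
  streamer.EmitInstruction("ret");
}
```

<p>The point of the design is that the caller describes labels, directives and
instructions once, and the choice of output format is made by selecting a
streamer implementation.</p>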
				1827
				1828	<p>If you are interested in implementing a code generator for a target, there
				1829	are three important things that you have to implement for your target:</p>
				1830
				1831	<ol>
				1832	<li>First, you need a subclass of AsmPrinter for your target. This class
implements the general lowering process converting MachineFunctions into MC
				1834	label constructs. The AsmPrinter base class provides a number of useful methods
				1835	and routines, and also allows you to override the lowering process in some
				1836	important ways. You should get much of the lowering for free if you are
				1837	implementing an ELF, COFF, or MachO target, because the TargetLoweringObjectFile
				1838	class implements much of the common logic.</li>
				1839
				1840	<li>Second, you need to implement an instruction printer for your target. The
				1841	instruction printer takes an <a href="#mcinst">MCInst</a> and renders it to a
				1842	raw_ostream as text. Most of this is automatically generated from the .td file
				1843	(when you specify something like "<tt>add $dst, $src1, $src2</tt>" in the
				1844	instructions), but you need to implement routines to print operands.</li>
				1845
				1846	<li>Third, you need to implement code that lowers a <a
				1847	href="#machineinstr">MachineInstr</a> to an MCInst, usually implemented in
"&lt;target&gt;MCInstLower.cpp". This lowering process is often target
				1849	specific, and is responsible for turning jump table entries, constant pool
				1850	indices, global variable addresses, etc into MCLabels as appropriate. This
				1851	translation layer is also responsible for expanding pseudo ops used by the code
				1852	generator into the actual machine instructions they correspond to. The MCInsts
				1853	that are generated by this are fed into the instruction printer or the encoder.
				1854	</li>
				1855
				1856	</ol>
				1857
<p>Finally, at your choosing, you can also implement a subclass of
MCCodeEmitter which lowers MCInsts into machine code bytes and relocations.
				1860	This is important if you want to support direct .o file emission, or would like
				1861	to implement an assembler for your target.</p>
				1862
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1863	</div>
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1864
				1865
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1866	<!-- *********************************************************************** -->
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1867	<div class="doc_section">
				1868	<a name="nativeassembler">Implementing a Native Assembler</a>
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1869	</div>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1870	<!-- *********************************************************************** -->
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1871
				1872	<div class="doc_text">
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1873
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1874	<p>Though you're probably reading this because you want to write or maintain a
compiler backend, LLVM also fully supports building native assemblers.
				1876	We've tried hard to automate the generation of the assembler from the .td files
				1877	(in particular the instruction syntax and encodings), which means that a large
				1878	part of the manual and repetitive data entry can be factored and shared with the
				1879	compiler.</p>
				1880
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1881	</div>
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1882
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1883	<!-- ======================================================================= -->
				1884	<div class="doc_subsection" id="na_instparsing">Instruction Parsing</div>
				1885
				1886	<div class="doc_text"><p>To Be Written</p></div>
				1887
				1888
				1889	<!-- ======================================================================= -->
				1890	<div class="doc_subsection" id="na_instaliases">
				1891	Instruction Alias Processing
				1892	</div>
				1893
				1894	<div class="doc_text">
				1895	<p>Once the instruction is parsed, it enters the MatchInstructionImpl function.
				1896	The MatchInstructionImpl function performs alias processing and then does
				1897	actual matching.</p>
				1898
<p>Alias processing is the phase that canonicalizes different lexical forms of
				1900	the same instructions down to one representation. There are several different
				1901	kinds of alias that are possible to implement and they are listed below in the
				1902	order that they are processed (which is in order from simplest/weakest to most
				1903	complex/powerful). Generally you want to use the first alias mechanism that
				1904	meets the needs of your instruction, because it will allow a more concise
				1905	description.</p>

</div>
				1906
				1907	<!-- _______________________________________________________________________ -->
				1908	<div class="doc_subsubsection">Mnemonic Aliases</div>
				1909
				1910	<div class="doc_text">
				1911
<p>The first phase of alias processing is simple instruction mnemonic
remapping for classes of instructions which are allowed with two different
mnemonics. This phase is a simple and unconditional remapping from one input
mnemonic to one output mnemonic. It isn't possible for this form of alias to
				1916	look at the operands at all, so the remapping must apply for all forms of a
				1917	given mnemonic. Mnemonic aliases are defined simply, for example X86 has:
				1918	</p>
				1919
				1920	<div class="doc_code">
				1921	<pre>
def : MnemonicAlias&lt;"cbw",     "cbtw"&gt;;
def : MnemonicAlias&lt;"smovq",   "movsq"&gt;;
def : MnemonicAlias&lt;"fldcww",  "fldcw"&gt;;
def : MnemonicAlias&lt;"fucompi", "fucomip"&gt;;
def : MnemonicAlias&lt;"ud2a",    "ud2"&gt;;
				1927	</pre>
				1928	</div>
				1929
				1930	<p>... and many others. With a MnemonicAlias definition, the mnemonic is
				1931	remapped simply and directly.</p>
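
<p>Conceptually, the effect of these definitions is a table lookup on the
mnemonic alone. The standalone sketch below shows that behaviour with a
hypothetical <tt>MnemonicAliasTable</tt>; the code that TableGen actually
generates for the matcher is not shown here and may be organized quite
differently.</p>

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup table: input mnemonic -> canonical mnemonic.
typedef std::map<std::string, std::string> MnemonicAliasTable;

// Adds one alias. An alias from a mnemonic to itself would be pointless.
inline void addMnemonicAlias(MnemonicAliasTable &table,
                             const std::string &from, const std::string &to) {
  assert(from != to && "alias must map to a different mnemonic");
  table[from] = to;
}

// Fills the table with the X86 aliases quoted in the text.
inline MnemonicAliasTable makeToyX86Aliases() {
  MnemonicAliasTable table;
  addMnemonicAlias(table, "cbw", "cbtw");
  addMnemonicAlias(table, "smovq", "movsq");
  addMnemonicAlias(table, "fldcww", "fldcw");
  addMnemonicAlias(table, "fucompi", "fucomip");
  addMnemonicAlias(table, "ud2a", "ud2");
  return table;
}

// Remaps the mnemonic if an alias exists and returns it unchanged otherwise.
// The operands are never consulted, matching the restriction in the text.
inline std::string canonicalizeMnemonic(const MnemonicAliasTable &table,
                                        const std::string &mnemonic) {
  MnemonicAliasTable::const_iterator it = table.find(mnemonic);
  return it == table.end() ? mnemonic : it->second;
}
```

<p>Because the lookup key is only the mnemonic, an alias of this kind cannot
behave differently for different operand forms of the same instruction.</p>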
Chris Lattner	e1b8345	2010-09-11 23:02:10 +0000	[diff] [blame]	1932
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1933	</div>
				1934
				1935
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1936	<!-- ======================================================================= -->
Chris Lattner	674c1dc	2010-10-30 17:36:36 +0000	[diff] [blame]	1937	<div class="doc_subsection" id="na_matching">Instruction Matching</div>
				1938
Chris Lattner	22481f2	2010-09-21 04:03:39 +0000	[diff] [blame]	1939	<div class="doc_text"><p>To Be Written</p></div>
				1940
				1941
				1942
				1943
Chris Lattner	aa5bcb5	2005-01-28 17:22:53 +0000	[diff] [blame]	1944	<!-- *********************************************************************** -->
				1945	<div class="doc_section">
Chris Lattner	32e89f2	2005-10-16 18:31:08 +0000	[diff] [blame]	1946	<a name="targetimpls">Target-specific Implementation Notes</a>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1947	</div>
				1948	<!-- *********************************************************************** -->
				1949
				1950	<div class="doc_text">
				1951
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	1952	<p>This section of the document explains features or design decisions that are
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	1953	specific to the code generator for a particular target. First we start
				1954	with a table that summarizes what features are supported by each target.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	1955
				1956	</div>
				1957
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	1958	<!-- ======================================================================= -->
				1959	<div class="doc_subsection">
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	1960	<a name="targetfeatures">Target Feature Matrix</a>
				1961	</div>
				1962
				1963	<style type="text/css">
				1964	.unknown { background-color: #C0C0C0; text-align: center; }
				1965	.unknown:before { content: "?" }
				1966	.no { background-color: #C11B17 }
				1967	.no:before { content: "N" }
				1968	.partial { background-color: #F88017 }
				1969	.yes { background-color: #00FF00; }
				1970	.yes:before { content: "Y" }
				1971	</style>
				1972
				1973
				1974	<div class="doc_text">
				1975
<p>Note that this table does not include the C or Cpp backends, since
they do not use the target-independent code generator infrastructure. It also
doesn't list features that are not supported fully by any target yet. It
considers a feature to be supported if at least one subtarget supports it. A
feature being supported means that it is useful and works for most cases; it
does not indicate that there are zero known bugs in the implementation. Here
is the key:</p>
				1983
				1984
				1985	<table border="1" cellspacing="0">
				1986	<tr>
				1987	<th>Unknown</th>
				1988	<th>No support</th>
				1989	<th>Partial Support</th>
				1990	<th>Complete Support</th>
				1991	</tr>
				1992	<tr>
				1993	<td class="unknown"></td>
				1994	<td class="no"></td>
				1995	<td class="partial"></td>
				1996	<td class="yes"></td>
				1997	</tr>
				1998	</table>
				1999
				2000	<p>Here is the table:</p>
				2001
				2002	<table width="689" border="1" cellspacing="0">
				2003	<tr><td></td>
				2004	<td colspan="13" align="center" bgcolor="#ffffcc">Target</td>
				2005	</tr>
				2006	<tr>
				2007	<th>Feature</th>
				2008	<th>ARM</th>
				2009	<th>Alpha</th>
				2010	<th>Blackfin</th>
				2011	<th>CellSPU</th>
				2012	<th>MBlaze</th>
				2013	<th>MSP430</th>
				2014	<th>Mips</th>
				2015	<th>PTX</th>
				2016	<th>PowerPC</th>
				2017	<th>Sparc</th>
				2018	<th>SystemZ</th>
				2019	<th>X86</th>
				2020	<th>XCore</th>
				2021	</tr>
				2022
				2023	<tr>
				2024	<td><a href="#feat_reliable">is generally reliable</a></td>
				2025	<td class="yes"></td> <!-- ARM -->
				2026	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2027	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2028	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2029	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2030	<td class="unknown"></td> <!-- MSP430 -->
				2031	<td class="unknown"></td> <!-- Mips -->
				2032	<td class="no"></td> <!-- PTX -->
				2033	<td class="yes"></td> <!-- PowerPC -->
				2034	<td class="yes"></td> <!-- Sparc -->
				2035	<td class="unknown"></td> <!-- SystemZ -->
				2036	<td class="yes"></td> <!-- X86 -->
				2037	<td class="unknown"></td> <!-- XCore -->
				2038	</tr>
				2039
				2040	<tr>
				2041	<td><a href="#feat_asmparser">assembly parser</a></td>
				2042	<td class="no"></td> <!-- ARM -->
				2043	<td class="no"></td> <!-- Alpha -->
				2044	<td class="no"></td> <!-- Blackfin -->
				2045	<td class="no"></td> <!-- CellSPU -->
				2046	<td class="no"></td> <!-- MBlaze -->
				2047	<td class="no"></td> <!-- MSP430 -->
				2048	<td class="no"></td> <!-- Mips -->
				2049	<td class="no"></td> <!-- PTX -->
				2050	<td class="no"></td> <!-- PowerPC -->
				2051	<td class="no"></td> <!-- Sparc -->
				2052	<td class="no"></td> <!-- SystemZ -->
				2053	<td class="yes"></td> <!-- X86 -->
				2054	<td class="no"></td> <!-- XCore -->
				2055	</tr>
				2056
				2057	<tr>
				2058	<td><a href="#feat_disassembler">disassembler</a></td>
				2059	<td class="yes"></td> <!-- ARM -->
				2060	<td class="no"></td> <!-- Alpha -->
				2061	<td class="no"></td> <!-- Blackfin -->
				2062	<td class="no"></td> <!-- CellSPU -->
				2063	<td class="no"></td> <!-- MBlaze -->
				2064	<td class="no"></td> <!-- MSP430 -->
				2065	<td class="no"></td> <!-- Mips -->
				2066	<td class="no"></td> <!-- PTX -->
				2067	<td class="no"></td> <!-- PowerPC -->
				2068	<td class="no"></td> <!-- Sparc -->
				2069	<td class="no"></td> <!-- SystemZ -->
				2070	<td class="yes"></td> <!-- X86 -->
				2071	<td class="no"></td> <!-- XCore -->
				2072	</tr>
				2073
				2074	<tr>
				2075	<td><a href="#feat_inlineasm">inline asm</a></td>
				2076	<td class="yes"></td> <!-- ARM -->
				2077	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2078	<td class="yes"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2079	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2080	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2081	<td class="unknown"></td> <!-- MSP430 -->
				2082	<td class="unknown"></td> <!-- Mips -->
				2083	<td class="unknown"></td> <!-- PTX -->
				2084	<td class="yes"></td> <!-- PowerPC -->
				2085	<td class="unknown"></td> <!-- Sparc -->
				2086	<td class="unknown"></td> <!-- SystemZ -->
				2087	<td class="yes"><a href="#feat_inlineasm_x86">*</a></td> <!-- X86 -->
				2088	<td class="unknown"></td> <!-- XCore -->
				2089	</tr>
				2090
				2091	<tr>
				2092	<td><a href="#feat_jit">jit</a></td>
				2093	<td class="partial"><a href="#feat_jit_arm">*</a></td> <!-- ARM -->
				2094	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2095	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2096	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2097	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2098	<td class="unknown"></td> <!-- MSP430 -->
				2099	<td class="unknown"></td> <!-- Mips -->
				2100	<td class="unknown"></td> <!-- PTX -->
				2101	<td class="yes"></td> <!-- PowerPC -->
				2102	<td class="unknown"></td> <!-- Sparc -->
				2103	<td class="unknown"></td> <!-- SystemZ -->
				2104	<td class="yes"></td> <!-- X86 -->
				2105	<td class="unknown"></td> <!-- XCore -->
				2106	</tr>
				2107
				2108	<tr>
				2109	<td><a href="#feat_objectwrite">.o file writing</a></td>
				2110	<td class="no"></td> <!-- ARM -->
				2111	<td class="no"></td> <!-- Alpha -->
				2112	<td class="no"></td> <!-- Blackfin -->
				2113	<td class="no"></td> <!-- CellSPU -->
				2114	<td class="no"></td> <!-- MBlaze -->
				2115	<td class="no"></td> <!-- MSP430 -->
				2116	<td class="no"></td> <!-- Mips -->
				2117	<td class="no"></td> <!-- PTX -->
				2118	<td class="no"></td> <!-- PowerPC -->
				2119	<td class="no"></td> <!-- Sparc -->
				2120	<td class="no"></td> <!-- SystemZ -->
				2121	<td class="yes"></td> <!-- X86 -->
				2122	<td class="no"></td> <!-- XCore -->
				2123	</tr>
				2124
				2125	<tr>
				2126	<td><a href="#feat_tailcall">tail calls</a></td>
				2127	<td class="yes"></td> <!-- ARM -->
				2128	<td class="unknown"></td> <!-- Alpha -->
Jakob Stoklund Olesen	4e13612	2010-10-24 20:04:05 +0000	[diff] [blame]	2129	<td class="no"></td> <!-- Blackfin -->
Kalle Raiskila	94cc4fe	2010-10-25 08:57:30 +0000	[diff] [blame]	2130	<td class="no"></td> <!-- CellSPU -->
Wesley Peck	c6a4524	2010-10-24 18:50:12 +0000	[diff] [blame]	2131	<td class="no"></td> <!-- MBlaze -->
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2132	<td class="unknown"></td> <!-- MSP430 -->
				2133	<td class="unknown"></td> <!-- Mips -->
				2134	<td class="unknown"></td> <!-- PTX -->
				2135	<td class="yes"></td> <!-- PowerPC -->
				2136	<td class="unknown"></td> <!-- Sparc -->
				2137	<td class="unknown"></td> <!-- SystemZ -->
				2138	<td class="yes"></td> <!-- X86 -->
				2139	<td class="unknown"></td> <!-- XCore -->
				2140	</tr>
				2141
				2142
				2143	</table>
				2144
				2145	</div>
				2146
				2147	<!-- _______________________________________________________________________ -->
				2148	<div class="doc_subsubsection" id="feat_reliable">Is Generally Reliable</div>
				2149
				2150	<div class="doc_text">
				2151	<p>This box indicates whether the target is considered to be production quality.
				2152	This indicates that the target has been used as a static compiler to
				2153	compile large amounts of code by a variety of different people and is in
				2154	continuous use.</p>
				2155	</div>
				2156
				2157	<!-- _______________________________________________________________________ -->
				2158	<div class="doc_subsubsection" id="feat_asmparser">Assembly Parser</div>
				2159
				2160	<div class="doc_text">
<p>This box indicates whether the target supports parsing target-specific .s
				2162	files by implementing the MCAsmParser interface. This is required for llvm-mc
				2163	to be able to act as a native assembler and is required for inline assembly
				2164	support in the native .o file writer.</p>
				2165
				2166	</div>
				2167
				2168
				2169	<!-- _______________________________________________________________________ -->
				2170	<div class="doc_subsubsection" id="feat_disassembler">Disassembler</div>
				2171
				2172	<div class="doc_text">
				2173	<p>This box indicates whether the target supports the MCDisassembler API for
disassembling machine opcode bytes into MCInsts.</p>
				2175
				2176	</div>
				2177
				2178	<!-- _______________________________________________________________________ -->
				2179	<div class="doc_subsubsection" id="feat_inlineasm">Inline Asm</div>
				2180
				2181	<div class="doc_text">
				2182	<p>This box indicates whether the target supports most popular inline assembly
				2183	constraints and modifiers.</p>
				2184
				2185	<p id="feat_inlineasm_x86">X86 lacks reliable support for inline assembly
				2186	constraints relating to the X86 floating point stack.</p>
				2187
				2188	</div>
				2189
				2190	<!-- _______________________________________________________________________ -->
				2191	<div class="doc_subsubsection" id="feat_jit">JIT Support</div>
				2192
				2193	<div class="doc_text">
				2194	<p>This box indicates whether the target supports the JIT compiler through
				2195	the ExecutionEngine interface.</p>
				2196
Chris Lattner	6fb9955	2010-10-24 16:24:22 +0000	[diff] [blame]	2197	<p id="feat_jit_arm">The ARM backend has basic support for integer code
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2198	in ARM codegen mode, but lacks NEON and full Thumb support.</p>
				2199
				2200	</div>
				2201
				2202	<!-- _______________________________________________________________________ -->
				2203	<div class="doc_subsubsection" id="feat_objectwrite">.o File Writing</div>
				2204
				2205	<div class="doc_text">
				2206
<p>This box indicates whether the target supports writing .o files (e.g. MachO,
ELF, and/or COFF) directly from the target. Note that the target also
must include an assembly parser and general inline assembly support for full
inline assembly support in the .o writer.</p>
				2211
<p>Targets that don't support this feature can obviously still write out .o
files; they just rely on having an external assembler to translate from a .s
file to a .o file (as is the case for many C compilers).</p>
				2215
Chris Lattner	68de602	2010-10-24 16:18:00 +0000	[diff] [blame]	2216	</div>
				2217
				2218	<!-- _______________________________________________________________________ -->
				2219	<div class="doc_subsubsection" id="feat_tailcall">Tail Calls</div>
				2220
				2221	<div class="doc_text">
				2222
<p>This box indicates whether the target supports guaranteed tail calls. These
are calls marked "<a href="LangRef.html#i_call">tail</a>" that use the fastcc
calling convention. Please see the <a href="#tailcallopt">tail call section
for more details</a>.</p>
				2227
				2228	</div>
				2229
				2230
				2231
				2232
				2233	<!-- ======================================================================= -->
				2234	<div class="doc_subsection">
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2235	<a name="tailcallopt">Tail call optimization</a>
				2236	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2237
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2238	<div class="doc_text">
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2239
<p>Tail call optimization, in which the callee reuses the stack of the caller,
is currently supported on x86/x86-64 and PowerPC. It is performed if:</p>
				2242
				2243	<ul>
Chris Lattner	2968943	2010-03-11 00:22:57 +0000	[diff] [blame]	2244	<li>Caller and callee have the calling convention <tt>fastcc</tt> or
				2245	<tt>cc 10</tt> (GHC call convention).</li>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2246
				2247	<li>The call is a tail call - in tail position (ret immediately follows call
				2248	and ret uses value of call or is void).</li>
				2249
				2250	<li>Option <tt>-tailcallopt</tt> is enabled.</li>
				2251
				2252	<li>Platform specific constraints are met.</li>
				2253	</ul>
				2254
				2255	<p>x86/x86-64 constraints:</p>
				2256
				2257	<ul>
				2258	<li>No variable argument lists are used.</li>
				2259
				2260	<li>On x86-64 when generating GOT/PIC code only module-local calls (visibility
				2261	= hidden or protected) are supported.</li>
				2262	</ul>
				2263
				2264	<p>PowerPC constraints:</p>
				2265
				2266	<ul>
				2267	<li>No variable argument lists are used.</li>
				2268
				2269	<li>No byval parameters are used.</li>
				2270
				2271	<li>On ppc32/64 GOT/PIC only module-local calls (visibility = hidden or protected) are supported.</li>
				2272	</ul>
				2273
				2274	<p>Example:</p>
				2275
				2276	<p>Call as <tt>llc -tailcallopt test.ll</tt>.</p>
				2277
				2278	<div class="doc_code">
				2279	<pre>
declare fastcc i32 @tailcallee(i32 inreg %a1, i32 inreg %a2, i32 %a3, i32 %a4)

define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
  %l1 = add i32 %in1, %in2
  %tmp = tail call fastcc i32 @tailcallee(i32 inreg %in1, i32 inreg %in2, i32 %in1, i32 %l1)
  ret i32 %tmp
}
				2287	</pre>
				2288	</div>
				2289
				2290	<p>Implications of <tt>-tailcallopt</tt>:</p>
				2291
<p>To support tail call optimization in situations where the callee has more
arguments than the caller, a 'callee pops arguments' convention is used. This
currently causes each <tt>fastcc</tt> call that is not tail call optimized
(because one or more of the above constraints are not met) to be followed by a
readjustment of the stack, so performance might be worse in such cases.</p>
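<p>The eligibility conditions listed in this section can be summarized as a
single predicate. The C++ sketch below is only a model of the lists above: the
<tt>CallSite</tt> struct, the enums, and <tt>mayTailCallOptimize</tt> are
hypothetical names, not LLVM APIs, and it assumes that "fastcc or cc 10" must
hold for the caller and the callee individually.</p>

```cpp
// Hypothetical summary of a call site; none of these names are LLVM APIs.
enum class CallConv { C, Fast, GHC }; // GHC stands for "cc 10"
enum class Target { X86, X86_64, PowerPC };

struct CallSite {
  CallConv CallerCC;
  CallConv CalleeCC;
  bool InTailPosition;      // ret follows the call and uses its value or is void
  bool IsVarArg;            // a variable argument list is used
  bool HasByValParam;       // a byval parameter is used
  bool GeneratingPIC;       // GOT/PIC code is being generated
  bool CalleeIsModuleLocal; // visibility = hidden or protected
};

bool isTailCallConv(CallConv CC) {
  return CC == CallConv::Fast || CC == CallConv::GHC;
}

bool mayTailCallOptimize(const CallSite &CS, Target T,
                         bool TailCallOptEnabled) {
  // Target-independent conditions.
  if (!TailCallOptEnabled || !CS.InTailPosition)
    return false;
  if (!isTailCallConv(CS.CallerCC) || !isTailCallConv(CS.CalleeCC))
    return false;
  // Constraint shared by x86/x86-64 and PowerPC.
  if (CS.IsVarArg)
    return false;
  // The GOT/PIC restriction is listed for x86-64 and for ppc32/64 only.
  bool PICRestricted = T == Target::X86_64 || T == Target::PowerPC;
  if (PICRestricted && CS.GeneratingPIC && !CS.CalleeIsModuleLocal)
    return false;
  // PowerPC additionally rejects byval parameters.
  if (T == Target::PowerPC && CS.HasByValParam)
    return false;
  return true;
}
```

<p>A <tt>fastcc</tt> call for which this predicate is false still pays for the
'callee pops arguments' convention when <tt>-tailcallopt</tt> is enabled.</p>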
				2297
Arnold Schwaighofer	9097d14	2008-05-14 09:17:12 +0000	[diff] [blame]	2298	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2299	<!-- ======================================================================= -->
				2300	<div class="doc_subsection">
Evan Cheng	dc444e9	2010-03-08 21:05:02 +0000	[diff] [blame]	2301	<a name="sibcallopt">Sibling call optimization</a>
				2302	</div>
				2303
				2304	<div class="doc_text">
				2305
<p>Sibling call optimization is a restricted form of tail call optimization.
Unlike the tail call optimization described in the previous section, it can be
performed automatically on any tail call when the <tt>-tailcallopt</tt> option
is not specified.</p>
				2310
				2311	<p>Sibling call optimization is currently performed on x86/x86-64 when the
				2312	following constraints are met:</p>
				2313
				2314	<ul>
<li>Caller and callee have the same calling convention. It can be either
<tt>c</tt> or <tt>fastcc</tt>.</li>

<li>The call is a tail call - in tail position (ret immediately follows call
and ret uses value of call or is void).</li>

<li>Caller and callee have matching return type or the callee result is not
used.</li>

<li>If any of the callee arguments are being passed on the stack, they must be
available in the caller's own incoming argument stack and the frame offsets
must be the same.</li>
				2327	</ul>
				2328
				2329	<p>Example:</p>
				2330	<div class="doc_code">
				2331	<pre>
				2332	declare i32 @bar(i32, i32)
				2333
				2334	define i32 @foo(i32 %a, i32 %b, i32 %c) {
				2335	entry:
  %0 = tail call i32 @bar(i32 %a, i32 %b)
  ret i32 %0
				2338	}
				2339	</pre>
				2340	</div>
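<p>The stack-argument constraint is the least obvious of the conditions above,
so here is a small C++ sketch of just that check. It is a simplified model, not
the backend's implementation: <tt>OutgoingStackArg</tt> and
<tt>stackArgsAllowSibcall</tt> are hypothetical names, and offsets are plain
byte offsets into the incoming argument area.</p>

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one callee argument passed on the stack.
struct OutgoingStackArg {
  bool FromIncomingStack; // value is one of the caller's own stack arguments
  int64_t IncomingOffset; // where the caller received it (if FromIncomingStack)
  int64_t OutgoingOffset; // where the callee expects to find it
};

// Checks only the stack-argument condition: every stack-passed callee
// argument must already sit in the caller's incoming argument area at the
// same frame offset, so nothing has to be moved before the jump.
bool stackArgsAllowSibcall(const std::vector<OutgoingStackArg> &Args) {
  for (const OutgoingStackArg &A : Args)
    if (!A.FromIncomingStack || A.IncomingOffset != A.OutgoingOffset)
      return false;
  return true;
}
```

<p>Assuming a convention that passes every argument on the stack in 4-byte
slots, the call from <tt>@foo</tt> to <tt>@bar</tt> above forwards <tt>%a</tt>
and <tt>%b</tt> at offsets 0 and 4, exactly where <tt>@foo</tt> received them,
so the check passes. Swapping the two arguments in the call would make it
fail.</p>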
				2341
				2342	</div>
				2343	<!-- ======================================================================= -->
				2344	<div class="doc_subsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2345	<a name="x86">The X86 backend</a>
				2346	</div>
				2347
				2348	<div class="doc_text">
				2349
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2350	<p>The X86 code generator lives in the <tt>lib/Target/X86</tt> directory. This
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2351	code generator is capable of targeting a variety of x86-32 and x86-64
				2352	processors, and includes support for ISA extensions such as MMX and SSE.</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2353
				2354	</div>
				2355
				2356	<!-- _______________________________________________________________________ -->
				2357	<div class="doc_subsubsection">
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2358	<a name="x86_tt">X86 Target Triples supported</a>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2359	</div>
				2360
				2361	<div class="doc_text">
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2362
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2363	<p>The following are the known target triples that are supported by the X86
				2364	backend. This is not an exhaustive list, and it would be useful to add those
				2365	that people test.</p>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2366
				2367	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2368	<li><b>i686-pc-linux-gnu</b> — Linux</li>
				2369
				2370	<li><b>i386-unknown-freebsd5.3</b> — FreeBSD 5.3</li>
				2371
				2372	<li><b>i686-pc-cygwin</b> — Cygwin on Win32</li>
				2373
<li><b>i686-pc-mingw32</b> &mdash; MinGW on Win32</li>

<li><b>i386-pc-mingw32msvc</b> &mdash; MinGW cross-compiler on Linux</li>
				2377
				2378	<li><b>i686-apple-darwin*</b> — Apple Darwin on X86</li>
Torok Edwin	c457b65	2009-06-15 12:17:44 +0000	[diff] [blame]	2379
				2380	<li><b>x86_64-unknown-linux-gnu</b> — Linux</li>
Chris Lattner	9b988be	2005-07-12 00:20:49 +0000	[diff] [blame]	2381	</ul>
				2382
				2383	</div>
				2384
				2385	<!-- _______________________________________________________________________ -->
				2386	<div class="doc_subsubsection">
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2387	<a name="x86_cc">X86 Calling Conventions supported</a>
				2388	</div>
				2389
				2390
				2391	<div class="doc_text">
				2392
<p>The following target-specific calling conventions are known to the backend:</p>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2394
				2395	<ul>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2396	<li><b>x86_StdCall</b> — stdcall calling convention seen on Microsoft
				2397	Windows platform (CC ID = 64).</li>
				2398
				2399	<li><b>x86_FastCall</b> — fastcall calling convention seen on Microsoft
				2400	Windows platform (CC ID = 65).</li>
Anton Korobeynikov	bcb9770	2006-09-17 20:25:45 +0000	[diff] [blame]	2401	</ul>
				2402
				2403	</div>
				2404
				2405	<!-- _______________________________________________________________________ -->
				2406	<div class="doc_subsubsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2407	<a name="x86_memory">Representing X86 addressing modes in MachineInstrs</a>
				2408	</div>
				2409
				2410	<div class="doc_text">
				2411
<p>The x86 has a very flexible way of accessing memory. It is capable of
forming memory addresses of the following form directly in integer
instructions (which use ModR/M addressing):</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2415
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2416	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2417	<pre>
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2418	SegmentReg: Base + [1,2,4,8] * IndexReg + Disp32
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2419	</pre>
Bill Wendling	91e10c4	2006-08-28 02:26:32 +0000	[diff] [blame]	2420	</div>
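<p>To make the address expression concrete, the following C++ sketch evaluates
it for given register values. This is an illustration only:
<tt>X86Address</tt>, <tt>isValidScale</tt>, and <tt>effectiveAddress</tt> are
hypothetical names, and the segment register is modeled as the base address it
selects.</p>

```cpp
#include <cstdint>

// Hypothetical model of "SegmentReg: Base + Scale * IndexReg + Disp32".
struct X86Address {
  uint64_t SegmentBase; // base address selected by SegmentReg (0 if none)
  uint64_t Base;        // value held in the base register
  uint8_t Scale;        // must be 1, 2, 4, or 8
  uint64_t Index;       // value held in the index register
  int32_t Disp;         // signed 32-bit displacement
};

bool isValidScale(uint8_t Scale) {
  return Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8;
}

// Computes SegmentBase + Base + Scale * Index + Disp. Unsigned arithmetic
// wraps, and the displacement is sign-extended before it is added.
uint64_t effectiveAddress(const X86Address &A) {
  return A.SegmentBase + A.Base + uint64_t(A.Scale) * A.Index +
         uint64_t(int64_t(A.Disp));
}
```

<p>For example, a base of <tt>0x1000</tt>, a scale of 4, an index of 3, and a
displacement of -8 address <tt>0x1004</tt>.</p>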
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2421
Chris Lattner	b91227d	2009-10-10 21:30:55 +0000	[diff] [blame]	2422	<p>In order to represent this, LLVM tracks no less than 5 operands for each
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2423	memory operand of this form. This means that the "load" form of
				2424	'<tt>mov</tt>' has the following <tt>MachineOperand</tt>s in this order:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2425
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2426	<div class="doc_code">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2427	<pre>
Index:        0     |    1        2       3           4          5
Meaning:   DestReg, | BaseReg,  Scale, IndexReg, Displacement Segment
OperandTy: VirtReg, | VirtReg, UnsImm, VirtReg,   SignExtImm  PhysReg
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2431	</pre>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2432	</div>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2433
<p>Stores, and all other instructions, treat the five memory operands in the
same way and in the same order. If the segment register is unspecified
(regno = 0), then no segment override is generated. "Lea" operations do not
have a segment register specified, so they only have 4 operands for their
memory reference.</p>
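<p>The operand layout described here can be written down as a set of index
constants. The C++ sketch below uses hypothetical names (they are not the
identifiers used in the X86 backend) and only restates the table and the
segment rule from the text.</p>

```cpp
// Hypothetical names for the operand slots of the "load" form of mov.
enum LoadOperandIndex : unsigned {
  DestReg = 0,      // destination register
  BaseReg = 1,      // first operand of the memory reference
  Scale = 2,
  IndexReg = 3,
  Displacement = 4,
  Segment = 5,      // last operand of the memory reference
};

// Number of MachineOperands that make up one memory reference.
// "Lea" carries no segment register, so it has one operand fewer.
constexpr unsigned memOperandCount(bool IsLea) { return IsLea ? 4 : 5; }

// A segment register number of 0 means "unspecified", in which case no
// segment override is generated.
constexpr bool needsSegmentOverride(unsigned SegmentRegNo) {
  return SegmentRegNo != 0;
}

static_assert(Segment - BaseReg + 1 == memOperandCount(false),
              "BaseReg through Segment span the full memory reference");
```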
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2439
				2440	</div>
				2441
				2442	<!-- _______________________________________________________________________ -->
				2443	<div class="doc_subsubsection">
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2444	<a name="x86_memory">X86 address spaces supported</a>
				2445	</div>
				2446
				2447	<div class="doc_text">
				2448
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2449	<p>x86 has an experimental feature which provides
				2450	the ability to perform loads and stores to different address spaces
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2451	via the x86 segment registers. A segment override prefix byte on an
				2452	instruction causes the instruction's memory access to go to the specified
				2453	segment. LLVM address space 0 is the default address space, which includes
				2454	the stack, and any unqualified memory accesses in a program. Address spaces
				2455	1-255 are currently reserved for user-defined code. The GS-segment is
Chris Lattner	1777d0c	2009-05-05 18:52:19 +0000	[diff] [blame]	2456	represented by address space 256, while the FS-segment is represented by
				2457	address space 257. Other x86 segments have yet to be allocated address space
				2458	numbers.</p>
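<p>The numbering in the paragraph above can be captured in a small classifier.
The C++ sketch below is illustrative only; <tt>AddressSpaceKind</tt> and
<tt>classifyAddressSpace</tt> are hypothetical names, and the function encodes
nothing beyond the ranges stated in the text.</p>

```cpp
// Hypothetical classification of LLVM address space numbers on x86.
enum class AddressSpaceKind {
  Default,     // 0: the stack and any unqualified memory access
  UserDefined, // 1-255: reserved for user-defined code
  SegmentGS,   // 256: accesses go through the GS segment
  SegmentFS,   // 257: accesses go through the FS segment
  Unallocated  // no x86 segment has been assigned to this number yet
};

AddressSpaceKind classifyAddressSpace(unsigned AS) {
  if (AS == 0)
    return AddressSpaceKind::Default;
  if (AS <= 255)
    return AddressSpaceKind::UserDefined;
  if (AS == 256)
    return AddressSpaceKind::SegmentGS;
  if (AS == 257)
    return AddressSpaceKind::SegmentFS;
  return AddressSpaceKind::Unallocated;
}
```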
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2459
Dan Gohman	d26795a	2009-05-05 20:48:47 +0000	[diff] [blame]	2460	<p>While these address spaces may seem similar to TLS via the
				2461	<tt>thread_local</tt> keyword, and often use the same underlying hardware,
				2462	there are some fundamental differences.</p>
				2463
				2464	<p>The <tt>thread_local</tt> keyword applies to global variables and
				2465	specifies that they are to be allocated in thread-local memory. There are
				2466	no type qualifiers involved, and these variables can be pointed to with
				2467	normal pointers and accessed with normal loads and stores.
				2468	The <tt>thread_local</tt> keyword is target-independent at the LLVM IR
				2469	level (though LLVM doesn't yet have implementations of it for some
configurations).</p>
				2471
				2472	<p>Special address spaces, in contrast, apply to static types. Every
				2473	load and store has a particular address space in its address operand type,
				2474	and this is what determines which address space is accessed.
				2475	LLVM ignores these special address space qualifiers on global variables,
				2476	and does not provide a way to directly allocate storage in them.
				2477	At the LLVM IR level, the behavior of these special address spaces depends
				2478	in part on the underlying OS or runtime environment, and they are specific
				2479	to x86 (and LLVM doesn't yet handle them correctly in some cases).</p>
				2480
				2481	<p>Some operating systems and runtime environments use (or may in the future
				2482	use) the FS/GS-segment registers for various low-level purposes, so care
				2483	should be taken when considering them.</p>
Nate Begeman	3450984	2009-01-26 02:54:45 +0000	[diff] [blame]	2484
				2485	</div>
				2486
				2487	<!-- _______________________________________________________________________ -->
				2488	<div class="doc_subsubsection">
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2489	<a name="x86_names">Instruction naming</a>
				2490	</div>
				2491
				2492	<div class="doc_text">
				2493
<p>An instruction name consists of the base name, a default operand size, and
a character per operand with an optional special size. For example:</p>
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2496
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2497	<div class="doc_code">
				2498	<pre>
ADD8rr      -> add, 8-bit register, 8-bit register
IMUL16rmi   -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8  -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
				2503	</pre>
				2504	</div>
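<p>The naming scheme is regular enough to decode mechanically. The C++ sketch
below splits a name into its base, default size, and per-operand sizes. It is a
hypothetical helper (<tt>decodeInstrName</tt> is not part of LLVM) and assumes
names shaped like the four examples above; names whose base contains digits
would need extra handling.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct OperandDesc {
  char Kind;     // 'r' = register, 'm' = memory, 'i' = immediate
  unsigned Bits; // operand size in bits
};

struct DecodedName {
  std::string Base;     // e.g. "IMUL"
  unsigned DefaultBits; // e.g. 16
  std::vector<OperandDesc> Operands;
};

// Reads a run of decimal digits starting at Pos; returns 0 if there is none.
static unsigned readNumber(const std::string &S, std::size_t &Pos) {
  unsigned Value = 0;
  while (Pos < S.size() && std::isdigit(static_cast<unsigned char>(S[Pos])))
    Value = Value * 10 + unsigned(S[Pos++] - '0');
  return Value;
}

DecodedName decodeInstrName(const std::string &Name) {
  DecodedName D;
  std::size_t Pos = 0;
  // Base name: everything before the first digit.
  while (Pos < Name.size() &&
         !std::isdigit(static_cast<unsigned char>(Name[Pos])))
    D.Base += Name[Pos++];
  D.DefaultBits = readNumber(Name, Pos);
  // One character per operand, optionally followed by a special size that
  // overrides the default for that operand only.
  while (Pos < Name.size()) {
    char Kind = Name[Pos++];
    unsigned Special = readNumber(Name, Pos);
    D.Operands.push_back({Kind, Special ? Special : D.DefaultBits});
  }
  return D;
}
```

<p>Decoding <tt>IMUL16rmi8</tt> this way gives the base <tt>IMUL</tt>, a
default size of 16, and three operands: a 16-bit register, a 16-bit memory
operand, and an 8-bit immediate.</p>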
Chris Lattner	ec94f80	2004-06-04 00:16:02 +0000	[diff] [blame]	2505
				2506	</div>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2507
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2508	<!-- ======================================================================= -->
				2509	<div class="doc_subsection">
				2510	<a name="ppc">The PowerPC backend</a>
				2511	</div>
				2512
				2513	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2514
<p>The PowerPC code generator lives in the <tt>lib/Target/PowerPC</tt> directory. The
code generation is retargetable to several variations or <i>subtargets</i> of
the PowerPC ISA, including ppc32, ppc64 and altivec.</p>
				2518
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2519	</div>
				2520
				2521	<!-- _______________________________________________________________________ -->
				2522	<div class="doc_subsubsection">
				2523	<a name="ppc_abi">LLVM PowerPC ABI</a>
				2524	</div>
				2525
				2526	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2527
<p>LLVM follows the AIX PowerPC ABI, with two deviations. First, LLVM uses
PC-relative (PIC) or static addressing for accessing global values, so no TOC
(r2) is used. Second, r31 is used as a frame pointer to allow dynamic growth
of a stack frame. LLVM takes advantage of having no TOC to provide space to
save the frame pointer in the PowerPC linkage area of the caller frame.
Other details of the PowerPC ABI can be found at <a href=
"http://developer.apple.com/documentation/DeveloperTools/Conceptual/LowLevelABI/Articles/32bitPowerPC.html"
>PowerPC ABI</a>. Note: This link describes the 32-bit ABI. The 64-bit ABI
is similar except that the space for each GPR is 8 bytes wide (not 4) and r13
is reserved for system use.</p>
				2538
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2539	</div>
				2540
				2541	<!-- _______________________________________________________________________ -->
				2542	<div class="doc_subsubsection">
				2543	<a name="ppc_frame">Frame Layout</a>
				2544	</div>
				2545
				2546	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2547
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2548	<p>The size of a PowerPC frame is usually fixed for the duration of a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2549	function's invocation. Since the frame is fixed size, all references
				2550	into the frame can be accessed via fixed offsets from the stack pointer. The
				2551	exception to this is when dynamic alloca or variable sized arrays are
				2552	present, then a base pointer (r31) is used as a proxy for the stack pointer
				2553	and stack pointer is free to grow or shrink. A base pointer is also used if
				2554	llvm-gcc is not passed the -fomit-frame-pointer flag. The stack pointer is
				2555	always aligned to 16 bytes, so that space allocated for AltiVec vectors will
				2556	be properly aligned.</p>
				2557
Dan Gohman	641b279	2008-11-24 16:27:17 +0000	[diff] [blame]	2558	<p>An invocation frame is laid out as follows (low memory at top):</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2559
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2560	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2561	<tr>
				2562	<td>Linkage<br><br></td>
				2563	</tr>
				2564	<tr>
				2565	<td>Parameter area<br><br></td>
				2566	</tr>
				2567	<tr>
				2568	<td>Dynamic area<br><br></td>
				2569	</tr>
				2570	<tr>
				2571	<td>Locals area<br><br></td>
				2572	</tr>
				2573	<tr>
				2574	<td>Saved registers area<br><br></td>
				2575	</tr>
				2576	<tr style="border-style: none hidden none hidden;">
				2577	<td><br></td>
				2578	</tr>
				2579	<tr>
				2580	<td>Previous Frame<br><br></td>
				2581	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2582	</table>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2583
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2584	<p>The <i>linkage</i> area is used by a callee to save special registers prior
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2585	to allocating its own frame. Only three entries are relevant to LLVM. The
				2586	first entry is the previous stack pointer (sp), also known as the link. This allows
				2587	probing tools like gdb or exception handlers to quickly scan the frames in
				2588	the stack. A function epilog can also use the link to pop the frame from the
				2589	stack. The third entry in the linkage area is used to save the return
				2590	address from the lr register. Finally, as mentioned above, the last entry is
				2591	used to save the previous frame pointer (r31). The entries in the linkage
				2592	area are the size of a GPR, thus the linkage area is 24 bytes long in 32 bit
				2593	mode and 48 bytes in 64 bit mode.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2594
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2595	<p>32 bit linkage area</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2596
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2597	<table class="layout">
				2598	<tr>
				2599	<td>0</td>
				2600	<td>Saved SP (r1)</td>
				2601	</tr>
				2602	<tr>
				2603	<td>4</td>
				2604	<td>Saved CR</td>
				2605	</tr>
				2606	<tr>
				2607	<td>8</td>
				2608	<td>Saved LR</td>
				2609	</tr>
				2610	<tr>
				2611	<td>12</td>
				2612	<td>Reserved</td>
				2613	</tr>
				2614	<tr>
				2615	<td>16</td>
				2616	<td>Reserved</td>
				2617	</tr>
				2618	<tr>
				2619	<td>20</td>
				2620	<td>Saved FP (r31)</td>
				2621	</tr>
				2622	</table>
				2623
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2624	<p>64 bit linkage area</p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2625
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2626	<table class="layout">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2627	<tr>
				2628	<td>0</td>
				2629	<td>Saved SP (r1)</td>
				2630	</tr>
				2631	<tr>
				2632	<td>8</td>
				2633	<td>Saved CR</td>
				2634	</tr>
				2635	<tr>
				2636	<td>16</td>
				2637	<td>Saved LR</td>
				2638	</tr>
				2639	<tr>
				2640	<td>24</td>
				2641	<td>Reserved</td>
				2642	</tr>
				2643	<tr>
				2644	<td>32</td>
				2645	<td>Reserved</td>
				2646	</tr>
				2647	<tr>
				2648	<td>40</td>
				2649	<td>Saved FP (r31)</td>
				2650	</tr>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2651	</table>
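<p>The two tables above can be summarized in a few lines of C++. This is only
an illustrative sketch: the names <tt>LinkageSlot</tt>, <tt>gprSize</tt>,
<tt>linkageOffset</tt> and <tt>linkageSize</tt> are hypothetical and are not
taken from the LLVM sources. Only the slot order and the GPR sized entries come
from the tables.</p>

```cpp
#include <cassert>

// Hypothetical names for the six linkage area slots, in table order.
enum LinkageSlot : unsigned {
  SavedSP = 0,    // previous stack pointer (r1), the "link"
  SavedCR = 1,    // saved condition register
  SavedLR = 2,    // return address from the lr register
  Reserved0 = 3,
  Reserved1 = 4,
  SavedFP = 5,    // previous frame pointer (r31)
  NumLinkageSlots = 6
};

// Size of one GPR in bytes: 4 in 32 bit mode, 8 in 64 bit mode.
constexpr unsigned gprSize(bool is64Bit) { return is64Bit ? 8u : 4u; }

// Byte offset of a slot from the start of the linkage area.
constexpr unsigned linkageOffset(LinkageSlot slot, bool is64Bit) {
  return static_cast<unsigned>(slot) * gprSize(is64Bit);
}

// Total size of the linkage area in bytes.
constexpr unsigned linkageSize(bool is64Bit) {
  return static_cast<unsigned>(NumLinkageSlots) * gprSize(is64Bit);
}

// These checks mirror the offsets and sizes given in the text.
static_assert(linkageOffset(SavedLR, false) == 8, "32 bit saved LR offset");
static_assert(linkageOffset(SavedFP, true) == 40, "64 bit saved FP offset");
static_assert(linkageSize(false) == 24, "32 bit linkage area size");
static_assert(linkageSize(true) == 48, "64 bit linkage area size");
```

<p>Because every entry is the size of a GPR, a single multiplication is enough
to reproduce either table.</p>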
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2652
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2653	<p>The <i>parameter area</i> is used to store arguments being passed to a callee
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2654	function. Following the PowerPC ABI, the first few arguments are actually
				2655	passed in registers, with the space in the parameter area unused. However,
				2656	if there are not enough registers or the callee is a thunk or vararg
				2657	function, these register arguments can be spilled into the parameter area.
				2658	Thus, the parameter area must be large enough to store all the parameters for
				2659	the largest call sequence made by the caller. The size must also be
				2660	minimally large enough to spill registers r3-r10. This allows callees blind
				2661	to the call signature, such as thunks and vararg functions, enough space to
				2662	cache the argument registers. Therefore, the parameter area is minimally 32
				2663	bytes (64 bytes in 64 bit mode). Also note that since the parameter area is
				2664	at a fixed offset from the top of the frame, a callee can access its spilled
				2665	arguments using fixed offsets from the stack pointer (or base pointer).</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2666
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2667	<p>Combining the information about the linkage and parameter areas and alignment, a
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2668	stack frame is minimally 64 bytes in 32 bit mode and 128 bytes in 64 bit
				2669	mode.</p>
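<p>As a worked example of the 32 bit figure: the linkage area is 24 bytes, the
parameter area must hold at least r3-r10 (8 registers of 4 bytes, so 32
bytes), and the sum of 56 bytes is rounded up to the 16 byte stack alignment,
giving 64. The sketch below restates that arithmetic. The names
<tt>alignTo</tt>, <tt>minParamAreaSize</tt> and <tt>minFrameSize32</tt> are
hypothetical, and the frame calculation covers 32 bit mode only.</p>

```cpp
#include <cassert>

// The stack pointer is always aligned to 16 bytes.
constexpr unsigned kStackAlign = 16;

// Round value up to the next multiple of align.
constexpr unsigned alignTo(unsigned value, unsigned align) {
  return (value + align - 1) / align * align;
}

// Minimum parameter area: room to spill the eight argument registers r3-r10.
constexpr unsigned minParamAreaSize(bool is64Bit) {
  return 8u * (is64Bit ? 8u : 4u);
}

// Linkage area size in 32 bit mode, as stated in the text.
constexpr unsigned kLinkageSize32 = 24;

// Minimum 32 bit frame: linkage plus parameter area, rounded to the alignment.
constexpr unsigned minFrameSize32() {
  return alignTo(kLinkageSize32 + minParamAreaSize(false), kStackAlign);
}

static_assert(minParamAreaSize(false) == 32, "32 bit parameter area minimum");
static_assert(minParamAreaSize(true) == 64, "64 bit parameter area minimum");
static_assert(minFrameSize32() == 64, "32 bit minimum frame size");
```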
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2670
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2671	<p>The <i>dynamic area</i> starts out as size zero. If a function uses dynamic
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2672	alloca then space is added to the stack, the linkage and parameter areas are
				2673	shifted to the top of the stack, and the new space is available immediately below the
				2674	linkage and parameter areas. The cost of shifting the linkage and parameter
				2675	areas is minor since only the link value needs to be copied. The link value
				2676	can be easily fetched by adding the original frame size to the base pointer.
				2677	Note that allocations in the dynamic space need to observe 16 byte
				2678	alignment.</p>
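<p>The bookkeeping described above can be modeled with a small simulation.
This is a sketch under stated assumptions, not the code LLVM emits: it models
32 bit mode only, treats the stack as a byte array indexed by address, and
assumes the combined linkage and parameter areas are padded to a multiple of
16 bytes so that the returned address stays aligned. All names
(<tt>FrameSim</tt>, <tt>enterFrame</tt>, <tt>dynamicAlloca</tt> and so on) are
hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of one 32 bit frame; none of these names come from LLVM.
struct FrameSim {
  std::vector<std::uint8_t> memory; // simulated stack, index == address
  std::uint32_t sp;                 // stack pointer (r1)
  std::uint32_t bp;                 // base pointer (r31), fixed after the prolog
  std::uint32_t frameSize;          // size of the frame created by the prolog
  std::uint32_t topSize;            // linkage + parameter areas (assumed padded to 16)
};

void storeWord(FrameSim &f, std::uint32_t addr, std::uint32_t value) {
  std::memcpy(&f.memory[addr], &value, sizeof(value));
}

std::uint32_t loadWord(const FrameSim &f, std::uint32_t addr) {
  std::uint32_t value;
  std::memcpy(&value, &f.memory[addr], sizeof(value));
  return value;
}

// Prolog: create a fixed size frame below callerSP and store the link at
// offset 0 of the new linkage area.
FrameSim enterFrame(std::uint32_t callerSP, std::uint32_t frameSize,
                    std::uint32_t topSize) {
  FrameSim f{std::vector<std::uint8_t>(callerSP, 0), callerSP - frameSize,
             callerSP - frameSize, frameSize, topSize};
  storeWord(f, f.sp, callerSP);
  return f;
}

// Dynamic alloca: grow the stack, shift the linkage and parameter areas by
// re-storing only the link, and return the start of the new space.
std::uint32_t dynamicAlloca(FrameSim &f, std::uint32_t bytes) {
  std::uint32_t rounded = (bytes + 15u) & ~15u; // observe 16 byte alignment
  std::uint32_t link = f.bp + f.frameSize;      // link value, computed not loaded
  f.sp -= rounded;                              // add space to the stack
  storeWord(f, f.sp, link);                     // only the link needs copying
  return f.sp + f.topSize;                      // space just past the shifted areas
}
```

<p>For example, with a caller stack pointer of 512, a 96 byte frame and a 64
byte linkage plus parameter region, <tt>enterFrame(512, 96, 64)</tt> followed
by <tt>dynamicAlloca(f, 20)</tt> moves the stack pointer from 416 to 384 and
leaves the base pointer at 416, so fixed offsets into the locals area remain
valid.</p>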
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2679
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2680	<p>The <i>locals area</i> is where the LLVM compiler reserves space for local
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2681	variables.</p>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2682
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2683	<p>The <i>saved registers area</i> is where the LLVM compiler spills callee
				2684	saved registers on entry to the callee.</p>
				2685
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2686	</div>
				2687
				2688	<!-- _______________________________________________________________________ -->
				2689	<div class="doc_subsubsection">
				2690	<a name="ppc_prolog">Prolog/Epilog</a>
				2691	</div>
				2692
				2693	<div class="doc_text">
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2694
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2695	<p>The LLVM prolog and epilog are the same as described in the PowerPC ABI, with
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2696	the following exceptions. Callee saved registers are spilled after the frame
				2697	is created. This allows the LLVM epilog/prolog support to be common with
				2698	other targets. The base pointer callee saved register r31 is saved in the
				2699	TOC slot of the linkage area. This simplifies allocation of space for the base
				2700	pointer and makes it convenient to locate programmatically and during
				2701	debugging.</p>
				2702
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2703	</div>
				2704
				2705	<!-- _______________________________________________________________________ -->
				2706	<div class="doc_subsubsection">
				2707	<a name="ppc_dynamic">Dynamic Allocation</a>
				2708	</div>
				2709
				2710	<div class="doc_text">
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2711
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2712	<p><i>TODO - More to come.</i></p>
Bill Wendling	8011880	2009-04-15 02:12:37 +0000	[diff] [blame]	2713
Jim Laskey	b744c25	2006-12-15 10:40:48 +0000	[diff] [blame]	2714	</div>
Jim Laskey	762b6cb	2006-12-14 17:19:50 +0000	[diff] [blame]	2715
				2716
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2717	<!-- *********************************************************************** -->
				2718	<hr>
				2719	<address>
				2720	<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
Misha Brukman	4440870	2008-12-11 17:34:48 +0000	[diff] [blame]	2721	src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2722	<a href="http://validator.w3.org/check/referer"><img
Misha Brukman	f00ddb0	2008-12-11 18:23:24 +0000	[diff] [blame]	2723	src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2724
				2725	<a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
Reid Spencer	05fe4b0	2006-03-14 05:39:39 +0000	[diff] [blame]	2726	<a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
Chris Lattner	ce52b7e	2004-06-01 06:48:00 +0000	[diff] [blame]	2727	Last modified: $Date$
				2728	</address>
				2729
				2730	</body>
				2731	</html>