.. _design:

Linker Design
=============

Introduction
------------

lld is a new generation of linker. It is not "section" based like traditional
linkers, which mostly just interlace sections from multiple object files into the
output file. Instead, lld is based on "Atoms". Traditional section-based
linking works well for simple linking, but its model makes advanced linking
features difficult to implement. Features like dead code stripping, reordering
functions for locality, and C++ coalescing require the linker to work at a finer
grain.

An atom is an indivisible chunk of code or data. An atom has a set of
attributes, such as: name, scope, content-type, alignment, etc. An atom also
has a list of References. A Reference contains: a kind, an optional offset, an
optional addend, and an optional target atom.

The Atom model allows the linker to use standard graph theory models for linking
data structures. Each atom is a node, and each Reference is an edge. The
feature of dead code stripping is implemented by following edges to mark all
live atoms, and then deleting the non-live atoms.


Atom Model
----------

An atom is an indivisible chunk of code or data. Typically each user-written
function or global variable is an atom. In addition, the compiler may emit
other atoms, such as for literal c-strings or floating point constants, or for
runtime data structures like DWARF unwind info or pointers to initializers.

A simple "hello world" object file would be modeled like this:

.. image:: hello.png

There are three atoms: main, a proxy for printf, and an anonymous atom
containing the c-string literal "hello world". The Atom "main" has two
references. One is the call site for the call to printf, and the other is a
reference for the instruction that loads the address of the c-string literal.
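As a rough illustration of this model, the hello world graph could be built
from plain structs as shown here. This is a minimal C++ sketch and not the real
lld classes: the ``Atom`` and ``Reference`` structs, the ``makeHelloWorld``
helper, and the reference kinds and offsets are assumptions chosen for
illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the atom model (not lld's classes).
struct Atom;

struct Reference {
  std::string kind;              // e.g. "call32" or "pcrel32" (illustrative)
  std::uint64_t offset = 0;      // optional: position of the fixup in the content
  std::int64_t addend = 0;       // optional: constant added to the target address
  const Atom *target = nullptr;  // optional: the atom this edge points at
};

struct Atom {
  std::string name;                     // empty for anonymous atoms
  std::string scope = "translation-unit";
  std::string contentType = "code";
  std::vector<std::uint8_t> content;
  std::vector<Reference> references;    // outgoing edges of this node
};

using AtomList = std::vector<std::unique_ptr<Atom>>;

// Builds the three atoms of the "hello world" example: main, a proxy for
// printf, and an anonymous c-string atom.
AtomList makeHelloWorld() {
  auto printfProxy = std::make_unique<Atom>();
  printfProxy->name = "printf";

  auto literal = std::make_unique<Atom>();
  literal->contentType = "c-string";
  const std::string text = "hello world";
  literal->content.assign(text.begin(), text.end());
  literal->content.push_back(0);  // terminating NUL

  auto mainAtom = std::make_unique<Atom>();
  mainAtom->name = "main";
  mainAtom->scope = "global";
  mainAtom->references.push_back({"pcrel32", 0x07, 0, literal.get()});
  mainAtom->references.push_back({"call32", 0x0E, 0, printfProxy.get()});
  assert(mainAtom->references.size() == 2);  // "main" has two references

  AtomList atoms;
  atoms.push_back(std::move(mainAtom));
  atoms.push_back(std::move(printfProxy));
  atoms.push_back(std::move(literal));
  return atoms;
}
```

Holding each atom behind a ``std::unique_ptr`` keeps the ``target`` pointers
stable while the list grows, which is one simple way to represent edges.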

There are only four different types of atoms:

* DefinedAtom
    95% of all atoms. This is a chunk of code or data.

* UndefinedAtom
    This is a placeholder in object files for a reference to some atom
    outside the translation unit. During core linking it is usually replaced
    by (coalesced into) another Atom.

* SharedLibraryAtom
    If a required symbol name turns out to be defined in a dynamic shared
    library (and not some object file), a SharedLibraryAtom is the
    placeholder Atom used to represent that fact.

    It is similar to an UndefinedAtom, but it also tracks information
    about the associated shared library.

* AbsoluteAtom
    This is for embedded support where some functionality is implemented in
    ROM at some fixed address. This atom has no content. It is just an address
    that the Writer needs to fix up any references to point to.
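The four kinds can be pictured as a tag on each atom. The following is a small
C++ sketch under that assumption. The ``Definition`` enum, the ``Atom`` struct,
and the helper functions are hypothetical names for illustration, not lld's
actual class hierarchy.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical tag for the four kinds of atoms described above.
enum class Definition { Regular, Undefined, SharedLibrary, Absolute };

struct Atom {
  std::string name;
  Definition definition = Definition::Regular;
  std::string libraryName;         // only meaningful for SharedLibrary atoms
  std::uint64_t fixedAddress = 0;  // only meaningful for Absolute atoms
};

// Only defined atoms carry bytes that end up in the output file.
bool hasContent(const Atom &atom) {
  return atom.definition == Definition::Regular;
}

// Undefined atoms are placeholders that core linking must still replace.
bool needsDefinition(const Atom &atom) {
  return atom.definition == Definition::Undefined;
}

const char *describe(const Atom &atom) {
  switch (atom.definition) {
  case Definition::Regular:
    return "chunk of code or data";
  case Definition::Undefined:
    return "placeholder for an atom outside the translation unit";
  case Definition::SharedLibrary:
    return "placeholder for a symbol exported by a shared library";
  case Definition::Absolute:
    return "fixed address with no content";
  }
  assert(false && "unknown definition kind");
  return "unknown";
}
```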


File Model
----------

The linker views the input files as basically containers of Atoms and
References, with just a few attributes of their own. The linker works with three
kinds of files: object files, static libraries, and dynamic shared libraries.
Each kind of file has a reader object which presents the file in the model
expected by the linker.

Object File
~~~~~~~~~~~

An object file is just a container of atoms. When linking an object file, a
reader is instantiated which parses the object file and instantiates a set of
atoms representing all content in the .o file. The linker adds all those atoms
to a master graph.

Static Library (Archive)
~~~~~~~~~~~~~~~~~~~~~~~~

This is the traditional Unix static archive, which is just a collection of
object files with a "table of contents". When linking with a static library, by
default nothing is added to the master graph of atoms. Instead, if any
"undefined" atoms remain in the master graph after all atoms from object files
have been merged into it, the linker reads the table of contents for each
static library to see if any have the needed definitions. If so, the set of
atoms from the specified object file in the static library is added to the
master graph of atoms.
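The lazy loading of archive members can be sketched as a loop over the
remaining undefined names. This C++ sketch tracks only symbol names rather than
full atoms. ``ObjectFile``, ``Archive``, ``LinkState``, ``addObject``, and
``searchArchive`` are hypothetical names, and real archive semantics (such as
ordering across several libraries) are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, name-only model of an object file.
struct ObjectFile {
  std::string name;
  std::set<std::string> defined;
  std::set<std::string> undefined;
};

struct Archive {
  std::vector<ObjectFile> members;
  std::map<std::string, std::size_t> tableOfContents;  // symbol -> member index
};

struct LinkState {
  std::set<std::string> defined;
  std::set<std::string> undefined;
  std::vector<std::string> loadedMembers;
};

// Merges one object file into the master state.
void addObject(LinkState &state, const ObjectFile &object) {
  for (const auto &symbol : object.defined) {
    state.defined.insert(symbol);
    state.undefined.erase(symbol);
  }
  for (const auto &symbol : object.undefined)
    if (state.defined.count(symbol) == 0)
      state.undefined.insert(symbol);
}

// Pulls in only the members that define a still-undefined symbol. Loading a
// member may introduce new undefined symbols, so repeat until nothing changes.
void searchArchive(LinkState &state, const Archive &archive) {
  std::set<std::size_t> loaded;
  bool progress = true;
  while (progress) {
    progress = false;
    const std::set<std::string> pending = state.undefined;  // copy: we mutate
    for (const auto &symbol : pending) {
      if (state.undefined.count(symbol) == 0)
        continue;  // already satisfied by a member loaded in this round
      auto entry = archive.tableOfContents.find(symbol);
      if (entry == archive.tableOfContents.end() || loaded.count(entry->second))
        continue;
      assert(entry->second < archive.members.size());
      loaded.insert(entry->second);
      const ObjectFile &member = archive.members[entry->second];
      addObject(state, member);
      state.loadedMembers.push_back(member.name);
      progress = true;
    }
  }
}
```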

Dynamic Library (Shared Object)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dynamic libraries are different from object files and static libraries in that
they don't directly add any content. Their purpose is to check at build time
that the remaining undefined references can be resolved at runtime, and provide
a list of dynamic libraries (DT_NEEDED entries in ELF) that will be needed at
runtime. The way this is modeled in the linker is that a dynamic library
contributes no atoms to the initial graph of atoms. Instead (like static
libraries), if there are "undefined" atoms in the master graph of all atoms,
then each dynamic library is checked to see if it exports the required symbol.
If so, a "shared library" atom is instantiated by the reader, which the linker
uses to replace the "undefined" atom.

Linking Steps
-------------

Through the use of abstract Atoms, the core of linking is architecture
independent and file format independent. All command line parsing is factored
out into a separate "options" abstraction which enables the linker to be driven
with different command line sets.

The overall steps in linking are:

#. Command line processing

#. Parsing input files

#. Resolving

#. Passes/Optimizations

#. Generate output file

The Resolving and Passes steps are done purely on the master graph of atoms, so
they have no notion of file formats such as mach-o or ELF.


Input Files
~~~~~~~~~~~

Existing developer tools use different file formats for object files.
A goal of lld is to be file format independent. This is done
through a plug-in model for reading object files. The lld::Reader is the base
class for all object file readers. A Reader follows the factory method pattern.
A Reader instantiates an lld::File object (which is a graph of Atoms) from a
given object file (on disk or in-memory).
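The factory method pattern mentioned above might look roughly like this. It is
a self-contained C++ sketch with invented names: ``Reader::parseFile``,
``ToyTextReader``, the ``TOY1`` magic string, and ``readWithAny`` are
assumptions for illustration and do not reflect the real lld::Reader interface.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for lld::File: here just a list of atom names.
struct File {
  std::string path;
  std::vector<std::string> atomNames;
};

class Reader {
public:
  virtual ~Reader() = default;
  // Factory method: returns a File, or nullptr if the format is not recognized.
  virtual std::unique_ptr<File> parseFile(const std::string &path,
                                          const std::string &buffer) const = 0;
};

// A made-up text format: the magic line "TOY1" followed by one atom name
// per line.
class ToyTextReader : public Reader {
public:
  std::unique_ptr<File> parseFile(const std::string &path,
                                  const std::string &buffer) const override {
    const std::string magic = "TOY1\n";
    if (buffer.compare(0, magic.size(), magic) != 0)
      return nullptr;
    auto file = std::make_unique<File>();
    file->path = path;
    std::istringstream lines(buffer.substr(magic.size()));
    for (std::string line; std::getline(lines, line);)
      if (!line.empty())
        file->atomNames.push_back(line);
    return file;
  }
};

// Tries each registered reader in turn; the caller never needs to know the
// concrete file format.
std::unique_ptr<File> readWithAny(const std::vector<const Reader *> &readers,
                                  const std::string &path,
                                  const std::string &buffer) {
  for (const Reader *reader : readers) {
    assert(reader != nullptr);
    if (auto file = reader->parseFile(path, buffer))
      return file;
  }
  return nullptr;
}
```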

Every Reader subclass defines its own "options" class (for instance the mach-o
Reader defines the class ReaderOptionsMachO). This options class is the
one-and-only way to control how the Reader operates when parsing an input file
into an Atom graph. For instance, you may want the Reader to only accept
certain architectures. The options class can be instantiated from command
line options, or it can be subclassed and the ivars programmatically set.


Resolving
~~~~~~~~~

The resolving step takes the atom graphs from each object file and
combines them into one master object graph. Unfortunately, it is not as simple
as appending the atom list from each file into one big list. There are many
cases where atoms need to be coalesced. That is, two or more atoms need to be
coalesced into one atom. This is necessary to support: C language "tentative
definitions", C++ weak symbols for templates and inlines defined in headers,
replacing undefined atoms with actual definition atoms, and for merging copies
of constants like c-strings and floating point constants.

The linker supports coalescing by-name and by-content. By-name is used for
tentative definitions and weak symbols. By-content is used for constant data
that can be merged.
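By-content coalescing can be sketched as deduplication keyed on the bytes of
each constant. In this C++ sketch, ``MergedConstants`` and
``coalesceByContent`` are hypothetical names, and constants are represented as
strings for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct MergedConstants {
  std::vector<std::string> unique;  // one surviving atom per distinct content
  std::vector<std::size_t> remap;   // input index -> index into `unique`
};

// Keeps the first atom seen for each distinct content and redirects every
// later duplicate to it.
MergedConstants coalesceByContent(const std::vector<std::string> &contents) {
  MergedConstants out;
  std::map<std::string, std::size_t> seen;
  for (const auto &bytes : contents) {
    auto inserted = seen.emplace(bytes, out.unique.size());
    if (inserted.second)
      out.unique.push_back(bytes);
    out.remap.push_back(inserted.first->second);
  }
  assert(out.remap.size() == contents.size());
  return out;
}
```

The ``remap`` table stands in for the step where references to a discarded
duplicate are redirected to the surviving atom.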

The resolving process maintains some global linking "state", including a "symbol
table" which is a map from llvm::StringRef to lld::Atom*. With these data
structures, the linker iterates over all atoms in all input files. For each
atom, it checks if the atom is named and has a global or hidden scope. If so,
the atom is added to the symbol table map. If there already is a matching atom
in that table, that means the current atom needs to be coalesced with the found
atom, or it is a multiple definition error.
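The by-name step can be sketched as a map insert with a conflict rule. The C++
code below is a simplified model, not lld's resolver. The ``Scope`` and
``Strength`` enums and the precedence rule (regular beats weak, weak beats
undefined, two regular definitions are an error) are assumptions, and tentative
definitions are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Scope { TranslationUnit, Hidden, Global };
enum class Strength { Undefined = 0, Weak = 1, Regular = 2 };

struct Atom {
  std::string name;
  Scope scope;
  Strength strength;
};

class SymbolTable {
public:
  // Returns false on a multiple definition error, true otherwise.
  bool add(const Atom &atom) {
    // Anonymous atoms and translation-unit-scoped atoms are never entered.
    if (atom.name.empty() || atom.scope == Scope::TranslationUnit)
      return true;
    auto result = table_.emplace(atom.name, &atom);
    if (result.second)
      return true;  // first atom with this name
    const Atom *&existing = result.first->second;
    assert(existing != nullptr);
    if (existing->strength == Strength::Regular &&
        atom.strength == Strength::Regular)
      return false;  // two strong definitions of the same name
    // Coalesce: keep whichever atom has the stronger definition.
    if (static_cast<int>(atom.strength) > static_cast<int>(existing->strength))
      existing = &atom;
    return true;
  }

  const Atom *find(const std::string &name) const {
    auto it = table_.find(name);
    return it == table_.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, const Atom *> table_;
};
```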

When all initial input file atoms have been processed by the resolver, a scan is
made to see if there are any undefined atoms in the graph. If there are, the
linker scans all libraries (both static and dynamic) looking for definitions to
replace the undefined atoms. It is an error if any undefined atoms remain.

Dead code stripping (if requested) is done at the end of resolving. The linker
does a simple mark-and-sweep. It starts with "root" atoms (like "main" in a main
executable), follows each reference, and marks each Atom that it visits as
"live". When done, all atoms not marked "live" are removed.
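The mark-and-sweep described above can be sketched with a worklist. In this C++
sketch, atoms refer to each other by index, and ``Atom``, ``targets``, and
``liveAtomNames`` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Atom {
  std::string name;
  std::vector<std::size_t> targets;  // indices of the atoms this atom references
};

// Marks everything reachable from the roots, then sweeps: only the names of
// live atoms are returned, in their original order.
std::vector<std::string> liveAtomNames(const std::vector<Atom> &atoms,
                                       const std::vector<std::size_t> &roots) {
  std::vector<bool> live(atoms.size(), false);
  std::vector<std::size_t> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    const std::size_t index = worklist.back();
    worklist.pop_back();
    assert(index < atoms.size());
    if (live[index])
      continue;  // already visited; this also terminates cycles
    live[index] = true;
    for (std::size_t target : atoms[index].targets)
      worklist.push_back(target);
  }
  std::vector<std::string> kept;
  for (std::size_t i = 0; i < atoms.size(); ++i)
    if (live[i])
      kept.push_back(atoms[i].name);
  return kept;
}
```

Note that two dead atoms referencing each other are still removed, because
liveness is decided only by reachability from the roots.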

The result of the Resolving phase is the creation of an lld::File object. The
goal is that the lld::File model is the internal representation
throughout the linker. The file readers parse (mach-o, ELF, COFF) into an
lld::File. The file writers (mach-o, ELF, COFF) take an lld::File and produce
their file kind, and every Pass only operates on an lld::File. This is not only
a simpler, consistent model, but it enables the state of the linker to be dumped
at any point in the link for testing purposes.


Passes
~~~~~~

The Passes step is an open-ended set of routines that each get a chance to
modify or enhance the current lld::File object. Some example Passes are:

* stub (PLT) generation

* GOT instantiation

* order_file optimization

* branch island generation

* branch shim generation

* Objective-C optimizations (Darwin specific)

* TLV instantiation (Darwin specific)

* DTrace probe processing (Darwin specific)

* compact unwind encoding (Darwin specific)


Some of these passes are specific to Darwin's runtime environments. But many of
the passes are applicable to any OS (such as generating branch islands for
out-of-range branch instructions).

The general structure of a pass is to iterate through the atoms in the current
lld::File object, inspecting each atom and doing something. For instance, the
stub pass looks for call sites to shared library atoms (e.g. a call to printf).
It then instantiates a "stub" atom (PLT entry) and a "lazy pointer" atom for
each proxy atom needed, and these new atoms are added to the current lld::File
object. Next, all the noted call sites to shared library atoms have their
References altered to point to the stub atom instead of the shared library atom.
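The stub pass described above can be sketched as follows. This is a simplified
C++ model with hypothetical names (``Kind``, ``Atom``, ``runStubPass``, and the
``$stub`` and ``$lazy_ptr`` name suffixes). It treats every reference to a
shared library atom as a call site, which a real pass would not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class Kind { Code, SharedLibrary, Stub, LazyPointer };

struct Atom {
  std::string name;
  Kind kind;
  std::vector<std::size_t> targets;  // indices of referenced atoms
};

using File = std::vector<Atom>;

// For each shared library atom that is referenced, create one lazy pointer
// atom and one stub atom, then retarget the references to the stub.
void runStubPass(File &file) {
  std::map<std::size_t, std::size_t> stubFor;  // shared library atom -> stub
  const std::size_t originalCount = file.size();
  for (std::size_t i = 0; i < originalCount; ++i) {
    for (std::size_t r = 0; r < file[i].targets.size(); ++r) {
      const std::size_t target = file[i].targets[r];
      assert(target < file.size());
      if (file[target].kind != Kind::SharedLibrary)
        continue;
      auto found = stubFor.find(target);
      if (found == stubFor.end()) {
        const std::string name = file[target].name;  // copy before push_back
        const std::size_t lazyPointer = file.size();
        file.push_back({name + "$lazy_ptr", Kind::LazyPointer, {target}});
        const std::size_t stub = file.size();
        file.push_back({name + "$stub", Kind::Stub, {lazyPointer}});
        found = stubFor.emplace(target, stub).first;
      }
      file[i].targets[r] = found->second;
    }
  }
}
```

Atoms are addressed by index rather than by pointer so that appending new atoms
to the file does not invalidate existing references.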


Generate Output File
~~~~~~~~~~~~~~~~~~~~

Once the passes are done, the output file writer is given the current lld::File
object. The writer's job is to create the executable content file wrapper and
place the content of the atoms into it.

lld uses a plug-in model for writing output files. All concrete writers (e.g.
ELF, mach-o, etc.) are subclasses of the lld::Writer class.

Unlike the Reader class which has just one method to instantiate an lld::File,
the Writer class has multiple methods. The crucial method is to generate the
output file, but there are also methods which allow the Writer to contribute
Atoms to the resolver and specify passes to run.

An example of contributing atoms is that if the Writer knows a main executable
is being linked and such an executable requires a specially named entry point
(e.g. "_main"), the Writer can add an UndefinedAtom with that special name to
the resolver. This will cause the resolver to issue an error if that symbol is
not defined.

Sometimes a Writer supports lazily created symbols, such as names for the start
of sections. To support this, the Writer can create a File object which vends
no initial atoms, but does lazily supply atoms by name as needed.
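A File that vends atoms lazily might be sketched like this. The C++ code is a
toy model: ``LazySectionFile``, its methods, and the ``start_of_`` naming
convention are invented for illustration and are not how any particular Writer
names section symbols.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Atom {
  std::string name;
  std::string section;  // the section whose start this atom marks
};

class LazySectionFile {
public:
  explicit LazySectionFile(std::set<std::string> sections)
      : sections_(std::move(sections)) {
    assert(sections_.count("") == 0);  // section names must be non-empty
  }

  // The file contributes nothing up front.
  std::vector<Atom> initialAtoms() const { return {}; }

  // Called only when a name is still undefined. Returns nullptr if this file
  // cannot supply the name.
  std::unique_ptr<Atom> atomForName(const std::string &name) const {
    static const std::string prefix = "start_of_";
    if (name.compare(0, prefix.size(), prefix) != 0)
      return nullptr;
    const std::string section = name.substr(prefix.size());
    if (sections_.count(section) == 0)
      return nullptr;
    return std::make_unique<Atom>(Atom{name, section});
  }

private:
  std::set<std::string> sections_;
};
```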

Every Writer subclass defines its own "options" class (for instance the mach-o
Writer defines the class WriterOptionsMachO). This options class is the
one-and-only way to control how the Writer operates when producing an output
file from an Atom graph. For instance, you may want the Writer to optimize
the output for certain OS versions, or strip local symbols, etc. The options
class can be instantiated from command line options, or it can be subclassed
and the ivars programmatically set.


lld::File representations
-------------------------

Just as LLVM has three representations of its IR model, lld has three
representations of its File/Atom/Reference model:

* In memory, abstract C++ classes (lld::Atom, lld::Reference, and lld::File).

* textual (in YAML)

* binary format ("native")

Binary File Format
~~~~~~~~~~~~~~~~~~

In theory, lld::File objects could be written to disk in an existing Object File
format standard (e.g. ELF). Instead we choose to define a new binary file
format. There are two main reasons for this: fidelity and performance. In order
for lld to work as a linker on all platforms, its internal model must be rich
enough to model all CPU and OS linking features. But if we choose an existing
Object File format as the lld binary format, that means an ongoing need to
retrofit each platform-specific feature needed from alternate platforms into the
existing Object File format. Having our own "native" binary format sidesteps
that issue. We still need to be able to binary encode all the features, but
once the in-memory model can represent the feature, it is straightforward to
binary encode it.

The reason to use a binary file format at all, instead of a textual file format,
is speed. You want the binary format to be as fast as possible to read into the
in-memory model. Given that we control the in-memory model and the binary
format, the obvious way to make reading super fast is to make the file format be
basically just an array of atoms. The reader just mmaps in the file, looks
at the header to see how many atoms there are, and instantiates that many atom
objects with the atom attribute information coming from that array. The trick
is designing this in a way that can be extended as the Atom model evolves and
new attributes are added.

The native object file format starts with a header that lists how many "chunks"
are in the file. A chunk is an array of "ivar data". The native file reader
instantiates an array of Atom objects (with one large malloc call). Each atom
contains just a pointer to its vtable and a pointer to its ivar data. All
methods on lld::Atom are virtual, so all the method implementations return
values based on the ivar data to which the atom has a pointer. If a new linking
feature is added which requires a change to the lld::Atom model, a new native
reader class (e.g. version 2) is defined which knows how to read the new feature
information from the new ivar data. The old reader class (e.g. version 1) is
updated to do its best to model (the lack of the new feature) given the old ivar
data in existing native object files.
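The versioning scheme can be sketched with two ivar layouts and two atom
classes that share one abstract interface. This C++ sketch uses invented
layouts: ``AtomIvarsV1``, ``AtomIvarsV2``, their fields, and
``instantiateV1`` are assumptions and do not describe the actual native file
format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical on-disk records ("ivar data") for two format versions.
struct AtomIvarsV1 {
  std::uint32_t nameOffset;
  std::uint32_t contentSize;
};
struct AtomIvarsV2 {
  std::uint32_t nameOffset;
  std::uint32_t contentSize;
  std::uint8_t alignmentLog2;  // new attribute added in version 2
};

class Atom {
public:
  virtual ~Atom() = default;
  virtual std::uint32_t contentSize() const = 0;
  virtual unsigned alignment() const = 0;
};

// Each atom object holds only a vtable pointer and a pointer to its ivars.
class AtomV1 final : public Atom {
public:
  explicit AtomV1(const AtomIvarsV1 *ivars) : ivars_(ivars) {}
  std::uint32_t contentSize() const override { return ivars_->contentSize; }
  // Version 1 files lack the attribute, so model its absence with a default.
  unsigned alignment() const override { return 1; }

private:
  const AtomIvarsV1 *ivars_;
};

class AtomV2 final : public Atom {
public:
  explicit AtomV2(const AtomIvarsV2 *ivars) : ivars_(ivars) {}
  std::uint32_t contentSize() const override { return ivars_->contentSize; }
  unsigned alignment() const override { return 1u << ivars_->alignmentLog2; }

private:
  const AtomIvarsV2 *ivars_;
};

// One allocation for all atom objects; the ivar array (for example an mmapped
// file) must outlive the returned atoms.
std::vector<AtomV1> instantiateV1(const AtomIvarsV1 *ivars, std::size_t count) {
  assert(ivars != nullptr || count == 0);
  std::vector<AtomV1> atoms;
  atoms.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    atoms.emplace_back(&ivars[i]);
  return atoms;
}
```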

With this model for the native file format, files can be read and turned
into the in-memory graph of lld::Atoms with just a few memory allocations.
And the format can easily adapt over time to new features.

The binary file format follows the ReaderWriter patterns used in lld. The lld
library comes with the classes: ReaderNative and WriterNative. So, switching
between file formats is as easy as switching which Reader subclass is used.


Textual representations in YAML
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In designing a textual format we want something easy for humans to read and easy
for the linker to parse. Since an atom has lots of attributes, most of which are
usually just the default, we should define default values for every attribute so
that those can be omitted from the text representation. Here are the atoms for a
simple hello world program expressed in YAML::

  target-triple: x86_64-apple-darwin11

  atoms:
      - name: _main
        scope: global
        type: code
        content: [ 55, 48, 89, e5, 48, 8d, 3d, 00, 00, 00, 00, 30, c0, e8, 00, 00,
                   00, 00, 31, c0, 5d, c3 ]
        fixups:
        - offset: 07
          kind: pcrel32
          target: 2
        - offset: 0E
          kind: call32
          target: _fprintf

      - type: c-string
        content: [ 73, 5A, 00 ]

  ...

The biggest use for the textual format will be writing test cases. Writing test
cases in C is problematic because the compiler may vary its output over time for
its own optimization reasons, which may inadvertently disable or break the
linker feature being tested. By writing test cases in the linker's own textual
format, we can exactly specify every attribute of every atom and thus target
specific linker logic.

The textual/YAML format follows the ReaderWriter patterns used in lld. The lld
library comes with the classes: ReaderYAML and WriterYAML.


Testing
-------

The lld project contains a test suite which is being built up as new code is
added to lld. All new lld functionality should have tests added to the test
suite. The test suite is `lit <http://llvm.org/cmds/lit.html/>`_ driven. Each
test is a text file with comments telling lit how to run the test and check the
result. To facilitate testing, the lld project builds a tool called lld-core.
This tool reads a YAML file (by default from stdin), parses it into one or more
lld::File objects in memory and then feeds those lld::File objects to the
resolver phase. The output of the resolver is written as a native object file.
It is then read back in using the native object file reader and then passed to
the YAML writer. This round-about path means that all three representations
(in-memory, binary, and text) are exercised, and any new feature has to work in
all the representations to pass the test.


Resolver testing
~~~~~~~~~~~~~~~~

Basic testing is the "core linking" or resolving phase. That is where the
linker merges object files. All test cases are written in YAML. One feature of
YAML is that it allows multiple "documents" to be encoded in one YAML stream.
That means one text file can appear to the linker as multiple .o files - the
normal case for the linker.

Here is a simple example of a core linking test case. It checks that an
undefined atom from one file will be replaced by a definition from another
file::

  # RUN: lld-core %s | FileCheck %s

  #
  # Test that undefined atoms are replaced with defined atoms.
  #

  ---
  atoms:
      - name: foo
        definition: undefined
  ---
  atoms:
      - name: foo
        scope: global
        type: code
  ...

  # CHECK: name: foo
  # CHECK: scope: global
  # CHECK: type: code
  # CHECK-NOT: name: foo
  # CHECK: ...


Passes testing
~~~~~~~~~~~~~~

Since Passes just operate on an lld::File object, the lld-core tool has the
option to run a particular pass (after resolving). Thus, you can write a YAML
test case with carefully crafted input to exercise areas of a Pass and then
check the resulting lld::File object as represented in YAML.


Design Issues
-------------

There are a number of open issues in the design of lld. The plan is to wait and
make these design decisions when we need to.


Debug Info
~~~~~~~~~~

Currently, the lld model says nothing about debug info. But the most popular
debug format is DWARF and there is some impedance mismatch between the lld model
and DWARF. In lld there are just Atoms, and only Atoms that need to be in a
special section at runtime have an associated section. Also, Atoms do not have
addresses. The way DWARF is spec'ed, different parts of DWARF are supposed to go
into specially named sections, and DWARF references function code by address.

CPU and OS specific functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, lld has an abstract "Platform" that deals with any CPU or OS specific
differences in linking. We just keep adding virtual methods to the base
Platform class as we find linking areas that might need customization. At some
point we'll need to structure this better.


File Attributes
~~~~~~~~~~~~~~~

Currently, lld::File just has a path and a way to iterate its atoms. We will
need to add more attributes on a File. For example, some equivalent to the
target triple. There are also a number of cached or computed attributes that
could make various Passes more efficient. For instance, on Darwin there are a
number of Objective-C optimizations that can be done by a Pass. But it would
improve the plain C case if the Objective-C optimization Pass did not have to
scan all atoms looking for any Objective-C data structures. This could be done
if the lld::File object had an attribute that said if the file had any
Objective-C data in it. The Resolving phase would then be required to "merge"
that attribute as object files are added.
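Such an attribute merge might look like the following. This is a speculative
C++ sketch of a design that is still open: ``FileAttributes``, its fields, and
``mergeAttributes`` are hypothetical, and rejecting mismatched target triples is
an assumption rather than a decided behavior.

```cpp
#include <cassert>
#include <string>

struct FileAttributes {
  std::string targetTriple;  // empty means "not yet known"
  bool hasObjC = false;      // true if any Objective-C data is present
};

// Folds the attributes of one input file into the merged result.
// Returns false if the input targets a different triple.
bool mergeAttributes(FileAttributes &merged, const FileAttributes &input) {
  assert(&merged != &input);
  if (merged.targetTriple.empty())
    merged.targetTriple = input.targetTriple;
  else if (!input.targetTriple.empty() &&
           input.targetTriple != merged.targetTriple)
    return false;
  merged.hasObjC = merged.hasObjC || input.hasObjC;
  return true;
}
```

With a merged ``hasObjC`` flag, an Objective-C Pass could return immediately
for plain C links instead of scanning every atom.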