=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series the llvm *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager on
a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us with a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: When we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the layer below (as all layers
do) plus an *IR optimization function* that it will apply to each Module that
is added via addModuleSet:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    ObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>
      OptimizeFunction;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    typedef decltype(OptimizeLayer)::ModuleSetHandleT ModuleHandle;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::unique_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization function.
In this case we use a std::function (a handy wrapper for "function-like" things)
that takes a single std::unique_ptr<Module> input and returns a
std::unique_ptr<Module> output. With our optimization function typedef in place
we can declare our OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda that calls out to an "optimizeModule" function that we will
define below.
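
Because OptimizeFunction is just a std::function, any callable with a matching
signature can be plugged in. As a minimal sketch (not part of the tutorial
code), an identity transform that returns each module untouched is a handy way
to check the layer plumbing before wiring in real optimizations:

.. code-block:: c++

  // Hypothetical: a do-nothing transform with the required signature.
  // Handing this to OptimizeLayer would make the JIT behave just like the
  // Chapter 1 version.
  OptimizeFunction IdentityTransform =
      [](std::unique_ptr<Module> M) { return M; };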

.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym;
        return JITSymbol(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.addModuleSet(std::move(Ms),
                                    make_unique<SectionMemoryManager>(),
                                    std::move(Resolver));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  OptimizeLayer.removeModuleSet(H);
  // ...

Next we need to replace references to 'CompileLayer' with references to
OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In
addModule we need to be careful to replace both references: the findSymbol call
inside our resolver, and the call through to addModuleSet.

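For reference, here is a sketch of what the three updated methods look like
once the references are swapped. It mirrors the Chapter 1 versions of these
methods (see the full listing at the end of this chapter for the exact code);
only the layer references change:

.. code-block:: c++

  ModuleHandle addModule(std::unique_ptr<Module> M) {
    // Build our symbol resolver:
    // Lambda 1: Look back into the JIT itself to find symbols that are part
    //           of the same "logical dylib".
    // Lambda 2: Search for external symbols in the host process.
    auto Resolver = createLambdaResolver(
        [&](const std::string &Name) {
          if (auto Sym = OptimizeLayer.findSymbol(Name, false))
            return Sym;
          return JITSymbol(nullptr);
        },
        [](const std::string &Name) {
          if (auto SymAddr =
                RTDyldMemoryManager::getSymbolAddressInProcess(Name))
            return JITSymbol(SymAddr, JITSymbolFlags::Exported);
          return JITSymbol(nullptr);
        });

    // Build a singleton module set to hold our module.
    std::vector<std::unique_ptr<Module>> Ms;
    Ms.push_back(std::move(M));

    // Add the set to the JIT with the resolver we created above and a newly
    // created SectionMemoryManager.
    return OptimizeLayer.addModuleSet(std::move(Ms),
                                      make_unique<SectionMemoryManager>(),
                                      std::move(Resolver));
  }

  JITSymbol findSymbol(const std::string Name) {
    std::string MangledName;
    raw_string_ostream MangledNameStream(MangledName);
    Mangler::getNameWithPrefix(MangledNameStream, Name, DL);
    return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  }

  void removeModule(ModuleHandle H) {
    OptimizeLayer.removeModuleSet(H);
  }
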
.. code-block:: c++

  std::unique_ptr<Module> optimizeModule(std::unique_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual optimization:
*optimizeModule*. This function sets up a FunctionPassManager, adds some passes
to it, runs it over every function in the module, and then returns the mutated
module. The specific optimizations are the same ones used in
`Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth
discussion of these, and of IR optimization in general.

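If you are following along in your own copy of the Chapter 1 code, note that
optimizeModule needs a few extra headers for the legacy pass manager and the
pass-creation functions. The set below is an assumption based on the full
listing at the end of this chapter; adjust it for your LLVM version if needed:

.. code-block:: c++

  // Assumed headers for legacy::FunctionPassManager and the create*Pass()
  // functions used above; exact locations can vary between LLVM releases.
  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/Transforms/InstCombine/InstCombine.h"
  #include "llvm/Transforms/Scalar.h"
  #include "llvm/Transforms/Scalar/GVN.h"
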
And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but doing so gives us another opportunity
to see how layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer turns out to be one of the simplest
implementations of the layer concept that can be devised:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    typedef typename BaseLayerT::ModuleSetHandleT ModuleSetHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    template <typename ModuleSetT, typename MemoryManagerPtrT,
              typename SymbolResolverPtrT>
    ModuleSetHandleT addModuleSet(ModuleSetT Ms,
                                  MemoryManagerPtrT MemMgr,
                                  SymbolResolverPtrT Resolver) {

      for (auto I = Ms.begin(), E = Ms.end(); I != E; ++I)
        *I = Transform(std::move(*I));

      return BaseLayer.addModuleSet(std::move(Ms), std::move(MemMgr),
                                    std::move(Resolver));
    }

    void removeModuleSet(ModuleSetHandleT H) { BaseLayer.removeModuleSet(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleSetHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    void emitAndFinalize(ModuleSetHandleT H) {
      BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments: ``BaseLayerT`` and
``TransformFtor``, which provide the type of the base layer and the type of the
"transform functor" (in our case a std::function) respectively. This class is
concerned with two very simple jobs: (1) running every IR Module that is added
with addModuleSet through the transform functor, and (2) conforming to the ORC
layer interface. The interface consists of one typedef and five methods:

+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleSetHandleT | set when calling findSymbolIn, removeModuleSet, or        |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given set of Modules and makes them "available    |
|                  | for execution". This means that symbols in those modules  |
|                  | should be searchable via findSymbol and findSymbolIn, and |
|                  | the address of the symbols should be read/writable (for   |
|                  | data symbols), or executable (for function symbols) after |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModuleSet     | addModuleSet doesn't have to compile (or do any other     |
|                  | work) up-front. It *can*, like IRCompileLayer, act        |
|                  | eagerly, but it can also simply record the module and     |
|                  | take no further action until somebody calls               |
|                  | JITSymbol::getAddress(). In IRTransformLayer's case       |
|                  | addModuleSet eagerly applies the transform functor to     |
|                  | each module in the set, then passes the resulting set     |
|                  | of mutated modules down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a set of modules from the JIT. Code or data       |
| removeModuleSet  | defined in these modules will no longer be available, and |
|                  | the memory holding the JIT'd definitions will be freed.   |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModuleSet (and not yet       |
| findSymbol       | removed by a call to removeModuleSet). In                 |
|                  | IRTransformLayer we just pass the query on to the layer   |
|                  | below. In our REPL this is our default way to search for  |
|                  | function definitions.                                     |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module set indicated |
|                  | by the given ModuleSetHandleT. This is just an optimized  |
|                  | search, better for lookup-speed when you know exactly     |
|                  | where a symbol definition should be found. In             |
| findSymbolIn     | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module set (represented by a ModuleSetHandleT)  |
|                  | accessible. Behaves as if some symbol in the set had been |
|                  | searched for and JITSymbol::getAddress() called. This     |
| emitAndFinalize  | is rarely needed, but can be useful when dealing with     |
|                  | layers that usually behave lazily if the user wants to    |
|                  | trigger early compilation (for example, to use idle CPU   |
|                  | time to eagerly compile code in the background).          |
+------------------+-----------------------------------------------------------+

This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModuleSet by running each module in the
set through its transform up-front and immediately passing the result to the
layer below. A lazy layer, by contrast, could implement addModuleSet by
squirreling away the modules, doing no other up-front work, and only applying
the transform (and calling addModuleSet on the layer below) when the client
calls findSymbol. The JIT'd program behavior will be the same either way, but
these choices will have different performance characteristics: Doing work
eagerly means the JIT takes longer up-front, but proceeds smoothly once this is
done. Deferring work allows the JIT to get up-and-running quickly, but will
force the JIT to pause and wait whenever some code or data is needed that hasn't
already been processed.

Our current REPL is eager: Each function definition is optimized and compiled as
soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (in which everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's first
executed. Things become even more interesting if we consider interprocedural
optimizations like inlining, which must be performed eagerly. These are
complex trade-offs, and there is no one-size-fits-all solution to them, but by
providing composable layers we leave the decisions to the person implementing
the JIT, and make it easy for them to experiment with different configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the ObjectLinkingLayer as
       external symbols. The ObjectLinkingLayer will call the SymbolResolver
       that we defined in addModuleSet, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.