====================================================================
Building a JIT: Adding Optimizations - An introduction to ORC Layers
====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*, IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series the LLVM *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager
on a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT, and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: when we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how
the IRTransformLayer works in more detail below, but the interface is simple:
the constructor for this layer takes a reference to the layer below (as all
layers do), plus an *IR optimization function* that it will apply to each
Module that is added via addModuleSet:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    ObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>
      OptimizeFunction;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    typedef decltype(OptimizeLayer)::ModuleSetHandleT ModuleHandle;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::unique_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization
function: in this case a std::function (a handy wrapper for "function-like"
things) from a std::unique_ptr<Module> input to a std::unique_ptr<Module>
output. With our optimization function typedef in place we can declare our
OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda. In the lambda we just call out to the "optimizeModule"
function that we will define below.

.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym.toRuntimeDyldSymbol();
        return RuntimeDyld::SymbolInfo(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  // Add the set to the JIT with the resolver we created above and a newly
  // created SectionMemoryManager.
  return OptimizeLayer.addModuleSet(std::move(Ms),
                                    make_unique<SectionMemoryManager>(),
                                    std::move(Resolver));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  OptimizeLayer.removeModuleSet(H);
  // ...

Next we need to replace references to 'CompileLayer' with references to
OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In
addModule we need to be careful to replace both references: the findSymbol
call inside our resolver, and the call through to addModuleSet.

.. code-block:: c++

  std::unique_ptr<Module> optimizeModule(std::unique_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual
optimization: *optimizeModule*. This function sets up a FunctionPassManager,
adds some passes to it, runs it over every function in the module, and then
returns the mutated module. The specific optimizations are the same ones used
in `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series; readers may visit that chapter for a more in-depth discussion
of them, and of IR optimization in general.

And that's it: when a module is added to our JIT the OptimizeLayer will now
pass it to our optimizeModule function before passing the transformed module
on to the CompileLayer below. Of course, we could have called optimizeModule
directly in our addModule function and not gone to the bother of using the
IRTransformLayer, but doing so gives us an opportunity to see how layers
compose, and how one can be implemented, because IRTransformLayer turns out
to be one of the simplest possible implementations of the *layer* concept:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    typedef typename BaseLayerT::ModuleSetHandleT ModuleSetHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    template <typename ModuleSetT, typename MemoryManagerPtrT,
              typename SymbolResolverPtrT>
    ModuleSetHandleT addModuleSet(ModuleSetT Ms,
                                  MemoryManagerPtrT MemMgr,
                                  SymbolResolverPtrT Resolver) {

      for (auto I = Ms.begin(), E = Ms.end(); I != E; ++I)
        *I = Transform(std::move(*I));

      return BaseLayer.addModuleSet(std::move(Ms), std::move(MemMgr),
                                    std::move(Resolver));
    }

    void removeModuleSet(ModuleSetHandleT H) { BaseLayer.removeModuleSet(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleSetHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    void emitAndFinalize(ModuleSetHandleT H) {
      BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments, ``BaseLayerT``
and ``TransformFtor``, that provide the type of the base layer and the type
of the "transform functor" (in our case a std::function) respectively. The
body of the class is concerned with two very simple jobs: (1) running every
IR Module that is added with addModuleSet through the transform functor, and
(2) conforming to the ORC layer interface, which is:

+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleSetHandleT | set when calling findSymbolIn, removeModuleSet, or        |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given set of Modules and makes them "available    |
|                  | for execution". This means that symbols in those modules  |
|                  | should be searchable via findSymbol and findSymbolIn, and |
|                  | the address of the symbols should be read/writable (for   |
|                  | data symbols), or executable (for function symbols) after |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModuleSet     | addModuleSet doesn't have to compile (or do any other     |
|                  | work) up-front. It *can*, like IRCompileLayer, act        |
|                  | eagerly, but it can also simply record the module and     |
|                  | take no further action until somebody calls               |
|                  | JITSymbol::getAddress(). In IRTransformLayer's case       |
|                  | addModuleSet eagerly applies the transform functor to     |
|                  | each module in the set, then passes the resulting set     |
|                  | of mutated modules down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a set of modules from the JIT. Code or data       |
| removeModuleSet  | defined in these modules will no longer be available, and |
|                  | the memory holding the JIT'd definitions will be freed.   |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModuleSet (and not yet       |
| findSymbol       | removed by a call to removeModuleSet). In                 |
|                  | IRTransformLayer we just pass the query on to the layer   |
|                  | below. In our REPL this is our default way to search for  |
|                  | function definitions.                                     |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module set indicated |
|                  | by the given ModuleSetHandleT. This is just an optimized  |
|                  | search, better for lookup-speed when you know exactly     |
|                  | where a symbol definition should be found. In             |
| findSymbolIn     | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module set (represented by a ModuleSetHandleT)  |
|                  | accessible. Behaves as if some symbol in the set had been |
|                  | searched for and JITSymbol::getAddress called. This       |
| emitAndFinalize  | is rarely needed, but can be useful when dealing with     |
|                  | layers that usually behave lazily if the user wants to    |
|                  | trigger early compilation (for example, to use idle CPU   |
|                  | time to eagerly compile code in the background).          |
+------------------+-----------------------------------------------------------+

This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModuleSet by running each module in
the set through its transform up-front and immediately passing the result to
the layer below. A lazy layer, by contrast, could implement addModuleSet by
squirreling away the modules, doing no other up-front work, and applying
the transform (and calling addModuleSet on the layer below) when the client
calls findSymbol instead. The JIT'd program behavior will be the same either
way, but these choices will have different performance characteristics: doing
work eagerly means the JIT takes longer up-front, but proceeds smoothly once
this is done. Deferring work allows the JIT to get up-and-running quickly, but
will force the JIT to pause and wait whenever some code or data is needed that
hasn't already been processed.

Our current REPL is eager: each function definition is optimized and compiled
as soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (while everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's
first executed. Things become even more interesting if we consider
interprocedural optimizations like inlining, which must be performed eagerly.
These are complex trade-offs, and there is no one-size-fits-all solution to
them, but by providing composable layers we leave the decisions to the person
implementing the JIT, and make it easy for them to experiment with different
configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the ObjectLinkingLayer as
       external symbols. The ObjectLinkingLayer will call the SymbolResolver
       that we defined in addModuleSet, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.