=======================================================
Kaleidoscope: Building an ORC-based JIT in LLVM
=======================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 1 Introduction
======================

Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
tutorial runs through the implementation of a JIT compiler using LLVM's
On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
KaleidoscopeJIT class used in the
`Implementing a language with LLVM <LangImpl1.html>`_ tutorials and then
introduces new features like optimization, lazy compilation and remote
execution.

The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
these APIs interact with other parts of LLVM, and teach you how to recombine
them to build a custom JIT that is suited to your use-case.

The structure of the tutorial is:

- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
  introduce some of the basic concepts of the ORC JIT APIs, including the
  idea of an ORC *Layer*.

- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
  a new layer that will optimize IR and generated code.

- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
  Compile-On-Demand layer to lazily compile IR.

- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
  replacing the Compile-On-Demand layer with a custom layer that uses the ORC
  Compile Callbacks API directly to defer IR-generation until functions are
  called.

- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
  a remote process with reduced privileges using the JIT Remote APIs.

To provide input for our JIT we will use the Kaleidoscope REPL from
`Chapter 7 <LangImpl7.html>`_ of the "Implementing a language with LLVM"
tutorial, with one minor modification: we will remove the FunctionPassManager
from the code for that chapter and replace it with optimization support in our
JIT class in Chapter #2.

Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
These tutorials don't assume any experience with these earlier APIs, but
readers acquainted with them will see many familiar elements. Where appropriate
we will make this connection with the earlier APIs explicit to help people who
are transitioning from them to ORC.

JIT API Basics
==============

The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
rather than compiling whole programs to disk ahead of time as a traditional
compiler does. To support that aim our initial, bare-bones JIT API will be:

1. Handle addModule(Module &M) -- Make the given IR module available for
   execution.
2. JITSymbol findSymbol(const std::string &Name) -- Search for pointers to
   symbols (functions or variables) that have been added to the JIT.
3. void removeModule(Handle H) -- Remove a module from the JIT, releasing any
   memory that had been used for the compiled code.

A basic use-case for this API, executing the 'main' function from a module,
will look like:

.. code-block:: c++

  std::unique_ptr<Module> M = buildModule();
  JIT J;
  Handle H = J.addModule(*M);
  // Look up 'main' and cast its address to a callable function pointer.
  int (*Main)(int, char*[]) =
    (int(*)(int, char*[]))J.findSymbol("main").getAddress();
  // No command-line arguments are passed in this example.
  int Result = Main(0, nullptr);
  J.removeModule(H);

The APIs that we build in these tutorials will all be variations on this simple
theme. Behind the API we will refine the implementation of the JIT to add
support for optimization and lazy compilation. Eventually we will extend the
API itself to allow higher-level program representations (e.g. ASTs) to be
added to the JIT.
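
To make the add / find / remove life cycle concrete without pulling in LLVM,
here is a minimal sketch of a toy class with the same three operations. The
names ToyJIT, ToyModule, ToyFn and runToyJIT are hypothetical and exist only
for this illustration. A "module" here is just a table of already-compiled
function pointers, which is an assumption made to keep the example
self-contained; a real JIT would compile IR instead.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for an IR module: a table of named functions that
// are already compiled. A real JIT would compile IR at this point.
using ToyFn = int (*)(int);
struct ToyModule {
  std::map<std::string, ToyFn> Functions;
};

// Hypothetical toy JIT exposing the same three operations as the API above.
class ToyJIT {
public:
  using Handle = std::uint64_t;

  // Make the module's functions available for lookup.
  Handle addModule(const ToyModule &M) {
    Handle H = NextHandle++;
    Modules[H] = M;
    return H;
  }

  // Search every live module; return nullptr if the name is unknown.
  ToyFn findSymbol(const std::string &Name) const {
    for (const auto &KV : Modules) {
      auto I = KV.second.Functions.find(Name);
      if (I != KV.second.Functions.end())
        return I->second;
    }
    return nullptr;
  }

  // Drop the module; its symbols can no longer be found afterwards.
  void removeModule(Handle H) {
    assert(Modules.count(H) && "unknown module handle");
    Modules.erase(H);
  }

private:
  Handle NextHandle = 1;
  std::map<Handle, ToyModule> Modules;
};

inline int addOne(int X) { return X + 1; }

// Walk through the life cycle: add, find, call, remove.
inline int runToyJIT() {
  ToyModule M;
  M.Functions["addOne"] = addOne;
  ToyJIT J;
  ToyJIT::Handle H = J.addModule(M);
  int Result = J.findSymbol("addOne")(41);
  J.removeModule(H);
  // After removal the symbol should be gone; signal a problem with -1.
  return J.findSymbol("addOne") == nullptr ? Result : -1;
}
```

The handle returned by addModule is the only thing the client needs to keep
in order to release the module later, which is the same role the Handle type
plays in the API listed above.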

KaleidoscopeJIT
===============

In the previous section we described our API. Now we examine a simple
implementation of it: the KaleidoscopeJIT class [1]_ that was used in the
`Implementing a language with LLVM <LangImpl1.html>`_ tutorials. We will use
the REPL code from `Chapter 7 <LangImpl7.html>`_ of that tutorial to supply the
input for our JIT: each time the user enters an expression the REPL will add a
new IR module containing the code for that expression to the JIT. If the
expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
use the findSymbol method of our JIT class to find and execute the code for the
expression, and then use the removeModule method to remove the code again
(since there's no way to re-invoke an anonymous expression). In later chapters
of this tutorial we'll modify the REPL to enable new interactions with our JIT
class, but for now we will take this setup for granted and focus our attention on
the implementation of our JIT itself.

Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
usual include guards and #includes [2]_, we get to the definition of our class:

.. code-block:: c++

  #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
  #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H

  #include "llvm/ExecutionEngine/ExecutionEngine.h"
  #include "llvm/ExecutionEngine/RTDyldMemoryManager.h"
  #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
  #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
  #include "llvm/ExecutionEngine/Orc/LambdaResolver.h"
  #include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h"
  #include "llvm/IR/Mangler.h"
  #include "llvm/Support/DynamicLibrary.h"

  namespace llvm {
  namespace orc {

  class KaleidoscopeJIT {
  private:

    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    ObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

  public:

    typedef decltype(CompileLayer)::ModuleSetHandleT ModuleHandle;

Our class begins with four members: a TargetMachine, TM, which will be used
to build our LLVM compiler instance; a DataLayout, DL, which will be used for
symbol mangling (more on that later); and two ORC *layers*: an
ObjectLinkingLayer and an IRCompileLayer. The ObjectLinkingLayer is the
foundation of our JIT: it takes in-memory object files produced by a
compiler and links them on the fly to make them executable. This
JIT-on-top-of-a-linker design was introduced in MCJIT, where the linker was
hidden inside the MCJIT class itself. In ORC we expose the linker as a visible,
reusable component so that clients can access and configure it directly
if they need to. In this tutorial our ObjectLinkingLayer will just be used to
support the next layer in our stack: the IRCompileLayer, which will be
responsible for taking LLVM IR, compiling it, and passing the resulting
in-memory object files down to the object linking layer below.
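
The stacking idea can be sketched in a few lines of plain C++. This is only
an illustration of the pattern, not the ORC implementation: ToyLinkingLayer,
ToyCompileLayer and runToyLayers are hypothetical names, and "IR" and "object
files" are modelled as strings so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bottom layer: accepts "object files" (plain strings here) and
// records them. It plays the role of the object linking layer.
class ToyLinkingLayer {
public:
  using HandleT = std::size_t;

  HandleT addObject(std::string Obj) {
    Linked.push_back(std::move(Obj));
    return Linked.size() - 1;
  }

  const std::string &getObject(HandleT H) const {
    assert(H < Linked.size() && "unknown object handle");
    return Linked[H];
  }

private:
  std::vector<std::string> Linked;
};

// Hypothetical upper layer: "compiles" IR with a user-supplied functor and
// passes the result down to whatever layer it was stacked on.
template <typename BaseLayerT> class ToyCompileLayer {
public:
  using HandleT = typename BaseLayerT::HandleT;
  using CompileFtor = std::function<std::string(const std::string &)>;

  ToyCompileLayer(BaseLayerT &Base, CompileFtor C)
      : BaseLayer(Base), Compile(std::move(C)) {}

  HandleT addModule(const std::string &IR) {
    return BaseLayer.addObject(Compile(IR));
  }

private:
  BaseLayerT &BaseLayer; // the layer below us in the stack
  CompileFtor Compile;   // how to turn IR into an object
};

// Build a two-layer stack and push one "module" through it.
inline std::string runToyLayers() {
  ToyLinkingLayer ObjectLayer;
  ToyCompileLayer<decltype(ObjectLayer)> CompileLayer(
      ObjectLayer, [](const std::string &IR) { return "obj(" + IR + ")"; });
  auto H = CompileLayer.addModule("define i32 @f()");
  return ObjectLayer.getObject(H);
}
```

Because the upper layer only holds a reference to the layer below and is
templated on its type, either layer can be swapped out independently, which
is the property later chapters rely on when they add new layers.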

After our member variables comes a typedef: ModuleHandle. This is the handle
type that will be returned from our JIT's addModule method, and which can be
used to remove a module again using the removeModule method. The IRCompileLayer
class already provides a convenient handle type
(IRCompileLayer::ModuleSetHandleT), so we will just provide a type-alias for
this.

.. code-block:: c++

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

    TargetMachine &getTargetMachine() { return *TM; }

Next up we have our class constructor. We begin by initializing TM using the
EngineBuilder::selectTarget helper method, which constructs a TargetMachine for
the current process. Next we use our newly created TargetMachine to initialize
DL, our DataLayout. Then we initialize our IRCompileLayer. Our IRCompileLayer
needs two things: (1) a reference to our object linking layer, and (2) a
compiler instance to use to perform the actual compilation from IR to object
files. We use the off-the-shelf SimpleCompiler instance for now, but in later
chapters we will substitute our own configurable compiler classes. Finally, in
the body of the constructor, we call the DynamicLibrary::LoadLibraryPermanently
method with a nullptr argument. Normally the LoadLibraryPermanently method is
called with the path of a dynamic library to load, but when passed a null
pointer it will 'load' the host process itself, making its exported symbols
available for execution.
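
One detail worth spelling out is that the constructor relies on C++ member
initialization order: members are initialized in the order they are declared
in the class, so DL can safely read from TM only because TM is declared
first. The sketch below shows the same dependency with hypothetical stand-in
types (ToyTargetMachine, ToyJITMembers and runToyInitOrder are not LLVM
names, and the data layout is modelled as a string).

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for TargetMachine: it can describe a data layout.
struct ToyTargetMachine {
  std::string Triple;
  std::string createDataLayout() const { return "layout-for-" + Triple; }
};

class ToyJITMembers {
  // Declaration order matters: TM is declared before DL, so it is
  // initialized before DL regardless of the order in the initializer list.
  std::unique_ptr<ToyTargetMachine> TM;
  const std::string DL;

public:
  ToyJITMembers()
      : TM(new ToyTargetMachine{"host"}), DL(TM->createDataLayout()) {
    assert(TM && "target machine must exist before DL is built");
  }

  const std::string &getDataLayout() const { return DL; }
};

// Construct the object and report the layout derived from TM.
inline std::string runToyInitOrder() {
  return ToyJITMembers().getDataLayout();
}
```

Swapping the two member declarations would make DL read from an
uninitialized pointer, so the declaration order in the class is part of the
design rather than a matter of style.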

.. code-block:: c++

    ModuleHandle addModule(std::unique_ptr<Module> M) {
      // We need a memory manager to allocate memory and a resolver to resolve
      // symbols for this new module. Create a resolver that looks back into
      // the JIT.
      auto Resolver = createLambdaResolver(
          [&](const std::string &Name) {
            if (auto Sym = CompileLayer.findSymbol(Name, false))
              return RuntimeDyld::SymbolInfo(Sym.getAddress(), Sym.getFlags());
            return RuntimeDyld::SymbolInfo(nullptr);
          },
          [](const std::string &S) { return nullptr; });
      // Wrap the module in a single-element set and hand ownership of that
      // set to the compile layer.
      std::vector<std::unique_ptr<Module>> Ms;
      Ms.push_back(std::move(M));
      return CompileLayer.addModuleSet(std::move(Ms),
                                       make_unique<SectionMemoryManager>(),
                                       std::move(Resolver));
    }

*To be done: describe addModule -- createLambdaResolver, resolvers, memory
managers, why 'module set' rather than a single module...*
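
As a rough illustration of the idea of building a resolver out of two
lambdas, here is a self-contained sketch. It is not the ORC
createLambdaResolver: ToyLambdaResolver, createToyLambdaResolver and
runToyResolver are hypothetical names, addresses are plain integers with 0
meaning "not found", and the sketch simply chains the two lookups, which is a
simplification made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resolver built from two functors. The first looks inside the
// JIT, the second is a fallback for symbols that live elsewhere.
template <typename InternalFtorT, typename ExternalFtorT>
class ToyLambdaResolver {
public:
  ToyLambdaResolver(InternalFtorT Internal, ExternalFtorT External)
      : InternalLookup(std::move(Internal)),
        ExternalLookup(std::move(External)) {}

  // Returns the symbol's address, or 0 if neither lookup knows the name.
  std::uint64_t resolve(const std::string &Name) {
    assert(!Name.empty() && "cannot resolve an empty name");
    if (std::uint64_t Addr = InternalLookup(Name))
      return Addr;
    return ExternalLookup(Name);
  }

private:
  InternalFtorT InternalLookup;
  ExternalFtorT ExternalLookup;
};

// Helper that deduces the lambda types, so callers can just write 'auto'.
template <typename InternalFtorT, typename ExternalFtorT>
std::unique_ptr<ToyLambdaResolver<InternalFtorT, ExternalFtorT>>
createToyLambdaResolver(InternalFtorT Internal, ExternalFtorT External) {
  using ResolverT = ToyLambdaResolver<InternalFtorT, ExternalFtorT>;
  return std::unique_ptr<ResolverT>(
      new ResolverT(std::move(Internal), std::move(External)));
}

// Resolve a name against a toy symbol table that stands in for the JIT.
inline std::uint64_t runToyResolver(const std::string &Name) {
  std::map<std::string, std::uint64_t> JITSymbols = {{"foo", 0x1000}};
  auto Resolver = createToyLambdaResolver(
      // Look back into the (toy) JIT.
      [&](const std::string &N) -> std::uint64_t {
        auto I = JITSymbols.find(N);
        return I == JITSymbols.end() ? 0 : I->second;
      },
      // No external symbols are available in this sketch.
      [](const std::string &) -> std::uint64_t { return 0; });
  return Resolver->resolve(Name);
}
```

The factory function exists only because lambda types cannot be named, so
the template arguments have to be deduced from the call.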

.. code-block:: c++

    JITSymbol findSymbol(const std::string Name) {
      std::string MangledName;
      raw_string_ostream MangledNameStream(MangledName);
      Mangler::getNameWithPrefix(MangledNameStream, Name, DL);
      return CompileLayer.findSymbol(MangledNameStream.str(), true);
    }

    void removeModule(ModuleHandle H) {
      CompileLayer.removeModuleSet(H);
    }

*To be done: describe findSymbol and removeModule -- why do we mangle? what's
the relationship between findSymbol and resolvers, why remove modules...*
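
On the mangling question, the short version is that the name written in the
source is not always the name stored in the object file. On Mach-O platforms
global symbols carry a leading underscore, while ELF platforms typically add
no prefix, so a lookup has to use the platform's spelling. The sketch below
shows only that prefix step. toyMangle is a hypothetical helper, and the
prefix is passed in by hand here rather than taken from a DataLayout; the
real Mangler handles more cases than this.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prepend the platform's global prefix, if it has one.
// A prefix of '\0' means the platform adds nothing to the name.
inline std::string toyMangle(const std::string &Name, char GlobalPrefix) {
  assert(!Name.empty() && "cannot mangle an empty name");
  if (GlobalPrefix == '\0')
    return Name;
  return std::string(1, GlobalPrefix) + Name;
}
```

With this helper, toyMangle("main", '_') gives the Mach-O style name and
toyMangle("main", '\0') leaves the name unchanged, as on ELF.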

*To be done: Conclusion, exercises (maybe a utility for a standalone IR JIT,
like a mini-LLI), feed to next chapter.*

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
mutable variables and var/in support. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
   :language: c++

`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_

.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
       simplifying assumption: symbols cannot be re-defined. This will make it
       impossible to re-define symbols in the REPL, but will make our symbol
       lookup logic simpler. Re-introducing support for symbol redefinition is
       left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
       original tutorials will be a helpful reference).

.. [2] +-----------------------+-----------------------------------------------+
       | File                  | Reason for inclusion                          |
       +=======================+===============================================+
       | ExecutionEngine.h     | Access to the EngineBuilder::selectTarget     |
       |                       | method.                                       |
       +-----------------------+-----------------------------------------------+
       | RTDyldMemoryManager.h | Access to the                                 |
       |                       | RTDyldMemoryManager::getSymbolAddressInProcess|
       |                       | method.                                       |
       +-----------------------+-----------------------------------------------+
       | CompileUtils.h        | Provides the SimpleCompiler class.            |
       +-----------------------+-----------------------------------------------+
       | IRCompileLayer.h      | Provides the IRCompileLayer class.            |
       +-----------------------+-----------------------------------------------+
       | LambdaResolver.h      | Access the createLambdaResolver function,     |
       |                       | which provides easy construction of symbol    |
       |                       | resolvers.                                    |
       +-----------------------+-----------------------------------------------+
       | ObjectLinkingLayer.h  | Provides the ObjectLinkingLayer class.        |
       +-----------------------+-----------------------------------------------+
       | Mangler.h             | Provides the Mangler class for platform       |
       |                       | specific name-mangling.                       |
       +-----------------------+-----------------------------------------------+
       | DynamicLibrary.h      | Provides the DynamicLibrary class, which      |
       |                       | makes symbols in the host process searchable. |
       +-----------------------+-----------------------------------------------+
281 +-----------------------+-----------------------------------------------+