=======================================================
Building a JIT: Starting out with KaleidoscopeJIT
=======================================================

.. contents::
   :local:

Chapter 1 Introduction
======================

**Warning: This tutorial is currently being updated to account for ORC API
changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not been
updated.**

Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
tutorial runs through the implementation of a JIT compiler using LLVM's
On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
KaleidoscopeJIT class used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
introduces new features like concurrent compilation, optimization, lazy
compilation and remote execution.

The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, to show
how these APIs interact with other parts of LLVM, and to teach you how to
recombine them to build a custom JIT that is suited to your use-case.

The structure of the tutorial is:

- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
  introduce some of the basic concepts of the ORC JIT APIs, including the
  idea of an ORC *Layer*.

- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
  a new layer that will optimize IR and generated code.

- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
  Compile-On-Demand layer to lazily compile IR.

- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
  replacing the Compile-On-Demand layer with a custom layer that uses the ORC
  Compile Callbacks API directly to defer IR-generation until functions are
  called.

- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
  a remote process with reduced privileges using the JIT Remote APIs.

To provide input for our JIT we will use a lightly modified version of the
Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
language with LLVM" tutorial.

Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
These tutorials don't assume any experience with these earlier APIs, but
readers acquainted with them will see many familiar elements. Where appropriate
we will make this connection with the earlier APIs explicit to help people who
are transitioning from them to ORC.

JIT API Basics
==============

The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
rather than compiling whole programs to disk ahead of time as a traditional
compiler does. To support that aim, our initial, bare-bones JIT API will have
just two functions:

1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
   available for execution.
2. ``Expected<JITEvaluatedSymbol> lookup(StringRef Name)``: Look up the
   address of a symbol (function or variable) that has been added to the JIT.

A basic use-case for this API, executing the 'main' function from a module,
will look like:

.. code-block:: c++

  JIT J;
  // Make the module's definitions available; failure is reported as an Error.
  cantFail(J.addModule(buildModule()));
  // Look up "main", cast its address to main's signature, and call it.
  auto *Main = (int(*)(int, char*[]))cantFail(J.lookup("main")).getAddress();
  int Result = Main(0, nullptr);

The APIs that we build in these tutorials will all be variations on this simple
theme. Behind this API we will refine the implementation of the JIT to add
support for concurrent compilation, optimization and lazy compilation.
Eventually we will extend the API itself to allow higher-level program
representations (e.g. ASTs) to be added to the JIT.

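The use-case above depends on a detail that is easy to get wrong. ``lookup``
hands back a raw address, so the caller must cast it to the function's exact
type and then pass every argument that type declares. The sketch below models
only that step, in plain C++ with no LLVM dependency. ``ToySymbolTable``,
``toyMain`` and ``runMain`` are hypothetical names made up for this
illustration. The table simply records the addresses of already-compiled
functions, where a real JIT would produce them by compiling IR. Converting
function pointers to and from integers is implementation-defined in C++, but in
practice it works on common desktop platforms.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <optional>
  #include <string>

  // Hypothetical stand-in for a JIT's symbol table: maps symbol names to raw
  // addresses, the way lookup() hands back an integer address.
  class ToySymbolTable {
  public:
    // Record the address of an already-compiled function under Name.
    template <typename FnT> void define(const std::string &Name, FnT *Fn) {
      assert(!Table.count(Name) && "symbols cannot be re-defined");
      Table[Name] = reinterpret_cast<std::uintptr_t>(Fn);
    }

    // Return the address recorded for Name, or std::nullopt if there is none.
    std::optional<std::uintptr_t> lookup(const std::string &Name) const {
      auto It = Table.find(Name);
      if (It == Table.end())
        return std::nullopt;
      return It->second;
    }

  private:
    std::map<std::string, std::uintptr_t> Table;
  };

  // Hypothetical "JIT'd" function with main's signature.
  int toyMain(int Argc, char *Argv[]) {
    return (Argc == 0 && Argv == nullptr) ? 42 : 1;
  }

  // Look up "main", cast the raw address back to main's exact type, and call
  // it with the arguments that type requires. Returns -1 if "main" is missing.
  int runMain(const ToySymbolTable &Symbols) {
    std::optional<std::uintptr_t> Addr = Symbols.lookup("main");
    if (!Addr)
      return -1;
    auto *Main = reinterpret_cast<int (*)(int, char *[])>(*Addr);
    return Main(0, nullptr);
  }

Because the cast fixes the prototype, a call such as ``Main()`` with no
arguments would not compile. The arguments must match the type the address was
cast to.
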
KaleidoscopeJIT
===============

In the previous section we described our API; now we examine a simple
implementation of it: the KaleidoscopeJIT class [1]_ that was used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
input for our JIT. Each time the user enters an expression, the REPL will add a
new IR module containing the code for that expression to the JIT. If the
expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
use the lookup method of our JIT class to find and execute the code for the
expression. In later chapters of this tutorial we will modify the REPL to enable
new interactions with our JIT class, but for now we will take this setup for
granted and focus our attention on the implementation of our JIT itself.

Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
usual include guards and #includes [2]_, we get to the definition of our class:

.. code-block:: c++

  #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
  #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H

  #include "llvm/ADT/StringRef.h"
  #include "llvm/ExecutionEngine/JITSymbol.h"
  #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
  #include "llvm/ExecutionEngine/Orc/Core.h"
  #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
  #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
  #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
  #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
  #include "llvm/ExecutionEngine/SectionMemoryManager.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/LLVMContext.h"
  #include <memory>

  namespace llvm {
  namespace orc {

  class KaleidoscopeJIT {
  private:
    ExecutionSession ES;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer CompileLayer;

    DataLayout DL;
    MangleAndInterner Mangle;
    ThreadSafeContext Ctx;

  public:
    KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
        : ObjectLayer(ES,
                      []() { return std::make_unique<SectionMemoryManager>(); }),
          CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
          DL(std::move(DL)), Mangle(ES, this->DL),
          Ctx(std::make_unique<LLVMContext>()) {
      // Use the DL member: the DL parameter was passed to std::move above.
      ES.getMainJITDylib().setGenerator(
          cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(this->DL)));
    }

Our class begins with six member variables: an ExecutionSession member, ``ES``,
which provides context for our running JIT'd code (including the string pool,
global mutex, and error reporting facilities); an RTDyldObjectLinkingLayer,
``ObjectLayer``, that can be used to add object files to our JIT (though we will
not use it directly); an IRCompileLayer, ``CompileLayer``, that can be used to
add LLVM Modules to our JIT (and which builds on the ObjectLayer); a DataLayout
and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol
mangling (more on that later); and finally a ThreadSafeContext, ``Ctx``, holding
the LLVMContext that clients will use when building IR modules for the JIT.

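The relationship between ``CompileLayer`` and ``ObjectLayer`` is a stack. The
upper layer accepts IR, turns it into an object file, and hands that object to
the layer beneath it. The following std-only C++ sketch models just that
hand-off. ``ToyIR``, ``ToyObject``, ``ToyObjectLayer`` and ``ToyCompileLayer``
are hypothetical types, not ORC classes, and "IR" and "object code" are plain
strings. To keep it short, this toy compiles eagerly inside ``add``, whereas the
real IRCompileLayer defers compilation until one of the module's symbols is
looked up.

.. code-block:: c++

  #include <cassert>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical toy types: "IR" and "object code" are just strings here.
  struct ToyIR { std::string Source; };
  struct ToyObject { std::string Bytes; };

  // Bottom layer: accepts object files and "links" them (cf. the role of
  // RTDyldObjectLinkingLayer).
  class ToyObjectLayer {
  public:
    void add(ToyObject Obj) { Linked.push_back(std::move(Obj.Bytes)); }
    const std::vector<std::string> &linked() const { return Linked; }

  private:
    std::vector<std::string> Linked;
  };

  // Upper layer: accepts IR, compiles it, and forwards the resulting object to
  // the layer below (cf. the role of IRCompileLayer). Unlike the real layer,
  // this toy compiles immediately instead of waiting for a lookup.
  class ToyCompileLayer {
  public:
    explicit ToyCompileLayer(ToyObjectLayer &BaseLayer) : BaseLayer(BaseLayer) {}

    void add(const ToyIR &IR) {
      ToyObject Obj = compile(IR);
      assert(!Obj.Bytes.empty() && "compiler produced no output");
      BaseLayer.add(std::move(Obj));
    }

  private:
    // Stand-in for the IR-to-object compiler (cf. ConcurrentIRCompiler).
    static ToyObject compile(const ToyIR &IR) {
      return {"obj(" + IR.Source + ")"};
    }

    ToyObjectLayer &BaseLayer;
  };

Because each layer only talks to the interface of the layer below it, another
layer (for example one that transforms IR before it is compiled) can be slotted
in between without changing either neighbor.
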
Next up we have our class constructor, which takes a ``JITTargetMachineBuilder``
that will be used by our IR compiler, and a ``DataLayout`` that we will use to
initialize our DL member. The constructor begins by initializing our
ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and
a function object that will build a JIT memory manager for each module that is
added (a JIT memory manager manages memory allocations, memory permissions, and
registration of exception handlers for JIT'd code). For this we use a lambda
that returns a SectionMemoryManager, an off-the-shelf utility that provides all
the basic memory management functionality required for this chapter.

Next we initialize our CompileLayer. The CompileLayer needs three things: (1) a
reference to the ExecutionSession, (2) a reference to our object layer, and (3)
a compiler instance to use to perform the actual compilation from IR to object
files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
which we construct using this constructor's JITTargetMachineBuilder argument.
The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
LLVM TargetMachines as needed for compiles; TargetMachines are not thread safe,
so they cannot simply be shared between concurrent compilations.

After this, we initialize our supporting members: ``DL``, ``Mangle`` and ``Ctx``
with the input DataLayout, the ExecutionSession and DL member, and a new
default-constructed LLVMContext respectively. Now that our members have been
initialized, the one thing that remains to do is to tweak the configuration of
the *JITDylib* that we will store our code in. We want this dylib to contain not
only the symbols that we add to it, but also the symbols from our REPL process.
We do this by attaching a ``DynamicLibrarySearchGenerator`` instance, created
by the ``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.

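Conceptually, a lookup in the JITDylib first consults the definitions we added
and only falls back to the generator when a name is missing. This is how REPL
expressions such as ``sin(x)`` can call functions that already live in the host
process. The sketch below models that fallback in plain C++. ``ToyDylib`` and
its ``Generator`` callback are hypothetical stand-ins for JITDylib and
DynamicLibrarySearchGenerator, and the addresses are plain integers rather than
the results of searching real libraries.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <functional>
  #include <map>
  #include <optional>
  #include <string>
  #include <utility>

  // Hypothetical toy JITDylib: a table of symbols added directly, plus an
  // optional generator that is consulted only when a lookup misses (the role
  // DynamicLibrarySearchGenerator plays for the real JITDylib).
  class ToyDylib {
  public:
    using Generator =
        std::function<std::optional<std::uint64_t>(const std::string &)>;

    void define(const std::string &Name, std::uint64_t Addr) {
      assert(!Symbols.count(Name) && "symbols cannot be re-defined");
      Symbols[Name] = Addr;
    }

    void setGenerator(Generator G) { Gen = std::move(G); }

    std::optional<std::uint64_t> lookup(const std::string &Name) {
      // 1. Definitions added to the dylib always win.
      auto It = Symbols.find(Name);
      if (It != Symbols.end())
        return It->second;
      // 2. On a miss, ask the generator (if any) to supply a definition.
      if (!Gen)
        return std::nullopt;
      std::optional<std::uint64_t> Found = Gen(Name);
      // 3. Record what the generator found so later lookups hit step 1.
      if (Found)
        Symbols[Name] = *Found;
      return Found;
    }

  private:
    std::map<std::string, std::uint64_t> Symbols;
    Generator Gen;
  };

Recording generated symbols is a design choice of this sketch. It means each
host symbol is searched for only once, and it resembles the way the real
generator defines the symbols it finds in the JITDylib.
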
.. code-block:: c++

    static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
      auto JTMB = JITTargetMachineBuilder::detectHost();

      if (!JTMB)
        return JTMB.takeError();

      auto DL = JTMB->getDefaultDataLayoutForTarget();
      if (!DL)
        return DL.takeError();

      return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
    }

    const DataLayout &getDataLayout() const { return DL; }

    LLVMContext &getContext() { return *Ctx.getContext(); }

Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
instance that is configured to generate code for our host process. It does this
by first generating a JITTargetMachineBuilder instance using that class's
``detectHost`` method and then using that instance to generate a DataLayout for
the target process. Each of these operations can fail, so each returns its
result wrapped in an Expected value [3]_ that we must check for an error before
continuing. If both operations succeed we can unwrap their results (using the
dereference operator) and pass them into KaleidoscopeJIT's constructor on the
last line of the function.

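The shape of ``Create`` can be sketched without LLVM: run each fallible step,
return the first error, and only build the object once every input is
available. Here ``ToyExpected`` is a hypothetical stand-in for
``llvm::Expected`` built on ``std::variant``. Unlike ``llvm::Expected``, it
does not force callers to check it. ``toyDetectHost``, ``toyDefaultLayoutFor``,
``ToyTargetBuilder``, ``ToyLayout`` and ``ToyJIT`` are made-up names. The
failure conditions (an empty triple, or the triple ``"unknown"``) are
arbitrary, chosen only so that both error paths can be exercised.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>
  #include <utility>
  #include <variant>

  // Hypothetical stand-in for llvm::Expected<T>: a value or an error message.
  template <typename T> using ToyExpected = std::variant<T, std::string>;

  // Hypothetical stand-ins for JITTargetMachineBuilder and DataLayout.
  struct ToyTargetBuilder { std::string Triple; };
  struct ToyLayout { std::string Desc; };

  // Fallible step 1: "detect" the host (an empty triple simulates failure).
  ToyExpected<ToyTargetBuilder> toyDetectHost(const std::string &Triple) {
    if (Triple.empty())
      return std::string("unable to detect host target");
    return ToyTargetBuilder{Triple};
  }

  // Fallible step 2: derive a layout (the triple "unknown" simulates failure).
  ToyExpected<ToyLayout> toyDefaultLayoutFor(const ToyTargetBuilder &TB) {
    if (TB.Triple == "unknown")
      return std::string("no data layout for target");
    return ToyLayout{"layout-for-" + TB.Triple};
  }

  class ToyJIT {
  public:
    ToyJIT(ToyTargetBuilder TB, ToyLayout DL)
        : TB(std::move(TB)), DL(std::move(DL)) {}

    // Named constructor: check each step and return the first error instead
    // of constructing a half-configured JIT.
    static ToyExpected<std::unique_ptr<ToyJIT>> Create(const std::string &Triple) {
      auto TB = toyDetectHost(Triple);
      if (auto *Err = std::get_if<std::string>(&TB))
        return *Err;
      auto DL = toyDefaultLayoutFor(std::get<ToyTargetBuilder>(TB));
      if (auto *Err = std::get_if<std::string>(&DL))
        return *Err;
      // Both steps succeeded: unwrap the values and build the JIT.
      return std::make_unique<ToyJIT>(std::get<ToyTargetBuilder>(std::move(TB)),
                                      std::get<ToyLayout>(std::move(DL)));
    }

    const std::string &triple() const { return TB.Triple; }
    const ToyLayout &getDataLayout() const { return DL; }

  private:
    ToyTargetBuilder TB;
    ToyLayout DL;
  };

As in the real ``Create``, the constructor itself cannot fail. All fallible work
happens before it runs, so a caller receives either a fully built JIT or an
error.
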
Following the named constructor we have the ``getDataLayout()`` and
``getContext()`` methods. These are used to make data structures created and
managed by the JIT (especially the LLVMContext) available to the REPL code that
will build our IR modules.

.. code-block:: c++

    void addModule(std::unique_ptr<Module> M) {
      cantFail(CompileLayer.add(ES.getMainJITDylib(),
                                ThreadSafeModule(std::move(M), Ctx)));
    }

    Expected<JITEvaluatedSymbol> lookup(StringRef Name) {
      return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
    }

Now we come to the first of our JIT API methods: addModule. This method is
responsible for adding IR to the JIT and making it available for execution. In
this initial implementation of our JIT we will make our modules "available for
execution" by adding them to the CompileLayer, which will in turn store the
Module in the main JITDylib. This process will create new symbol table entries
in the JITDylib for each definition in the module, and will defer compilation of
the module until any of its definitions is looked up. Note that this is not lazy
compilation: just looking up a definition, even if it is never called, is
enough to trigger compilation of the entire module. In later chapters we will
teach our JIT to defer compilation of functions until they're actually called.
To add our Module we must first wrap it in a ThreadSafeModule instance, which
manages the lifetime of the Module's LLVMContext (our Ctx member) in a
thread-friendly way. In our example, all modules will share the Ctx member,
which will exist for the duration of the JIT. Once we switch to concurrent
compilation in later chapters we will use a new context per module.

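The difference between "deferred" and "lazy" in the paragraph above is easiest
to see in a model. In this std-only sketch (hypothetical types ``ToyModule`` and
``ToyDeferredDylib``, not ORC classes), adding a module only creates symbol
table entries. The first lookup of *any* of its definitions compiles the whole
module, and later lookups of its other definitions reuse that result. A compile
counter stands in for real code generation.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical toy module: the names it defines and whether it was compiled.
  struct ToyModule {
    std::vector<std::string> Definitions;
    bool Compiled = false;
  };

  // Hypothetical toy JITDylib. Adding a module only creates symbol table
  // entries; the first lookup of any of its definitions compiles the whole
  // module, and lookups of its other definitions reuse that result.
  class ToyDeferredDylib {
  public:
    void addModule(std::vector<std::string> Defs) {
      auto M = std::make_shared<ToyModule>();
      M->Definitions = std::move(Defs);
      for (const std::string &Name : M->Definitions) {
        assert(!Owner.count(Name) && "symbols cannot be re-defined");
        Owner[Name] = M;
      }
    }

    // Returns true if Name is defined, compiling its module first if needed.
    bool lookup(const std::string &Name) {
      auto It = Owner.find(Name);
      if (It == Owner.end())
        return false;
      ToyModule &M = *It->second;
      if (!M.Compiled) {
        M.Compiled = true; // Stand-in for compiling the module's IR.
        ++CompileCount;
      }
      return true;
    }

    int compileCount() const { return CompileCount; }

  private:
    std::map<std::string, std::shared_ptr<ToyModule>> Owner;
    int CompileCount = 0;
  };

For example, adding a module that defines ``foo`` and ``bar`` leaves the counter
at zero. Looking up ``foo`` raises it to one, and a subsequent lookup of ``bar``
leaves it at one, because ``bar`` was compiled along with ``foo`` even if it is
never called.
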
Our last method is ``lookup``, which allows us to look up addresses for
function and variable definitions added to the JIT based on their symbol names.
As noted above, lookup will implicitly trigger compilation for any symbol
that has not already been compiled. Our lookup method calls through to
``ExecutionSession::lookup``, passing in a list of dylibs to search (in our case
just the main dylib), and the symbol name to search for, with a twist: we have
to *mangle* the name of the symbol we're searching for first. The ORC JIT
components use mangled symbols internally the same way a static compiler and
linker would, rather than using plain IR symbol names. This allows JIT'd code
to interoperate easily with precompiled code in the application or shared
libraries. The kind of mangling will depend on the DataLayout, which in turn
depends on the target platform. For example, on Mach-O platforms such as macOS
the IR function ``main`` appears in the symbol table as ``_main``, while on ELF
platforms such as Linux it is simply ``main``. To allow us to remain portable
and search based on the un-mangled name, we just reproduce this mangling
ourselves using our ``Mangle`` member function object.

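To make the mangling step concrete, the sketch below computes the global-symbol
prefix from the ``m:`` (mangling mode) component of a DataLayout string and
prepends it to an IR name. It is a deliberate simplification. It handles only
the global prefix: ``_`` for Mach-O (``m:o``) and 32-bit Windows COFF
(``m:x``), and none for other modes. The real ``MangleAndInterner`` relies on
LLVM's Mangler, which also applies other decorations, and it interns the result
in the ExecutionSession's string pool. ``globalPrefixFor`` and ``mangle`` are
hypothetical helper names.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Simplified sketch: return the global symbol prefix implied by the "m:"
  // (mangling mode) component of a DataLayout string. Only the global prefix
  // is modeled: m:o (Mach-O) and m:x (32-bit Windows COFF) use '_', and every
  // other mode, or a missing "m:" component, is treated as having no prefix.
  char globalPrefixFor(const std::string &DataLayoutStr) {
    std::string::size_type Pos = 0;
    while (Pos <= DataLayoutStr.size()) {
      // Components are separated by '-'; extract the one starting at Pos.
      std::string::size_type End = DataLayoutStr.find('-', Pos);
      if (End == std::string::npos)
        End = DataLayoutStr.size();
      std::string Spec = DataLayoutStr.substr(Pos, End - Pos);
      if (Spec.size() == 3 && Spec.compare(0, 2, "m:") == 0)
        return (Spec[2] == 'o' || Spec[2] == 'x') ? '_' : '\0';
      Pos = End + 1;
    }
    return '\0';
  }

  // Map an IR-level name to the symbol name a static linker would see.
  std::string mangle(const std::string &IRName, const std::string &DataLayoutStr) {
    assert(!IRName.empty() && "symbol names must be non-empty");
    char Prefix = globalPrefixFor(DataLayoutStr);
    return Prefix != '\0' ? std::string(1, Prefix) + IRName : IRName;
  }

For example, with the abbreviated layout strings ``"e-m:o-i64:64"`` and
``"e-m:e-i64:64"``, ``mangle("main", ...)`` yields ``"_main"`` and ``"main"``
respectively.
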
This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
but fully functioning JIT stack that you can use to take LLVM IR and make it
executable within the context of your JIT process. In the next chapter we'll
look at how to extend this JIT to produce better quality code, and in the
process take a deeper look at the ORC layer concept.

`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example. To build this
example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
   :language: c++

.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
   simplifying assumption: symbols cannot be re-defined. This will make it
   impossible to re-define symbols in the REPL, but will make our symbol
   lookup logic simpler. Re-introducing support for symbol redefinition is
   left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
   original tutorials will be a helpful reference.)

.. [2] +-----------------------------+-----------------------------------------------+
       |         File                |               Reason for inclusion            |
       +=============================+===============================================+
       |        JITSymbol.h          | Defines the lookup result type                |
       |                             | JITEvaluatedSymbol                            |
       +-----------------------------+-----------------------------------------------+
       |       CompileUtils.h        | Provides the ConcurrentIRCompiler class.      |
       +-----------------------------+-----------------------------------------------+
       |           Core.h            | Core utilities such as ExecutionSession and   |
       |                             | JITDylib.                                     |
       +-----------------------------+-----------------------------------------------+
       |      ExecutionUtils.h       | Provides the DynamicLibrarySearchGenerator    |
       |                             | class.                                        |
       +-----------------------------+-----------------------------------------------+
       |      IRCompileLayer.h       | Provides the IRCompileLayer class.            |
       +-----------------------------+-----------------------------------------------+
       |  JITTargetMachineBuilder.h  | Provides the JITTargetMachineBuilder class.   |
       +-----------------------------+-----------------------------------------------+
       | RTDyldObjectLinkingLayer.h  | Provides the RTDyldObjectLinkingLayer class.  |
       +-----------------------------+-----------------------------------------------+
       |   SectionMemoryManager.h    | Provides the SectionMemoryManager class.      |
       +-----------------------------+-----------------------------------------------+
       |        DataLayout.h         | Provides the DataLayout class.                |
       +-----------------------------+-----------------------------------------------+
       |        LLVMContext.h        | Provides the LLVMContext class.               |
       +-----------------------------+-----------------------------------------------+

.. [3] See the Error handling section in the LLVM Programmer's Manual
   (http://llvm.org/docs/ProgrammersManual.html#error-handling)