=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

**Warning: This tutorial is currently being updated to account for ORC API
changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not been
updated**

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*, IRCompileLayer and
ObjectLinkingLayer, which did much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series the LLVM *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager on
a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us with a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: when we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.
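
The pattern described above is small enough to show on its own. The following
is an illustrative sketch only: it assumes an existing llvm::Module, ``M``, and
uses the legacy FunctionPassManager, as the rest of this chapter does. The
JIT-integrated version appears in the optimizeModule method below.

.. code-block:: c++

  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Transforms/InstCombine/InstCombine.h"
  #include "llvm/Transforms/Scalar.h"
  #include "llvm/Transforms/Scalar/GVN.h"

  // Illustrative sketch: optimize every function in an existing Module M.
  void optimize(llvm::Module &M) {
    llvm::legacy::FunctionPassManager FPM(&M);

    // Configure the pass manager with a set of optimizations.
    FPM.add(llvm::createInstructionCombiningPass());
    FPM.add(llvm::createReassociatePass());
    FPM.add(llvm::createGVNPass());
    FPM.add(llvm::createCFGSimplificationPass());
    FPM.doInitialization();

    // Run the passes over each function, mutating M in place.
    for (auto &F : M)
      FPM.run(F);
    FPM.doFinalization();
  }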

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the execution session and to
the layer below (as all layers do), plus an *IR optimization function* that it
will apply to each Module that is added via addModule:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    ExecutionSession ES;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer CompileLayer;
    IRTransformLayer OptimizeLayer;

    DataLayout DL;
    MangleAndInterner Mangle;
    ThreadSafeContext Ctx;

  public:

    KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
        : ObjectLayer(ES,
                      []() { return llvm::make_unique<SectionMemoryManager>(); }),
          CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
          OptimizeLayer(ES, CompileLayer, optimizeModule),
          DL(std::move(DL)), Mangle(ES, this->DL),
          Ctx(llvm::make_unique<LLVMContext>()) {
      ES.getMainJITDylib().setGenerator(
          cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL)));
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a new member, OptimizeLayer, which sits
on top of our CompileLayer. We initialize our OptimizeLayer with a reference to
the ExecutionSession and to the layer below, our CompileLayer (standard
practice for layers), along with a *transform function*. For our transform
function we supply our class's optimizeModule static method.

Next we need to update our addModule method to replace the call to
``CompileLayer::add`` with a call to ``OptimizeLayer::add`` instead:

.. code-block:: c++

  // ...
  // In addModule: hand the module to the top of our new layer stack.
  return OptimizeLayer.add(ES.getMainJITDylib(),
                           ThreadSafeModule(std::move(M), Ctx));
  // ...

At the bottom of our JIT we add a private static method, *optimizeModule*, to
perform the actual optimization:

.. code-block:: c++

  static Expected<ThreadSafeModule>
  optimizeModule(ThreadSafeModule TSM, const MaterializationResponsibility &R) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(TSM.getModule());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *TSM.getModule())
      FPM->run(F);

    return TSM;
  }

This function takes the module to be transformed as input (as a
ThreadSafeModule), along with a reference to a new class:
``MaterializationResponsibility``. The MaterializationResponsibility argument
can be used to query JIT state for the module being transformed, such as the
set of definitions in the module that JIT'd code is actively trying to
call/access. For now we will ignore this argument and use a standard
optimization pipeline: we set up a FunctionPassManager, add some passes to it,
run it over every function in the module, and then return the mutated module.
The specific optimizations are the same ones used in
`Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth discussion
of these, and of IR optimization in general.
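
Although we ignore the MaterializationResponsibility argument here, it is worth
a quick illustration. The sketch below is hypothetical and not part of the
tutorial code: it assumes a ``getRequestedSymbols()`` accessor is available
(and callable through a const reference) on MaterializationResponsibility in
this version of ORC, and the logging is purely for demonstration. A real
transform could use this information to decide how aggressively to optimize.

.. code-block:: c++

  // Hypothetical variant of optimizeModule that inspects the
  // MaterializationResponsibility before optimizing.
  static Expected<ThreadSafeModule>
  optimizeModuleVerbose(ThreadSafeModule TSM,
                        const MaterializationResponsibility &R) {
    // Symbols in this module that JIT'd code is actively waiting on.
    for (const auto &Sym : R.getRequestedSymbols())
      errs() << "Optimizing module for requested symbol: " << *Sym << "\n";

    // Fall back to the standard pipeline defined above.
    return optimizeModule(std::move(TSM), R);
  }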

And that's it in terms of changes to KaleidoscopeJIT: when a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but doing so gives us another opportunity
to see how layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer is one of the simplest layers that
can be implemented. Here is its full definition:

.. code-block:: c++

  // From IRTransformLayer.h:
  class IRTransformLayer : public IRLayer {
  public:
    using TransformFunction = std::function<Expected<ThreadSafeModule>(
        ThreadSafeModule, const MaterializationResponsibility &R)>;

    IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer,
                     TransformFunction Transform = identityTransform);

    void setTransform(TransformFunction Transform) {
      this->Transform = std::move(Transform);
    }

    static ThreadSafeModule
    identityTransform(ThreadSafeModule TSM,
                      const MaterializationResponsibility &R) {
      return TSM;
    }

    void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override;

  private:
    IRLayer &BaseLayer;
    TransformFunction Transform;
  };

  // From IRTransformLayer.cpp:

  IRTransformLayer::IRTransformLayer(ExecutionSession &ES,
                                     IRLayer &BaseLayer,
                                     TransformFunction Transform)
      : IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

  void IRTransformLayer::emit(MaterializationResponsibility R,
                              ThreadSafeModule TSM) {
    assert(TSM.getModule() && "Module must not be null");

    if (auto TransformedTSM = Transform(std::move(TSM), R))
      BaseLayer.emit(std::move(R), std::move(*TransformedTSM));
    else {
      R.failMaterialization();
      getExecutionSession().reportError(TransformedTSM.takeError());
    }
  }

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h`` and
``llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp``. This class is concerned
with two very simple jobs: (1) running every IR Module that is emitted via this
layer through the transform function object, and (2) implementing the ORC
``IRLayer`` interface (which itself conforms to the general ORC Layer concept,
more on that below). Most of the class is straightforward: a typedef for the
transform function, a constructor to initialize the members, a setter for the
transform function value, and a default no-op transform. The most important
method is ``emit``, as this is half of our IRLayer interface. The emit method
applies our transform to each module that it is called on and, if the transform
succeeds, passes the transformed module to the base layer. If the transform
fails, our emit function calls
``MaterializationResponsibility::failMaterialization`` (this lets JIT clients
who may be waiting on other threads know that the code they were waiting for
has failed to compile) and logs the error with the execution session before
bailing out.

The other half of the IRLayer interface we inherit unmodified from the IRLayer
class:

.. code-block:: c++

  Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) {
    return JD.define(llvm::make_unique<BasicIRLayerMaterializationUnit>(
        *this, std::move(K), std::move(TSM)));
  }

This code, from ``llvm/lib/ExecutionEngine/Orc/Layer.cpp``, adds a
ThreadSafeModule to a given JITDylib by wrapping it up in a
``MaterializationUnit`` (in this case a ``BasicIRLayerMaterializationUnit``).
Most layers that derive from IRLayer can rely on this default implementation
of the ``add`` method.

These two operations, ``add`` and ``emit``, together constitute the layer
concept: a layer is a way to wrap a portion of a compiler pipeline (in this
case the "opt" phase of an LLVM compiler) whose API is opaque to ORC, in an
interface that allows ORC to invoke it when needed. The ``add`` method takes a
module in some input program representation (in this case an LLVM IR module)
and stores it in the target JITDylib, arranging for it to be passed back to
the layer's ``emit`` method when any symbol defined by that module is
requested. Layers can compose neatly by calling the ``emit`` method of a base
layer to complete their work. For example, in this tutorial our
IRTransformLayer calls through to our IRCompileLayer to compile the
transformed IR, and our IRCompileLayer in turn calls our ObjectLayer to link
the object file produced by our compiler.
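
To make the layer concept concrete, here is a hedged sketch (not part of the
tutorial code) of about the smallest useful IR layer imaginable: one that logs
each module as it is emitted and then defers to the layer below. The class name
and log message are invented for illustration; everything else relies only on
the IRLayer interface shown above.

.. code-block:: c++

  // Hypothetical example layer: logs each module, then forwards it unchanged.
  class IRLoggingLayer : public IRLayer {
  public:
    IRLoggingLayer(ExecutionSession &ES, IRLayer &BaseLayer)
        : IRLayer(ES), BaseLayer(BaseLayer) {}

    void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override {
      errs() << "Emitting " << TSM.getModule()->getModuleIdentifier() << "\n";
      // Hand the untouched module to the base layer to continue the pipeline.
      BaseLayer.emit(std::move(R), std::move(TSM));
    }

  private:
    IRLayer &BaseLayer;
  };

A layer like this could be slotted between OptimizeLayer and CompileLayer with
no other changes, which is exactly the kind of composability described above.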

So far we have learned how to optimize and compile our LLVM IR, but we have not
focused on when compilation happens. Our current REPL is eager: each function
definition is optimized and compiled as soon as it is referenced by any other
code, regardless of whether it is ever called at runtime. In the next chapter we
will introduce fully lazy compilation, in which functions are not compiled until
they are first called at runtime. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we will have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we will have a
longer startup time (as everything is optimized) but relatively short pauses as
each function just passes through code-gen. If we both optimize and code-gen
lazily we can start executing the first function more quickly, but we will have
longer pauses as each function has to be both optimized and code-gen'd when it
is first executed. Things become even more interesting if we consider
interprocedural optimizations like inlining, which must be performed eagerly.
These are complex trade-offs, and there is no one-size-fits-all solution to
them, but by providing composable layers we leave the decisions to the person
implementing the JIT, and make it easy for them to experiment with different
configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++