====================================================================
Building a JIT: Adding Optimizations - An introduction to ORC Layers
====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*, IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series the LLVM *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager
on a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT, and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: when we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how
the IRTransformLayer works in more detail below, but the interface is simple:
the constructor for this layer takes a reference to the layer below (as all
layers do), plus an *IR optimization function* that it will apply to each
Module that is added via addModuleSet:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    ObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>
      OptimizeFunction;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    typedef decltype(OptimizeLayer)::ModuleSetHandleT ModuleHandle;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::unique_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization
function: in this case a std::function (a handy wrapper for "function-like"
things) from a std::unique_ptr<Module> input to a std::unique_ptr<Module>
output. With our optimization function typedef in place we can declare our
OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda. In the lambda we just call out to the "optimizeModule"
function that we will define below.

.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym.toRuntimeDyldSymbol();
        return RuntimeDyld::SymbolInfo(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  // Add the set to the JIT with the resolver we created above and a newly
  // created SectionMemoryManager.
  return OptimizeLayer.addModuleSet(std::move(Ms),
                                    make_unique<SectionMemoryManager>(),
                                    std::move(Resolver));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  OptimizeLayer.removeModuleSet(H);
  // ...

Next we need to replace references to 'CompileLayer' with references to
OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In
addModule we need to be careful to replace both references: the findSymbol
call inside our resolver, and the call through to addModuleSet.

.. code-block:: c++

  std::unique_ptr<Module> optimizeModule(std::unique_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual
optimization: *optimizeModule*. This function sets up a FunctionPassManager,
adds some passes to it, runs it over every function in the module, and then
returns the mutated module. The specific optimizations are the same ones used
in `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series; readers may visit that chapter for a more in-depth discussion
of them, and of IR optimization in general.

And that's it: when a module is added to our JIT the OptimizeLayer will now
pass it to our optimizeModule function before passing the transformed module
on to the CompileLayer below. Of course, we could have called optimizeModule
directly in our addModule function and not gone to the bother of using the
IRTransformLayer, but doing so gives us an opportunity to see how layers
compose, and how one can be implemented, because IRTransformLayer turns out
to be one of the simplest possible implementations of the *layer* concept:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    typedef typename BaseLayerT::ModuleSetHandleT ModuleSetHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    template <typename ModuleSetT, typename MemoryManagerPtrT,
              typename SymbolResolverPtrT>
    ModuleSetHandleT addModuleSet(ModuleSetT Ms,
                                  MemoryManagerPtrT MemMgr,
                                  SymbolResolverPtrT Resolver) {

      for (auto I = Ms.begin(), E = Ms.end(); I != E; ++I)
        *I = Transform(std::move(*I));

      return BaseLayer.addModuleSet(std::move(Ms), std::move(MemMgr),
                                    std::move(Resolver));
    }

    void removeModuleSet(ModuleSetHandleT H) { BaseLayer.removeModuleSet(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleSetHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    void emitAndFinalize(ModuleSetHandleT H) {
      BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments, ``BaseLayerT``
and ``TransformFtor``, that provide the type of the base layer and the type
of the "transform functor" (in our case a std::function) respectively. The
body of the class is concerned with two very simple jobs: (1) running every
IR Module that is added with addModuleSet through the transform functor, and
(2) conforming to the ORC layer interface, which is:

+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleSetHandleT | set when calling findSymbolIn, removeModuleSet, or        |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given set of Modules and makes them "available    |
|                  | for execution". This means that symbols in those modules  |
|                  | should be searchable via findSymbol and findSymbolIn, and |
|                  | the address of the symbols should be read/writable (for   |
|                  | data symbols), or executable (for function symbols) after |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModuleSet     | addModuleSet doesn't have to compile (or do any other     |
|                  | work) up-front. It *can*, like IRCompileLayer, act        |
|                  | eagerly, but it can also simply record the module and     |
|                  | take no further action until somebody calls               |
|                  | JITSymbol::getAddress(). In IRTransformLayer's case       |
|                  | addModuleSet eagerly applies the transform functor to     |
|                  | each module in the set, then passes the resulting set     |
|                  | of mutated modules down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a set of modules from the JIT. Code or data       |
| removeModuleSet  | defined in these modules will no longer be available, and |
|                  | the memory holding the JIT'd definitions will be freed.   |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModuleSet (and not yet       |
| findSymbol       | removed by a call to removeModuleSet). In                 |
|                  | IRTransformLayer we just pass the query on to the layer   |
|                  | below. In our REPL this is our default way to search for  |
|                  | function definitions.                                     |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module set indicated |
|                  | by the given ModuleSetHandleT. This is just an optimized  |
|                  | search, better for lookup-speed when you know exactly     |
|                  | where a symbol definition should be found. In             |
| findSymbolIn     | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module set (represented by a ModuleSetHandleT)  |
|                  | accessible. Behaves as if some symbol in the set had been |
|                  | searched for and JITSymbol::getAddress called. This       |
| emitAndFinalize  | is rarely needed, but can be useful when dealing with     |
|                  | layers that usually behave lazily if the user wants to    |
|                  | trigger early compilation (for example, to use idle CPU   |
|                  | time to eagerly compile code in the background).          |
+------------------+-----------------------------------------------------------+

This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModuleSet by running each module in
the set through its transform up-front and immediately passing the result to
the layer below. A lazy layer, by contrast, could implement addModuleSet by
squirreling away the modules, doing no other up-front work, and applying
the transform (and calling addModuleSet on the layer below) when the client
calls findSymbol instead. The JIT'd program behavior will be the same either
way, but these choices will have different performance characteristics: doing
work eagerly means the JIT takes longer up-front, but proceeds smoothly once
this is done. Deferring work allows the JIT to get up-and-running quickly, but
will force the JIT to pause and wait whenever some code or data is needed that
hasn't already been processed.

Our current REPL is eager: each function definition is optimized and compiled
as soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (while everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's
first executed. Things become even more interesting if we consider
interprocedural optimizations like inlining, which must be performed eagerly.
These are complex trade-offs, and there is no one-size-fits-all solution to
them, but by providing composable layers we leave the decisions to the person
implementing the JIT, and make it easy for them to experiment with different
configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the ObjectLinkingLayer as
       external symbols. The ObjectLinkingLayer will call the SymbolResolver
       that we defined in addModuleSet, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.