=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series the llvm *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager on
a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us with a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: When we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the layer below (as all layers
do) plus an *IR optimization function* that it will apply to each Module that
is added via addModuleSet:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    ObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>
      OptimizeFunction;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    typedef decltype(OptimizeLayer)::ModuleSetHandleT ModuleHandle;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::unique_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization function.
In this case we use a std::function (a handy wrapper for "function-like" things)
that takes a single std::unique_ptr<Module> input and returns a
std::unique_ptr<Module> output. With our optimization function typedef in place
we can declare our OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda that calls out to an "optimizeModule" function that we will
define below.
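
Because OptimizeFunction is just a std::function, any callable with a matching
signature can be plugged in. As a minimal sketch (not part of the tutorial
code), an identity transform that returns each module untouched is a handy way
to check the layer plumbing before wiring in real optimizations:

.. code-block:: c++

  // Hypothetical: a do-nothing transform with the required signature.
  // Handing this to OptimizeLayer would make the JIT behave just like the
  // Chapter 1 version.
  OptimizeFunction IdentityTransform =
      [](std::unique_ptr<Module> M) { return M; };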

.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym;
        return JITSymbol(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.addModuleSet(std::move(Ms),
                                    make_unique<SectionMemoryManager>(),
                                    std::move(Resolver));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  OptimizeLayer.removeModuleSet(H);
  // ...

Next we need to replace references to 'CompileLayer' with references to
OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In
addModule we need to be careful to replace both references: the findSymbol call
inside our resolver, and the call through to addModuleSet.

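For reference, here is a sketch of what the three updated methods look like
once the references are swapped. It mirrors the Chapter 1 versions of these
methods (see the full listing at the end of this chapter for the exact code);
only the layer references change:

.. code-block:: c++

  ModuleHandle addModule(std::unique_ptr<Module> M) {
    // Build our symbol resolver:
    // Lambda 1: Look back into the JIT itself to find symbols that are part
    //           of the same "logical dylib".
    // Lambda 2: Search for external symbols in the host process.
    auto Resolver = createLambdaResolver(
        [&](const std::string &Name) {
          if (auto Sym = OptimizeLayer.findSymbol(Name, false))
            return Sym;
          return JITSymbol(nullptr);
        },
        [](const std::string &Name) {
          if (auto SymAddr =
                RTDyldMemoryManager::getSymbolAddressInProcess(Name))
            return JITSymbol(SymAddr, JITSymbolFlags::Exported);
          return JITSymbol(nullptr);
        });

    // Build a singleton module set to hold our module.
    std::vector<std::unique_ptr<Module>> Ms;
    Ms.push_back(std::move(M));

    // Add the set to the JIT with the resolver we created above and a newly
    // created SectionMemoryManager.
    return OptimizeLayer.addModuleSet(std::move(Ms),
                                      make_unique<SectionMemoryManager>(),
                                      std::move(Resolver));
  }

  JITSymbol findSymbol(const std::string Name) {
    std::string MangledName;
    raw_string_ostream MangledNameStream(MangledName);
    Mangler::getNameWithPrefix(MangledNameStream, Name, DL);
    return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  }

  void removeModule(ModuleHandle H) {
    OptimizeLayer.removeModuleSet(H);
  }
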
.. code-block:: c++

  std::unique_ptr<Module> optimizeModule(std::unique_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual optimization:
*optimizeModule*. This function sets up a FunctionPassManager, adds some passes
to it, runs it over every function in the module, and then returns the mutated
module. The specific optimizations are the same ones used in
`Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth
discussion of these, and of IR optimization in general.

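If you are following along in your own copy of the Chapter 1 code, note that
optimizeModule needs a few extra headers for the legacy pass manager and the
pass-creation functions. The set below is an assumption based on the full
listing at the end of this chapter; adjust it for your LLVM version if needed:

.. code-block:: c++

  // Assumed headers for legacy::FunctionPassManager and the create*Pass()
  // functions used above; exact locations can vary between LLVM releases.
  #include "llvm/IR/LegacyPassManager.h"
  #include "llvm/Transforms/InstCombine/InstCombine.h"
  #include "llvm/Transforms/Scalar.h"
  #include "llvm/Transforms/Scalar/GVN.h"
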
And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but doing so gives us another opportunity
to see how layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer turns out to be one of the simplest
implementations of the layer concept that can be devised:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    typedef typename BaseLayerT::ModuleSetHandleT ModuleSetHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    template <typename ModuleSetT, typename MemoryManagerPtrT,
              typename SymbolResolverPtrT>
    ModuleSetHandleT addModuleSet(ModuleSetT Ms,
                                  MemoryManagerPtrT MemMgr,
                                  SymbolResolverPtrT Resolver) {

      for (auto I = Ms.begin(), E = Ms.end(); I != E; ++I)
        *I = Transform(std::move(*I));

      return BaseLayer.addModuleSet(std::move(Ms), std::move(MemMgr),
                                    std::move(Resolver));
    }

    void removeModuleSet(ModuleSetHandleT H) { BaseLayer.removeModuleSet(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleSetHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    void emitAndFinalize(ModuleSetHandleT H) {
      BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments: ``BaseLayerT`` and
``TransformFtor``, which provide the type of the base layer and the type of the
"transform functor" (in our case a std::function) respectively. This class is
concerned with two very simple jobs: (1) running every IR Module that is added
with addModuleSet through the transform functor, and (2) conforming to the ORC
layer interface. The interface consists of one typedef and five methods:

+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleSetHandleT | set when calling findSymbolIn, removeModuleSet, or        |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given set of Modules and makes them "available    |
|                  | for execution". This means that symbols in those modules  |
|                  | should be searchable via findSymbol and findSymbolIn, and |
|                  | the address of the symbols should be read/writable (for   |
|                  | data symbols), or executable (for function symbols) after |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModuleSet     | addModuleSet doesn't have to compile (or do any other     |
|                  | work) up-front. It *can*, like IRCompileLayer, act        |
|                  | eagerly, but it can also simply record the module and     |
|                  | take no further action until somebody calls               |
|                  | JITSymbol::getAddress(). In IRTransformLayer's case       |
|                  | addModuleSet eagerly applies the transform functor to     |
|                  | each module in the set, then passes the resulting set     |
|                  | of mutated modules down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a set of modules from the JIT. Code or data       |
| removeModuleSet  | defined in these modules will no longer be available, and |
|                  | the memory holding the JIT'd definitions will be freed.   |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModuleSet (and not yet       |
| findSymbol       | removed by a call to removeModuleSet). In                 |
|                  | IRTransformLayer we just pass the query on to the layer   |
|                  | below. In our REPL this is our default way to search for  |
|                  | function definitions.                                     |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module set indicated |
|                  | by the given ModuleSetHandleT. This is just an optimized  |
|                  | search, better for lookup-speed when you know exactly     |
|                  | where a symbol definition should be found. In             |
| findSymbolIn     | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module set (represented by a ModuleSetHandleT)  |
|                  | accessible. Behaves as if some symbol in the set had been |
|                  | searched for and JITSymbol::getAddress() called. This     |
| emitAndFinalize  | is rarely needed, but can be useful when dealing with     |
|                  | layers that usually behave lazily if the user wants to    |
|                  | trigger early compilation (for example, to use idle CPU   |
|                  | time to eagerly compile code in the background).          |
+------------------+-----------------------------------------------------------+

This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModuleSet by running each module in the
set through its transform up-front and immediately passing the result to the
layer below. A lazy layer, by contrast, could implement addModuleSet by
squirreling away the modules, doing no other up-front work, and only applying
the transform (and calling addModuleSet on the layer below) when the client
calls findSymbol. The JIT'd program behavior will be the same either way, but
these choices will have different performance characteristics: Doing work
eagerly means the JIT takes longer up-front, but proceeds smoothly once this is
done. Deferring work allows the JIT to get up-and-running quickly, but will
force the JIT to pause and wait whenever some code or data is needed that hasn't
already been processed.

Our current REPL is eager: Each function definition is optimized and compiled as
soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (in which everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's first
executed. Things become even more interesting if we consider interprocedural
optimizations like inlining, which must be performed eagerly. These are
complex trade-offs, and there is no one-size-fits-all solution to them, but by
providing composable layers we leave the decisions to the person implementing
the JIT, and make it easy for them to experiment with different configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the ObjectLinkingLayer as
       external symbols. The ObjectLinkingLayer will call the SymbolResolver
       that we defined in addModuleSet, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.