Bill Wendling | a3a2eb0 | 2012-06-20 10:08:02 +0000 | [diff] [blame] | 1 | .. _lto: |
| 2 | |
| 3 | ====================================================== |
| 4 | LLVM Link Time Optimization: Design and Implementation |
| 5 | ====================================================== |
| 6 | |
| 7 | .. contents:: |
| 8 | :local: |
| 9 | |
| 10 | Description |
| 11 | =========== |
| 12 | |
| 13 | LLVM features powerful intermodular optimizations which can be used at link |
| 14 | time. Link Time Optimization (LTO) is another name for intermodular |
| 15 | optimization when performed during the link stage. This document describes the |
| 16 | interface and design between the LTO optimizer and the linker. |
| 17 | |
| 18 | Design Philosophy |
| 19 | ================= |
| 20 | |
| 21 | The LLVM Link Time Optimizer provides complete transparency, while doing |
| 22 | intermodular optimization, in the compiler tool chain. Its main goal is to let |
| 23 | the developer take advantage of intermodular optimizations without making any |
| 24 | significant changes to the developer's makefiles or build system. This is |
| 25 | achieved through tight integration with the linker. In this model, the linker |
| 26 | treates LLVM bitcode files like native object files and allows mixing and |
| 27 | matching among them. The linker uses `libLTO`_, a shared object, to handle LLVM |
| 28 | bitcode files. This tight integration between the linker and LLVM optimizer |
| 29 | helps to do optimizations that are not possible in other models. The linker |
| 30 | input allows the optimizer to avoid relying on conservative escape analysis. |
| 31 | |
| 32 | Example of link time optimization |
| 33 | --------------------------------- |
| 34 | |
| 35 | The following example illustrates the advantages of LTO's integrated approach |
| 36 | and clean interface. This example requires a system linker which supports LTO |
| 37 | through the interface described in this document. Here, clang transparently |
| 38 | invokes system linker. |
| 39 | |
| 40 | * Input source file ``a.c`` is compiled into LLVM bitcode form. |
| 41 | * Input source file ``main.c`` is compiled into native object code. |
| 42 | |
| 43 | .. code-block:: c++ |
| 44 | |
| 45 | --- a.h --- |
| 46 | extern int foo1(void); |
| 47 | extern void foo2(void); |
| 48 | extern void foo4(void); |
| 49 | |
| 50 | --- a.c --- |
| 51 | #include "a.h" |
| 52 | |
| 53 | static signed int i = 0; |
| 54 | |
| 55 | void foo2(void) { |
| 56 | i = -1; |
| 57 | } |
| 58 | |
| 59 | static int foo3() { |
| 60 | foo4(); |
| 61 | return 10; |
| 62 | } |
| 63 | |
| 64 | int foo1(void) { |
| 65 | int data = 0; |
| 66 | |
| 67 | if (i < 0) |
| 68 | data = foo3(); |
| 69 | |
| 70 | data = data + 42; |
| 71 | return data; |
| 72 | } |
| 73 | |
| 74 | --- main.c --- |
| 75 | #include <stdio.h> |
| 76 | #include "a.h" |
| 77 | |
| 78 | void foo4(void) { |
| 79 | printf("Hi\n"); |
| 80 | } |
| 81 | |
| 82 | int main() { |
| 83 | return foo1(); |
| 84 | } |
| 85 | |
| 86 | .. code-block:: bash |
| 87 | |
| 88 | --- command lines --- |
| 89 | % clang -emit-llvm -c a.c -o a.o # <-- a.o is LLVM bitcode file |
| 90 | % clang -c main.c -o main.o # <-- main.o is native object file |
| 91 | % clang a.o main.o -o main # <-- standard link command without modifications |
| 92 | |
| 93 | * In this example, the linker recognizes that ``foo2()`` is an externally |
| 94 | visible symbol defined in LLVM bitcode file. The linker completes its usual |
| 95 | symbol resolution pass and finds that ``foo2()`` is not used |
| 96 | anywhere. This information is used by the LLVM optimizer and it |
| 97 | removes ``foo2()``.</li> |
| 98 | |
| 99 | * As soon as ``foo2()`` is removed, the optimizer recognizes that condition ``i |
| 100 | < 0`` is always false, which means ``foo3()`` is never used. Hence, the |
| 101 | optimizer also removes ``foo3()``. |
| 102 | |
| 103 | * And this in turn, enables linker to remove ``foo4()``. |
| 104 | |
| 105 | This example illustrates the advantage of tight integration with the |
| 106 | linker. Here, the optimizer can not remove ``foo3()`` without the linker's |
| 107 | input. |
| 108 | |
| 109 | Alternative Approaches |
| 110 | ---------------------- |
| 111 | |
| 112 | **Compiler driver invokes link time optimizer separately.** |
| 113 | In this model the link time optimizer is not able to take advantage of |
| 114 | information collected during the linker's normal symbol resolution phase. |
| 115 | In the above example, the optimizer can not remove ``foo2()`` without the |
| 116 | linker's input because it is externally visible. This in turn prohibits the |
| 117 | optimizer from removing ``foo3()``. |
| 118 | |
| 119 | **Use separate tool to collect symbol information from all object files.** |
| 120 | In this model, a new, separate, tool or library replicates the linker's |
| 121 | capability to collect information for link time optimization. Not only is |
| 122 | this code duplication difficult to justify, but it also has several other |
| 123 | disadvantages. For example, the linking semantics and the features provided |
| 124 | by the linker on various platform are not unique. This means, this new tool |
| 125 | needs to support all such features and platforms in one super tool or a |
| 126 | separate tool per platform is required. This increases maintenance cost for |
| 127 | link time optimizer significantly, which is not necessary. This approach |
| 128 | also requires staying synchronized with linker developements on various |
| 129 | platforms, which is not the main focus of the link time optimizer. Finally, |
| 130 | this approach increases end user's build time due to the duplication of work |
| 131 | done by this separate tool and the linker itself. |
| 132 | |
| 133 | Multi-phase communication between ``libLTO`` and linker |
| 134 | ======================================================= |
| 135 | |
| 136 | The linker collects information about symbol defininitions and uses in various |
| 137 | link objects which is more accurate than any information collected by other |
| 138 | tools during typical build cycles. The linker collects this information by |
| 139 | looking at the definitions and uses of symbols in native .o files and using |
| 140 | symbol visibility information. The linker also uses user-supplied information, |
| 141 | such as a list of exported symbols. LLVM optimizer collects control flow |
| 142 | information, data flow information and knows much more about program structure |
| 143 | from the optimizer's point of view. Our goal is to take advantage of tight |
| 144 | integration between the linker and the optimizer by sharing this information |
| 145 | during various linking phases. |
| 146 | |
| 147 | Phase 1 : Read LLVM Bitcode Files |
| 148 | --------------------------------- |
| 149 | |
| 150 | The linker first reads all object files in natural order and collects symbol |
| 151 | information. This includes native object files as well as LLVM bitcode files. |
| 152 | To minimize the cost to the linker in the case that all .o files are native |
| 153 | object files, the linker only calls ``lto_module_create()`` when a supplied |
| 154 | object file is found to not be a native object file. If ``lto_module_create()`` |
| 155 | returns that the file is an LLVM bitcode file, the linker then iterates over the |
| 156 | module using ``lto_module_get_symbol_name()`` and |
| 157 | ``lto_module_get_symbol_attribute()`` to get all symbols defined and referenced. |
| 158 | This information is added to the linker's global symbol table. |
| 159 | |
| 160 | |
| 161 | The lto* functions are all implemented in a shared object libLTO. This allows |
| 162 | the LLVM LTO code to be updated independently of the linker tool. On platforms |
| 163 | that support it, the shared object is lazily loaded. |
| 164 | |
| 165 | Phase 2 : Symbol Resolution |
| 166 | --------------------------- |
| 167 | |
| 168 | In this stage, the linker resolves symbols using global symbol table. It may |
| 169 | report undefined symbol errors, read archive members, replace weak symbols, etc. |
| 170 | The linker is able to do this seamlessly even though it does not know the exact |
| 171 | content of input LLVM bitcode files. If dead code stripping is enabled then the |
| 172 | linker collects the list of live symbols. |
| 173 | |
| 174 | Phase 3 : Optimize Bitcode Files |
| 175 | -------------------------------- |
| 176 | |
| 177 | After symbol resolution, the linker tells the LTO shared object which symbols |
| 178 | are needed by native object files. In the example above, the linker reports |
| 179 | that only ``foo1()`` is used by native object files using |
| 180 | ``lto_codegen_add_must_preserve_symbol()``. Next the linker invokes the LLVM |
| 181 | optimizer and code generators using ``lto_codegen_compile()`` which returns a |
| 182 | native object file creating by merging the LLVM bitcode files and applying |
| 183 | various optimization passes. |
| 184 | |
| 185 | Phase 4 : Symbol Resolution after optimization |
| 186 | ---------------------------------------------- |
| 187 | |
| 188 | In this phase, the linker reads optimized a native object file and updates the |
| 189 | internal global symbol table to reflect any changes. The linker also collects |
| 190 | information about any changes in use of external symbols by LLVM bitcode |
| 191 | files. In the example above, the linker notes that ``foo4()`` is not used any |
| 192 | more. If dead code stripping is enabled then the linker refreshes the live |
| 193 | symbol information appropriately and performs dead code stripping. |
| 194 | |
| 195 | After this phase, the linker continues linking as if it never saw LLVM bitcode |
| 196 | files. |
| 197 | |
| 198 | .. _libLTO: |
| 199 | |
| 200 | ``libLTO`` |
| 201 | ========== |
| 202 | |
| 203 | ``libLTO`` is a shared object that is part of the LLVM tools, and is intended |
| 204 | for use by a linker. ``libLTO`` provides an abstract C interface to use the LLVM |
| 205 | interprocedural optimizer without exposing details of LLVM's internals. The |
| 206 | intention is to keep the interface as stable as possible even when the LLVM |
| 207 | optimizer continues to evolve. It should even be possible for a completely |
| 208 | different compilation technology to provide a different libLTO that works with |
| 209 | their object files and the standard linker tool. |
| 210 | |
| 211 | ``lto_module_t`` |
| 212 | ---------------- |
| 213 | |
| 214 | A non-native object file is handled via an ``lto_module_t``. The following |
| 215 | functions allow the linker to check if a file (on disk or in a memory buffer) is |
| 216 | a file which libLTO can process: |
| 217 | |
| 218 | .. code-block:: c |
| 219 | |
| 220 | lto_module_is_object_file(const char*) |
| 221 | lto_module_is_object_file_for_target(const char*, const char*) |
| 222 | lto_module_is_object_file_in_memory(const void*, size_t) |
| 223 | lto_module_is_object_file_in_memory_for_target(const void*, size_t, const char*) |
| 224 | |
| 225 | If the object file can be processed by ``libLTO``, the linker creates a |
| 226 | ``lto_module_t`` by using one of: |
| 227 | |
| 228 | .. code-block:: c |
| 229 | |
| 230 | lto_module_create(const char*) |
| 231 | lto_module_create_from_memory(const void*, size_t) |
| 232 | |
| 233 | and when done, the handle is released via |
| 234 | |
| 235 | .. code-block:: c |
| 236 | |
| 237 | lto_module_dispose(lto_module_t) |
| 238 | |
| 239 | |
| 240 | The linker can introspect the non-native object file by getting the number of |
| 241 | symbols and getting the name and attributes of each symbol via: |
| 242 | |
| 243 | .. code-block:: c |
| 244 | |
| 245 | lto_module_get_num_symbols(lto_module_t) |
| 246 | lto_module_get_symbol_name(lto_module_t, unsigned int) |
| 247 | lto_module_get_symbol_attribute(lto_module_t, unsigned int) |
| 248 | |
| 249 | The attributes of a symbol include the alignment, visibility, and kind. |
| 250 | |
| 251 | ``lto_code_gen_t`` |
| 252 | ------------------ |
| 253 | |
| 254 | Once the linker has loaded each non-native object files into an |
| 255 | ``lto_module_t``, it can request ``libLTO`` to process them all and generate a |
| 256 | native object file. This is done in a couple of steps. First, a code generator |
| 257 | is created with: |
| 258 | |
| 259 | .. code-block:: c |
| 260 | |
| 261 | lto_codegen_create() |
| 262 | |
| 263 | Then, each non-native object file is added to the code generator with: |
| 264 | |
| 265 | .. code-block:: c |
| 266 | |
| 267 | lto_codegen_add_module(lto_code_gen_t, lto_module_t) |
| 268 | |
| 269 | The linker then has the option of setting some codegen options. Whether or not |
| 270 | to generate DWARF debug info is set with: |
| 271 | |
| 272 | .. code-block:: c |
| 273 | |
| 274 | lto_codegen_set_debug_model(lto_code_gen_t) |
| 275 | |
| 276 | Which kind of position independence is set with: |
| 277 | |
| 278 | .. code-block:: c |
| 279 | |
| 280 | lto_codegen_set_pic_model(lto_code_gen_t) |
| 281 | |
| 282 | And each symbol that is referenced by a native object file or otherwise must not |
| 283 | be optimized away is set with: |
| 284 | |
| 285 | .. code-block:: c |
| 286 | |
| 287 | lto_codegen_add_must_preserve_symbol(lto_code_gen_t, const char*) |
| 288 | |
| 289 | After all these settings are done, the linker requests that a native object file |
| 290 | be created from the modules with the settings using: |
| 291 | |
| 292 | .. code-block:: c |
| 293 | |
| 294 | lto_codegen_compile(lto_code_gen_t, size*) |
| 295 | |
| 296 | which returns a pointer to a buffer containing the generated native object file. |
| 297 | The linker then parses that and links it with the rest of the native object |
| 298 | files. |