=====================================
Performance Tips for Frontend Authors
=====================================

.. contents::
   :local:
   :depth: 2

Abstract
========

The intended audience of this document is developers of language frontends
targeting LLVM IR. This document is home to a collection of tips on how to
generate IR that optimizes well. As with any optimizer, LLVM has its strengths
and weaknesses. In some cases, surprisingly small changes in the source IR
can have a large effect on the generated code.

IR Best Practices
=================

Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM currently does not optimize loads and stores of large :ref:`aggregate
types <t_aggregate>` (i.e. structs and arrays) well. As an alternative, consider
loading individual fields from memory.

Aggregates that are smaller than the largest (performant) load or store
instruction supported by the targeted hardware are well supported. These can
be an effective way to represent collections of small packed fields.

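As a minimal sketch of the difference, the two functions below read the same
two fields of a struct, first through a whole-aggregate load and then through
per-field loads. The type ``%Point`` and the function names are hypothetical,
and the IR assumes the opaque ``ptr`` syntax of recent LLVM releases.

.. code-block:: llvm

   ; Hypothetical 32-byte struct, larger than a typical scalar load.
   %Point = type { i64, i64, i64, i64 }

   ; Discouraged: a single first-class aggregate load of the whole struct.
   define i64 @sum_ends_aggregate(ptr %p) {
   entry:
     %agg = load %Point, ptr %p
     %a = extractvalue %Point %agg, 0
     %d = extractvalue %Point %agg, 3
     %sum = add i64 %a, %d
     ret i64 %sum
   }

   ; Preferred: compute field addresses and load only the fields that are used.
   define i64 @sum_ends_fields(ptr %p) {
   entry:
     %a.addr = getelementptr inbounds %Point, ptr %p, i32 0, i32 0
     %d.addr = getelementptr inbounds %Point, ptr %p, i32 0, i32 3
     %a = load i64, ptr %a.addr
     %d = load i64, ptr %d.addr
     %sum = add i64 %a, %d
     ret i64 %sum
   }
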
Prefer zext over sext when legal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On some architectures (X86_64 is one), sign extension can involve an extra
instruction whereas zero extension can be folded into a load. LLVM will try to
replace a sext with a zext when it can be proven safe, but if you have
information in your source language about the range of an integer value, it can
be profitable to use a zext rather than a sext.

Alternatively, you can :ref:`specify the range of the value using metadata
<range-metadata>` and LLVM can do the sext to zext conversion for you.

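Both options can be sketched as follows, assuming a hypothetical frontend that
knows a loaded byte always lies in the half-open range [0, 100). The function
names and the range are illustrative only, and the IR uses the opaque ``ptr``
syntax of recent LLVM releases.

.. code-block:: llvm

   ; Option 1: the frontend emits the zext itself.
   define i64 @widen_with_zext(ptr %p) {
   entry:
     %v = load i8, ptr %p
     %wide = zext i8 %v to i64
     ret i64 %wide
   }

   ; Option 2: the frontend keeps the sext but attaches range metadata,
   ; which lets the optimizer see that the sign bit is always clear.
   define i64 @widen_with_range(ptr %p) {
   entry:
     %v = load i8, ptr %p, !range !0
     %wide = sext i8 %v to i64
     ret i64 %wide
   }

   ; Assumed range: values are >= 0 and < 100.
   !0 = !{i8 0, i8 100}
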
Zext GEP indices to machine register width
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Internally, LLVM often promotes the width of GEP indices to machine register
width. When it does so, it will default to using sign extension (sext)
operations for safety. If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.

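A minimal sketch of this, assuming a 64-bit target and a 32-bit index that the
source language guarantees to be non-negative (the function name is
hypothetical, and the IR uses the opaque ``ptr`` syntax of recent LLVM
releases):

.. code-block:: llvm

   define i32 @element_at(ptr %base, i32 %i) {
   entry:
     ; Widen the index explicitly with zext instead of passing the i32
     ; index to the GEP and leaving the extension to LLVM.
     %idx = zext i32 %i to i64
     %addr = getelementptr inbounds i32, ptr %base, i64 %idx
     %v = load i32, ptr %addr
     ret i32 %v
   }
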
Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^

#. Make sure that a DataLayout is provided (this will likely become required in
   the near future, but is certainly important for optimization).

#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
   analysis); prefer GEPs.

#. Use the "most-private" possible linkage types for the functions being defined
   (private, internal or linkonce_odr preferably).

#. Prefer globals over inttoptr of a constant address; this gives you
   dereferenceability information. In MCJIT, use getSymbolAddress to provide
   the actual address.

#. Be wary of ordered and atomic memory operations. They are hard to optimize
   and may not be well optimized by the current optimizer. Depending on your
   source language, you may consider using fences instead.

#. If calling a function which is known to throw an exception (unwind), use
   an invoke with a normal destination which contains an unreachable
   instruction. This form conveys to the optimizer that the call returns
   abnormally. For an invoke which neither returns normally nor requires unwind
   code in the current function, you can use a noreturn call instruction if
   desired. This is generally not required because the optimizer will convert
   an invoke with an unreachable unwind destination to a call instruction.

#. Use profile metadata to indicate statically known cold paths, even if
   dynamic profiling information is not available. This can make a large
   difference in code placement and thus the performance of tight loops.

#. When generating code for loops, try to avoid terminating the header block of
   the loop earlier than necessary. If the terminator of the loop header
   block is a loop exiting conditional branch, the effectiveness of LICM will
   be limited for loads not in the header. (This is due to the fact that LLVM
   may not know such a load is safe to speculatively execute and thus can't
   lift an otherwise loop invariant load unless it can prove the exiting
   condition is not taken.) It can be profitable, in some cases, to emit such
   instructions into the header even if they are not used along a rarely
   executed path that exits the loop. This guidance specifically does not
   apply if the condition which terminates the loop header is itself invariant,
   or can be easily discharged by inspecting the loop index variables.

#. In hot loops, consider duplicating instructions from small basic blocks
   which end in highly predictable terminators into their successor blocks.
   If a hot successor block contains instructions which can be vectorized
   with the duplicated ones, this can provide a noticeable throughput
   improvement. Note that this is not always profitable and does involve a
   potentially large increase in code size.

#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
   of predecessors). Among other issues, the register allocator is known to
   perform badly when confronted with such structures. The only exception to
   this guidance is that a unified return block with high in-degree is fine.

#. When checking a value against a constant, emit the check using a consistent
   comparison type. The GVN pass *will* optimize redundant equalities even if
   the type of comparison is inverted, but GVN only runs late in the pipeline.
   As a result, you may miss the opportunity to run other important
   optimizations. Improvements to EarlyCSE to remove this issue are tracked in
   Bug 23333.

#. Avoid using arithmetic intrinsics unless you are *required* by your source
   language specification to emit a particular code sequence. The optimizer
   is quite good at reasoning about general control flow and arithmetic; it is
   not anywhere near as strong at reasoning about the various intrinsics. If
   profitable for code generation purposes, the optimizer will likely form the
   intrinsics itself late in the optimization pipeline. It is *very* rarely
   profitable to emit these directly in the language frontend. This item
   explicitly includes the use of the :ref:`overflow intrinsics <int_overflow>`.

#. Avoid using the :ref:`assume intrinsic <int_assume>` until you've
   established that a) there's no other way to express the given fact and b)
   that fact is critical for optimization purposes. Assumes are a great
   prototyping mechanism, but they can have negative effects on both compile
   time and optimization effectiveness. The former is fixable with enough
   effort, but the latter is fairly fundamental to their designed purpose.


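Three of the items above (private linkage, profile metadata on a statically
known cold path, and an invoke whose normal destination is unreachable) can be
sketched together as follows. All function names, the 1:2000 branch weights,
and the choice of the Itanium C++ personality routine are assumptions made for
illustration; the IR uses the opaque ``ptr`` syntax of recent LLVM releases.

.. code-block:: llvm

   declare void @report_failure()          ; hypothetical slow-path helper
   declare void @throw_error()             ; hypothetical, always unwinds
   declare i32 @__gxx_personality_v0(...)  ; assumed personality routine

   ; internal linkage: the function is not visible outside this module.
   define internal i32 @checked_div(i32 %a, i32 %b) {
   entry:
     %is.zero = icmp eq i32 %b, 0
     ; The first weight belongs to the true successor, so %fail is cold.
     br i1 %is.zero, label %fail, label %ok, !prof !0

   fail:
     call void @report_failure()
     ret i32 0

   ok:
     %q = udiv i32 %a, %b
     ret i32 %q
   }

   define void @always_throws() personality ptr @__gxx_personality_v0 {
   entry:
     invoke void @throw_error()
             to label %normal unwind label %lpad

   normal:
     ; The callee never returns normally, so this block is unreachable.
     unreachable

   lpad:
     %exn = landingpad { ptr, i32 }
             cleanup
     resume { ptr, i32 } %exn
   }

   !0 = !{!"branch_weights", i32 1, i32 2000}
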
Describing Language Specific Properties
=======================================

When translating a source language to LLVM, finding ways to express concepts
and guarantees available in your source language which are not natively
provided by LLVM IR will greatly improve LLVM's ability to optimize your code.
As an example, C/C++'s ability to mark every add as "no signed wrap (nsw)" goes
a long way to assisting the optimizer in reasoning about loop induction
variables and thus generating more optimal code for loops.

The LLVM LangRef includes a number of mechanisms for annotating the IR with
additional semantic information. It is *strongly* recommended that you become
highly familiar with this document. The list below is intended to highlight a
couple of items of particular interest, but is by no means exhaustive.

Restricted Operation Semantics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Add nsw/nuw flags as appropriate. Reasoning about overflow is
   generally hard for an optimizer so providing these facts from the frontend
   can be very impactful.

#. Use fast-math flags on floating point operations if legal. If you don't
   need strict IEEE floating point semantics, there are a number of additional
   optimizations that can be performed. This can be highly impactful for
   floating point intensive computations.

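A minimal sketch of both kinds of flag, assuming a source language in which
signed index arithmetic never overflows, unsigned scaling never wraps, and
strict IEEE semantics are not required. The function names are hypothetical.

.. code-block:: llvm

   ; nsw: the frontend promises this signed add does not overflow.
   define i32 @next_index(i32 %i) {
   entry:
     %next = add nsw i32 %i, 1
     ret i32 %next
   }

   ; nuw: the frontend promises this unsigned multiply does not wrap.
   define i32 @scale_by_four(i32 %n) {
   entry:
     %scaled = mul nuw i32 %n, 4
     ret i32 %scaled
   }

   ; fast: enables all fast-math flags on each floating point operation.
   define float @dot2(float %a, float %b, float %c, float %d) {
   entry:
     %ab = fmul fast float %a, %b
     %cd = fmul fast float %c, %d
     %r = fadd fast float %ab, %cd
     ret float %r
   }

If only some relaxations are legal in the source language, individual flags
such as ``nnan``, ``ninf`` or ``nsz`` can be used in place of ``fast``.
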
Describing Aliasing Properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Add noalias/align/dereferenceable/nonnull to function arguments and return
   values as appropriate.

#. Use pointer aliasing metadata, especially tbaa metadata, to communicate
   otherwise-non-deducible pointer aliasing facts.

#. Use inbounds on geps. This can help to disambiguate some aliasing queries.


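The three items above can be sketched in one small function. The function
name, the attribute values, and the TBAA node names are hypothetical; a real
frontend would derive them from its own type system and calling convention.
The IR uses the opaque ``ptr`` syntax of recent LLVM releases.

.. code-block:: llvm

   ; Assumed contract: %dst and %src never overlap, are non-null,
   ; 4-byte aligned, and point to at least 8 and 4 readable bytes.
   define void @store_second(ptr noalias nonnull align 4 dereferenceable(8) %dst,
                             ptr noalias nonnull align 4 dereferenceable(4) %src) {
   entry:
     %v = load i32, ptr %src, align 4, !tbaa !2
     ; inbounds: the resulting address stays within the object %dst points to.
     %slot = getelementptr inbounds i32, ptr %dst, i64 1
     store i32 %v, ptr %slot, align 4, !tbaa !2
     ret void
   }

   !0 = !{!"hypothetical frontend TBAA root"}
   !1 = !{!"int", !0, i64 0}     ; scalar type node, child of the root
   !2 = !{!1, !1, i64 0}         ; access tag: an "int" accessed as "int"
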
Modeling Memory Effects
^^^^^^^^^^^^^^^^^^^^^^^^

#. Mark functions as readnone/readonly/argmemonly or noreturn/nounwind when
   known. The optimizer will try to infer these flags, but may not always be
   able to. Manual annotations are particularly important for external
   functions that the optimizer cannot analyze.

#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
   intrinsics where possible. Common profitable uses are for stack-like data
   structures (thus allowing dead store elimination) and for describing
   lifetimes of allocas (thus allowing smaller stack sizes).

#. Mark invariant locations using !invariant.load and TBAA's constant flags.

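As a hedged sketch of the first and last items, the declarations below
annotate two hypothetical external runtime helpers, and the load marks a
field that the frontend is assumed to know never changes once the object is
visible. The attribute names follow the text above; newer LLVM releases may
express the memory attributes through a ``memory(...)`` attribute instead,
such as ``memory(read)``. The IR uses the opaque ``ptr`` syntax.

.. code-block:: llvm

   ; Hypothetical external helpers that the optimizer cannot analyze.
   declare i32 @table_lookup(ptr, i32) readonly nounwind
   declare void @fatal_error(ptr) noreturn nounwind

   define i32 @array_length(ptr %header) {
   entry:
     ; Assumption: the length field is written once before the object
     ; becomes visible and is never modified afterwards.
     %len = load i32, ptr %header, align 4, !invariant.load !0
     ret i32 %len
   }

   ; !invariant.load takes an empty metadata node.
   !0 = !{}
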
Pass Ordering
^^^^^^^^^^^^^

One of the most common mistakes made by new language frontend projects is to
use the existing -O2 or -O3 pass pipelines as is. These pass pipelines make a
good starting point for an optimizing compiler for any language, but they have
been carefully tuned for C and C++, not your target language. You will almost
certainly need to use a custom pass order to achieve optimal performance. A
couple of specific suggestions:

#. For languages with numerous rarely executed guard conditions (e.g. null
   checks, type checks, range checks) consider adding an extra execution or
   two of LoopUnswitch and LICM to your pass order. The standard pass order,
   which is tuned for C and C++ applications, may not be sufficient to remove
   all dischargeable checks from loops.

#. If your language uses range checks, consider using the IRCE pass. It is not
   currently part of the standard pass order.

#. A useful sanity check is to run your optimized IR back through the
   -O2 pipeline again. If you see noticeable improvement in the resulting IR,
   you likely need to adjust your pass order.


I Still Can't Find What I'm Looking For
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you didn't find what you were looking for above, consider proposing a piece
of metadata which provides the optimization hint you need. Such extensions are
relatively common and are generally well received by the community. You will
need to ensure that your proposal is sufficiently general so that it benefits
others if you wish to contribute it upstream.

You should also consider describing the problem you're facing on `llvm-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ and asking for advice.
It's entirely possible someone has encountered your problem before and can
give good advice. If there are multiple interested parties, that also
increases the chances that a metadata extension would be well received by the
community as a whole.

Adding to this document
=======================

If you run across a case that you feel deserves to be covered here, please send
a patch to `llvm-commits
<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.

If you have questions on these items, please direct them to `llvm-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_. The more relevant
context you are able to give to your question, the more likely it is to be
answered.
