Jim Stichnoth | 5bc2b1d | 2014-05-22 13:38:48 -0700 | [diff] [blame] | 1 | Target-specific lowering in ICE |
| 2 | =============================== |
| 3 | |
| 4 | This document discusses several issues around generating target-specific ICE |
| 5 | instructions from high-level ICE instructions. |
| 6 | |
| 7 | Meeting register address mode constraints |
| 8 | ----------------------------------------- |
| 9 | |
| 10 | Target-specific instructions often require specific operands to be in physical |
| 11 | registers. Sometimes one specific register is required, but usually any |
| 12 | register in a particular register class will suffice, and that register class is |
| 13 | defined by the instruction/operand type. |
| 14 | |
| 15 | The challenge is that ``Variable`` represents an operand that is either a stack |
| 16 | location in the current frame, or a physical register. Register allocation |
| 17 | happens after target-specific lowering, so during lowering we generally don't |
| 18 | know whether a ``Variable`` operand will meet a target instruction's physical |
| 19 | register requirement. |
| 20 | |
| 21 | To this end, ICE allows certain hints/directives: |
| 22 | |
| 23 | * ``Variable::setWeightInfinite()`` forces a ``Variable`` to get some |
| 24 | physical register (without specifying which particular one) from a |
| 25 | register class. |
| 26 | |
| 27 | * ``Variable::setRegNum()`` forces a ``Variable`` to be assigned a specific |
| 28 | physical register. |
| 29 | |
| 30 | * ``Variable::setPreferredRegister()`` registers a preference for a physical |
| 31 | register based on another ``Variable``'s physical register assignment. |
| 32 | |
| 33 | These hints/directives are described below in more detail. In most cases, |
| 34 | though, they don't need to be explicity used, as the routines that create |
| 35 | lowered instructions have reasonable defaults and simple options that control |
| 36 | these hints/directives. |
| 37 | |
| 38 | The recommended ICE lowering strategy is to generate extra assignment |
| 39 | instructions involving extra ``Variable`` temporaries, using the |
| 40 | hints/directives to force suitable register assignments for the temporaries, and |
| 41 | then let the global register allocator clean things up. |
| 42 | |
| 43 | Note: There is a spectrum of *implementation complexity* versus *translation |
| 44 | speed* versus *code quality*. This recommended strategy picks a point on the |
| 45 | spectrum representing very low complexity ("splat-isel"), pretty good code |
| 46 | quality in terms of frame size and register shuffling/spilling, but perhaps not |
| 47 | the fastest translation speed since extra instructions and operands are created |
| 48 | up front and cleaned up at the end. |
| 49 | |
| 50 | Ensuring some physical register |
| 51 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 52 | |
| 53 | The x86 instruction:: |
| 54 | |
| 55 | mov dst, src |
| 56 | |
| 57 | needs at least one of its operands in a physical register (ignoring the case |
| 58 | where ``src`` is a constant). This can be done as follows:: |
| 59 | |
| 60 | mov reg, src |
| 61 | mov dst, reg |
| 62 | |
| 63 | so long as ``reg`` is guaranteed to have a physical register assignment. The |
| 64 | low-level lowering code that accomplishes this looks something like:: |
| 65 | |
| 66 | Variable *Reg; |
| 67 | Reg = Func->makeVariable(Dst->getType()); |
| 68 | Reg->setWeightInfinite(); |
| 69 | NewInst = InstX8632Mov::create(Func, Reg, Src); |
| 70 | NewInst = InstX8632Mov::create(Func, Dst, Reg); |
| 71 | |
| 72 | ``Cfg::makeVariable()`` generates a new temporary, and |
| 73 | ``Variable::setWeightInfinite()`` gives it infinite weight for the purpose of |
| 74 | register allocation, thus guaranteeing it a physical register. |
| 75 | |
| 76 | The ``_mov(Dest, Src)`` method in the ``TargetX8632`` class is sufficiently |
| 77 | powerful to handle these details in most situations. Its ``Dest`` argument is |
| 78 | an in/out parameter. If its input value is ``NULL``, then a new temporary |
| 79 | variable is created, its type is set to the same type as the ``Src`` operand, it |
| 80 | is given infinite register weight, and the new ``Variable`` is returned through |
| 81 | the in/out parameter. (This is in addition to the new temporary being the dest |
| 82 | operand of the ``mov`` instruction.) The simpler version of the above example |
| 83 | is:: |
| 84 | |
| 85 | Variable *Reg = NULL; |
| 86 | _mov(Reg, Src); |
| 87 | _mov(Dst, Reg); |
| 88 | |
| 89 | Preferring another ``Variable``'s physical register |
| 90 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 91 | |
| 92 | One problem with this example is that the register allocator usually just |
| 93 | assigns the first available register to a live range. If this instruction ends |
| 94 | the live range of ``src``, this may lead to code like the following:: |
| 95 | |
| 96 | mov reg:eax, src:esi |
| 97 | mov dst:edi, reg:eax |
| 98 | |
| 99 | Since the first instruction happens to end the live range of ``src:esi``, it |
| 100 | would be better to assign ``esi`` to ``reg``:: |
| 101 | |
| 102 | mov reg:esi, src:esi |
| 103 | mov dst:edi, reg:esi |
| 104 | |
| 105 | The first instruction, ``mov esi, esi``, is a redundant assignment and will |
| 106 | ultimately be elided, leaving just ``mov edi, esi``. |
| 107 | |
| 108 | We can tell the register allocator to prefer the register assigned to a |
| 109 | different ``Variable``, using ``Variable::setPreferredRegister()``:: |
| 110 | |
| 111 | Variable *Reg; |
| 112 | Reg = Func->makeVariable(Dst->getType()); |
| 113 | Reg->setWeightInfinite(); |
| 114 | Reg->setPreferredRegister(Src); |
| 115 | NewInst = InstX8632Mov::create(Func, Reg, Src); |
| 116 | NewInst = InstX8632Mov::create(Func, Dst, Reg); |
| 117 | |
| 118 | Or more simply:: |
| 119 | |
| 120 | Variable *Reg = NULL; |
| 121 | _mov(Reg, Src); |
| 122 | _mov(Dst, Reg); |
| 123 | Reg->setPreferredRegister(llvm::dyn_cast<Variable>(Src)); |
| 124 | |
| 125 | The usefulness of ``setPreferredRegister()`` is tied into the implementation of |
| 126 | the register allocator. ICE uses linear-scan register allocation, which sorts |
| 127 | live ranges by starting point and assigns registers in that order. Using |
| 128 | ``B->setPreferredRegister(A)`` only helps when ``A`` has already been assigned a |
| 129 | register by the time ``B`` is being considered. For an assignment ``B=A``, this |
| 130 | is usually a safe assumption because ``B``'s live range begins at this |
| 131 | instruction but ``A``'s live range must have started earlier. (There may be |
| 132 | exceptions for variables that are no longer in SSA form.) But |
| 133 | ``A->setPreferredRegister(B)`` is unlikely to help unless ``B`` has been |
| 134 | precolored. In summary, generally the best practice is to use a pattern like:: |
| 135 | |
| 136 | NewInst = InstX8632Mov::create(Func, Dst, Src); |
| 137 | Dst->setPreferredRegister(Src); |
| 138 | //Src->setPreferredRegister(Dst); -- unlikely to have any effect |
| 139 | |
| 140 | Ensuring a specific physical register |
| 141 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 142 | |
| 143 | Some instructions require operands in specific physical registers, or produce |
| 144 | results in specific physical registers. For example, the 32-bit ``ret`` |
| 145 | instruction needs its operand in ``eax``. This can be done with |
| 146 | ``Variable::setRegNum()``:: |
| 147 | |
| 148 | Variable *Reg; |
| 149 | Reg = Func->makeVariable(Src->getType()); |
| 150 | Reg->setWeightInfinite(); |
| 151 | Reg->setRegNum(Reg_eax); |
| 152 | NewInst = InstX8632Mov::create(Func, Reg, Src); |
| 153 | NewInst = InstX8632Ret::create(Func, Reg); |
| 154 | |
| 155 | Precoloring with ``Variable::setRegNum()`` effectively gives it infinite weight |
| 156 | for register allocation, so the call to ``Variable::setWeightInfinite()`` is |
| 157 | technically unnecessary, but perhaps documents the intention a bit more |
| 158 | strongly. |
| 159 | |
| 160 | The ``_mov(Dest, Src, RegNum)`` method in the ``TargetX8632`` class has an |
| 161 | optional ``RegNum`` argument to force a specific register assignment when the |
| 162 | input ``Dest`` is ``NULL``. As described above, passing in ``Dest=NULL`` causes |
| 163 | a new temporary variable to be created with infinite register weight, and in |
| 164 | addition the specific register is chosen. The simpler version of the above |
| 165 | example is:: |
| 166 | |
| 167 | Variable *Reg = NULL; |
| 168 | _mov(Reg, Src, Reg_eax); |
| 169 | _ret(Reg); |
| 170 | |
| 171 | Disabling live-range interference |
| 172 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 173 | |
| 174 | Another problem with the "``mov reg,src; mov dst,reg``" example happens when |
| 175 | the instructions do *not* end the live range of ``src``. In this case, the live |
| 176 | ranges of ``reg`` and ``src`` interfere, so they can't get the same physical |
| 177 | register despite the explicit preference. However, ``reg`` is meant to be an |
| 178 | alias of ``src`` so they needn't be considered to interfere with each other. |
| 179 | This can be expressed via the second (bool) argument of |
| 180 | ``setPreferredRegister()``:: |
| 181 | |
| 182 | Variable *Reg; |
| 183 | Reg = Func->makeVariable(Dst->getType()); |
| 184 | Reg->setWeightInfinite(); |
| 185 | Reg->setPreferredRegister(Src, true); |
| 186 | NewInst = InstX8632Mov::create(Func, Reg, Src); |
| 187 | NewInst = InstX8632Mov::create(Func, Dst, Reg); |
| 188 | |
| 189 | This should be used with caution and probably only for these short-live-range |
| 190 | temporaries, otherwise the classic "lost copy" or "lost swap" problem may be |
| 191 | encountered. |
| 192 | |
| 193 | Instructions with register side effects |
| 194 | --------------------------------------- |
| 195 | |
| 196 | Some instructions produce unwanted results in other registers, or otherwise kill |
| 197 | preexisting values in other registers. For example, a ``call`` kills the |
| 198 | scratch registers. Also, the x86-32 ``idiv`` instruction produces the quotient |
| 199 | in ``eax`` and the remainder in ``edx``, but generally only one of those is |
| 200 | needed in the lowering. It's important that the register allocator doesn't |
| 201 | allocate that register to a live range that spans the instruction. |
| 202 | |
| 203 | ICE provides the ``InstFakeKill`` pseudo-instruction to mark such register |
| 204 | kills. For each of the instruction's source variables, a fake trivial live |
| 205 | range is created that begins and ends in that instruction. The ``InstFakeKill`` |
| 206 | instruction is inserted after the ``call`` instruction. For example:: |
| 207 | |
| 208 | CallInst = InstX8632Call::create(Func, ... ); |
| 209 | VarList KilledRegs; |
| 210 | KilledRegs.push_back(eax); |
| 211 | KilledRegs.push_back(ecx); |
| 212 | KilledRegs.push_back(edx); |
| 213 | NewInst = InstFakeKill::create(Func, KilledRegs, CallInst); |
| 214 | |
| 215 | The last argument to the ``InstFakeKill`` constructor links it to the previous |
| 216 | call instruction, such that if its linked instruction is dead-code eliminated, |
| 217 | the ``InstFakeKill`` instruction is eliminated as well. |
| 218 | |
| 219 | The killed register arguments need to be assigned a physical register via |
| 220 | ``Variable::setRegNum()`` for this to be effective. To avoid a massive |
| 221 | proliferation of ``Variable`` temporaries, the ``TargetLowering`` object caches |
| 222 | one precolored ``Variable`` for each physical register:: |
| 223 | |
| 224 | CallInst = InstX8632Call::create(Func, ... ); |
| 225 | VarList KilledRegs; |
| 226 | Variable *eax = Func->getTarget()->getPhysicalRegister(Reg_eax); |
| 227 | Variable *ecx = Func->getTarget()->getPhysicalRegister(Reg_ecx); |
| 228 | Variable *edx = Func->getTarget()->getPhysicalRegister(Reg_edx); |
| 229 | KilledRegs.push_back(eax); |
| 230 | KilledRegs.push_back(ecx); |
| 231 | KilledRegs.push_back(edx); |
| 232 | NewInst = InstFakeKill::create(Func, KilledRegs, CallInst); |
| 233 | |
| 234 | On first glance, it may seem unnecessary to explicitly kill the register that |
| 235 | returns the ``call`` return value. However, if for some reason the ``call`` |
| 236 | result ends up being unused, dead-code elimination could remove dead assignments |
| 237 | and incorrectly expose the return value register to a register allocation |
| 238 | assignment spanning the call, which would be incorrect. |
| 239 | |
| 240 | Instructions producing multiple values |
| 241 | -------------------------------------- |
| 242 | |
| 243 | ICE instructions allow at most one destination ``Variable``. Some machine |
| 244 | instructions produce more than one usable result. For example, the x86-32 |
| 245 | ``call`` ABI returns a 64-bit integer result in the ``edx:eax`` register pair. |
| 246 | Also, x86-32 has a version of the ``imul`` instruction that produces a 64-bit |
| 247 | result in the ``edx:eax`` register pair. |
| 248 | |
| 249 | To support multi-dest instructions, ICE provides the ``InstFakeDef`` |
| 250 | pseudo-instruction, whose destination can be precolored to the appropriate |
| 251 | physical register. For example, a ``call`` returning a 64-bit result in |
| 252 | ``edx:eax``:: |
| 253 | |
| 254 | CallInst = InstX8632Call::create(Func, RegLow, ... ); |
| 255 | ... |
| 256 | NewInst = InstFakeKill::create(Func, KilledRegs, CallInst); |
| 257 | Variable *RegHigh = Func->makeVariable(IceType_i32); |
| 258 | RegHigh->setRegNum(Reg_edx); |
| 259 | NewInst = InstFakeDef::create(Func, RegHigh); |
| 260 | |
| 261 | ``RegHigh`` is then assigned into the desired ``Variable``. If that assignment |
| 262 | ends up being dead-code eliminated, the ``InstFakeDef`` instruction may be |
| 263 | eliminated as well. |
| 264 | |
| 265 | Preventing dead-code elimination |
| 266 | -------------------------------- |
| 267 | |
| 268 | ICE instructions with a non-NULL ``Dest`` are subject to dead-code elimination. |
| 269 | However, some instructions must not be eliminated in order to preserve side |
| 270 | effects. This applies to most function calls, volatile loads, and loads and |
| 271 | integer divisions where the underlying language and runtime are relying on |
| 272 | hardware exception handling. |
| 273 | |
| 274 | ICE facilitates this with the ``InstFakeUse`` pseudo-instruction. This forces a |
| 275 | use of its source ``Variable`` to keep that variable's definition alive. Since |
| 276 | the ``InstFakeUse`` instruction has no ``Dest``, it will not be eliminated. |
| 277 | |
| 278 | Here is the full example of the x86-32 ``call`` returning a 32-bit integer |
| 279 | result:: |
| 280 | |
| 281 | Variable *Reg = Func->makeVariable(IceType_i32); |
| 282 | Reg->setRegNum(Reg_eax); |
| 283 | CallInst = InstX8632Call::create(Func, Reg, ... ); |
| 284 | VarList KilledRegs; |
| 285 | Variable *eax = Func->getTarget()->getPhysicalRegister(Reg_eax); |
| 286 | Variable *ecx = Func->getTarget()->getPhysicalRegister(Reg_ecx); |
| 287 | Variable *edx = Func->getTarget()->getPhysicalRegister(Reg_edx); |
| 288 | KilledRegs.push_back(eax); |
| 289 | KilledRegs.push_back(ecx); |
| 290 | KilledRegs.push_back(edx); |
| 291 | NewInst = InstFakeKill::create(Func, KilledRegs, CallInst); |
| 292 | NewInst = InstFakeUse::create(Func, Reg); |
| 293 | NewInst = InstX8632Mov::create(Func, Result, Reg); |
| 294 | |
| 295 | Without the ``InstFakeUse``, the entire call sequence could be dead-code |
| 296 | eliminated if its result were unused. |
| 297 | |
| 298 | One more note on this topic. These tools can be used to allow a multi-dest |
| 299 | instruction to be dead-code eliminated only when none of its results is live. |
| 300 | The key is to use the optional source parameter of the ``InstFakeDef`` |
| 301 | instruction. Using pseudocode:: |
| 302 | |
| 303 | t1:eax = call foo(arg1, ...) |
| 304 | InstFakeKill(eax, ecx, edx) |
| 305 | t2:edx = InstFakeDef(t1) |
| 306 | v_result_low = t1 |
| 307 | v_result_high = t2 |
| 308 | |
| 309 | If ``v_result_high`` is live but ``v_result_low`` is dead, adding ``t1`` as an |
| 310 | argument to ``InstFakeDef`` suffices to keep the ``call`` instruction live. |