//===---------------------------------------------------------------------===//
// Random notes about and ideas for the SystemZ backend.
//===---------------------------------------------------------------------===//

The initial backend is deliberately restricted to z10. We should add support
for later architectures at some point.

--
| 9 | |
If an inline asm ties an i32 "r" result to an i64 input, the input
will be treated as an i32, leaving the upper bits uninitialised.
For example:

define void @f4(i32 *%dst) {
  %val = call i32 asm "blah $0", "=r,0" (i64 103)
  store i32 %val, i32 *%dst
  ret void
}

from CodeGen/SystemZ/asm-09.ll will use LHI rather than LGHI to load 103.
This seems to be a general target-independent problem.

--
| 24 | |
The tuning of the choice between LOAD ADDRESS (LA) and addition in
SystemZISelDAGToDAG.cpp is suspect. It should be tweaked based on
performance measurements.

--

There is no scheduling support.

--

We don't use the BRANCH ON INDEX instructions.

--

We only use MVC, XC and CLC for constant-length block operations.
We could extend them to variable-length operations too,
using EXECUTE RELATIVE LONG.

MVCIN, MVCLE and CLCLE may be worthwhile too.

--

We don't use CUSE or the TRANSLATE family of instructions for string
operations. The TRANSLATE ones are probably more difficult to exploit.

--

We don't take full advantage of builtins like fabsl because the calling
conventions require f128s to be returned by invisible reference.

--

ADD LOGICAL WITH SIGNED IMMEDIATE could be useful when we need to
produce a carry. SUBTRACT LOGICAL IMMEDIATE could be useful when we
need to produce a borrow. (Note that there are no memory forms of
ADD LOGICAL WITH CARRY and SUBTRACT LOGICAL WITH BORROW, so the high
part of 128-bit memory operations would probably need to be done
via a register.)

--

We don't use ICM, STCM, or CLM.

--

We don't use ADD (LOGICAL) HIGH, SUBTRACT (LOGICAL) HIGH,
or COMPARE (LOGICAL) HIGH yet.

--
| 74 | |
DAGCombiner doesn't yet fold truncations of extended loads. Functions like:

    unsigned long f (unsigned long x, unsigned short *y)
    {
      return (x << 32) | *y;
    }

therefore end up as:

    sllg    %r2, %r2, 32
    llgh    %r0, 0(%r3)
    lr      %r2, %r0
    br      %r14

but truncating the load would give:

    sllg    %r2, %r2, 32
    lh      %r2, 0(%r3)
    br      %r14

--
| 96 | |
Functions like:

define i64 @f1(i64 %a) {
  %and = and i64 %a, 1
  ret i64 %and
}

ought to be implemented as:

    lhi     %r0, 1
    ngr     %r2, %r0
    br      %r14

but two-address optimizations reverse the order of the AND and force:

    lhi     %r0, 1
    ngr     %r0, %r2
    lgr     %r2, %r0
    br      %r14

CodeGen/SystemZ/and-04.ll has several examples of this.

--
| 120 | |
Out-of-range displacements are usually handled by loading the full
address into a register. In many cases it would be better to create
an anchor point instead. E.g. for:

define void @f4a(i128 *%aptr, i64 %base) {
  %addr = add i64 %base, 524288
  %bptr = inttoptr i64 %addr to i128 *
  %a = load volatile i128 *%aptr
  %b = load i128 *%bptr
  %add = add i128 %a, %b
  store i128 %add, i128 *%aptr
  ret void
}

(from CodeGen/SystemZ/int-add-08.ll) we load %base+524288 and %base+524296
into separate registers, rather than using %base+524288 as a base for both.

--

Dynamic stack allocations round the size to 8 bytes and then allocate
that rounded amount. It would be simpler to subtract the unrounded
size from the copy of the stack pointer and then align the result.
See CodeGen/SystemZ/alloca-01.ll for an example.

--

If needed, we can support 16-byte atomics using LPQ, STPQ and CSDG.

--

We might want to model all access registers and use them to spill
32-bit values.

--
We might want to use the 'overflow' condition of e.g. AR to support
llvm.sadd.with.overflow.i32 and related instructions - the generated code
for signed overflow checks is currently quite bad. This would improve
the results of using -ftrapv.