//===---------------------------------------------------------------------===//
// Random notes about and ideas for the SystemZ backend.
//===---------------------------------------------------------------------===//

The initial backend is deliberately restricted to z10.  We should add support
for later architectures at some point.

--

If an inline asm ties an i32 "r" result to an i64 input, the input
will be treated as an i32, leaving the upper bits uninitialised.
For example:

define void @f4(i32 *%dst) {
  %val = call i32 asm "blah $0", "=r,0" (i64 103)
  store i32 %val, i32 *%dst
  ret void
}

from CodeGen/SystemZ/asm-09.ll will use LHI rather than LGHI
to load 103.  This seems to be a general target-independent problem.
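A source-level workaround (not a fix for the underlying problem) is to
tie an i64 result to the i64 input and truncate afterwards, so the
constant is materialised at the full width.  A minimal sketch; the
function name is made up:

define void @f4_workaround(i32 *%dst) {
  ; Tying an i64 result to the i64 input keeps the upper bits defined,
  ; so the 103 is loaded with LGHI.
  %val = call i64 asm "blah $0", "=r,0" (i64 103)
  %trunc = trunc i64 %val to i32
  store i32 %trunc, i32 *%dst
  ret void
}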

--

The tuning of the choice between LOAD ADDRESS (LA) and addition in
SystemZISelDAGToDAG.cpp is suspect.  It should be tweaked based on
performance measurements.

--

There is no scheduling support.

--

We don't use the BRANCH ON INDEX instructions.
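For reference, a counted loop of the following shape is the kind of
pattern that BRANCH ON INDEX (e.g. BRXLG) could implement with a single
branch instruction that both steps the index and compares it against
the limit.  This is only an illustrative sketch:

define i64 @index_loop(i64 *%base, i64 %limit) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %next, %loop ]
  %sum = phi i64 [ 0, %entry ], [ %sum.next, %loop ]
  %ptr = getelementptr i64 *%base, i64 %i
  %val = load i64 *%ptr
  %sum.next = add i64 %sum, %val
  ; BRXLG could add the step to %i and branch while %i <= %limit
  ; in one instruction.
  %next = add i64 %i, 1
  %cont = icmp sle i64 %next, %limit
  br i1 %cont, label %loop, label %exit
exit:
  ret i64 %sum
}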

--

We only use MVC, XC and CLC for constant-length block operations.
We could extend them to variable-length operations too,
using EXECUTE RELATIVE LONG.

MVCIN, MVCLE and CLCLE may be worthwhile too.
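For illustration, a variable-length copy like the one below cannot use
a single MVC today, but EXECUTE RELATIVE LONG could supply the runtime
length to an out-of-line MVC.  A sketch, using the memcpy intrinsic
signature from the era of these notes:

declare void @llvm.memcpy.p0i8.p0i8.i64(i8 *nocapture, i8 *nocapture, i64, i32, i1)

define void @var_copy(i8 *%dst, i8 *%src, i64 %len) {
  ; A constant %len of 256 or less becomes a single MVC today;
  ; a variable %len could use EXRL to patch the MVC length field.
  call void @llvm.memcpy.p0i8.p0i8.i64(i8 *%dst, i8 *%src, i64 %len, i32 1, i1 false)
  ret void
}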

--

We don't use CUSE or the TRANSLATE family of instructions for string
operations.  The TRANSLATE ones are probably more difficult to exploit.
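As an illustration, TRANSLATE (TR) performs a per-byte table lookup on
a block of up to 256 bytes, so a loop like the following is its
software equivalent.  This sketch only shows the pattern and assumes
%len is nonzero:

define void @translate(i8 *%buf, i8 *%table, i64 %len) {
entry:
  br label %loop
loop:
  %i = phi i64 [ 0, %entry ], [ %next, %loop ]
  ; TR replaces each byte of the first operand with table[byte].
  %ptr = getelementptr i8 *%buf, i64 %i
  %byte = load i8 *%ptr
  %idx = zext i8 %byte to i64
  %tptr = getelementptr i8 *%table, i64 %idx
  %new = load i8 *%tptr
  store i8 %new, i8 *%ptr
  %next = add i64 %i, 1
  %cont = icmp ult i64 %next, %len
  br i1 %cont, label %loop, label %exit
exit:
  ret void
}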

--

We don't take full advantage of builtins like fabsl because the calling
conventions require f128s to be returned by invisible reference.
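A minimal sketch of the kind of function affected: the fp128 result
below is written back through a hidden pointer argument at the ABI
level rather than returned in registers, which limits what the backend
can fold.

declare fp128 @llvm.fabs.f128(fp128)

define fp128 @abs128(fp128 %x) {
  ; The fp128 return value travels through a hidden pointer
  ; argument, per the calling convention.
  %abs = call fp128 @llvm.fabs.f128(fp128 %x)
  ret fp128 %abs
}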

--

ADD LOGICAL WITH SIGNED IMMEDIATE could be useful when we need to
produce a carry.  SUBTRACT LOGICAL IMMEDIATE could be useful when we
need to produce a borrow.  (Note that there are no memory forms of
ADD LOGICAL WITH CARRY and SUBTRACT LOGICAL WITH BORROW, so the high
part of 128-bit memory operations would probably need to be done
via a register.)
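For example, adding a small immediate to a 128-bit value in memory
needs a low-half add that produces a carry for the high half.  A
sketch of such a function:

define void @add128(i128 *%ptr) {
  ; The low 64-bit half could use ADD LOGICAL WITH SIGNED IMMEDIATE,
  ; whose condition code supplies the carry consumed by the high half.
  %val = load i128 *%ptr
  %add = add i128 %val, 1
  store i128 %add, i128 *%ptr
  ret void
}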

--

We don't use ICM, STCM, or CLM.
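For instance, ICM with a mask of 0111 can load a 3-byte big-endian
field in one instruction, where today code like the following sketch
is lowered to separate loads, shifts and an OR:

define i32 @load24(i8 *%src) {
  ; Big-endian load of a 3-byte field; ICM with mask 0111 could
  ; insert the three bytes directly into the low 24 bits.
  %p16 = bitcast i8 *%src to i16 *
  %hi = load i16 *%p16
  %loptr = getelementptr i8 *%src, i64 2
  %lo = load i8 *%loptr
  %hiext = zext i16 %hi to i32
  %loext = zext i8 %lo to i32
  %hishl = shl i32 %hiext, 8
  %res = or i32 %hishl, %loext
  ret i32 %res
}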

--

We don't use ADD (LOGICAL) HIGH, SUBTRACT (LOGICAL) HIGH,
or COMPARE (LOGICAL) HIGH yet.

--

DAGCombiner doesn't yet fold truncations of extended loads.  Functions like:

unsigned long f (unsigned long x, unsigned short *y)
{
  return (x << 32) | *y;
}

therefore end up as:

        sllg    %r2, %r2, 32
        llgh    %r0, 0(%r3)
        lr      %r2, %r0
        br      %r14

but truncating the load would give:

        sllg    %r2, %r2, 32
        lh      %r2, 0(%r3)
        br      %r14

--

Functions like:

define i64 @f1(i64 %a) {
  %and = and i64 %a, 1
  ret i64 %and
}

ought to be implemented as:

        lhi     %r0, 1
        ngr     %r2, %r0
        br      %r14

but two-address optimizations reverse the order of the AND and force:

        lhi     %r0, 1
        ngr     %r0, %r2
        lgr     %r2, %r0
        br      %r14

CodeGen/SystemZ/and-04.ll has several examples of this.

--

Out-of-range displacements are usually handled by loading the full
address into a register.  In many cases it would be better to create
an anchor point instead.  E.g. for:

define void @f4a(i128 *%aptr, i64 %base) {
  %addr = add i64 %base, 524288
  %bptr = inttoptr i64 %addr to i128 *
  %a = load volatile i128 *%aptr
  %b = load i128 *%bptr
  %add = add i128 %a, %b
  store i128 %add, i128 *%aptr
  ret void
}

(from CodeGen/SystemZ/int-add-08.ll) we load %base+524288 and %base+524296
into separate registers, rather than using %base+524288 as a base for both.

--

Dynamic stack allocations round the size to 8 bytes and then allocate
that rounded amount.  It would be simpler to subtract the unrounded
size from the copy of the stack pointer and then align the result.
See CodeGen/SystemZ/alloca-01.ll for an example.
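A minimal sketch of the kind of dynamic allocation that triggers this;
@use is a hypothetical external consumer:

declare void @use(i8 *)

define void @dyn_alloca(i64 %size) {
  ; %size is currently rounded up to a multiple of 8 before being
  ; subtracted from the stack pointer; subtracting first and aligning
  ; the result would save the rounding arithmetic.
  %buf = alloca i8, i64 %size
  call void @use(i8 *%buf)
  ret void
}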

--

If needed, we can support 16-byte atomics using LPQ, STPQ and CDSG.
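A sketch of an operation this would cover, assuming 16-byte alignment:

define i128 @atomic_load128(i128 *%src) {
  ; LPQ would perform the 16-byte atomic load; STPQ covers the store
  ; case and CDSG the 16-byte compare-and-swap.
  %val = load atomic i128 *%src seq_cst, align 16
  ret i128 %val
}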

--

We might want to model all access registers and use them to spill
32-bit values.

--

We might want to use the 'overflow' condition of e.g. AR to support
llvm.sadd.with.overflow.i32 and related intrinsics - the generated code
for a signed overflow check is currently quite bad.  This would improve
the results of using -ftrapv.
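For reference, -ftrapv produces roughly the following pattern; AR sets
condition code 3 on signed overflow, which could feed the branch
directly instead of an explicit comparison sequence.  A sketch:

declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)
declare void @llvm.trap()

define i32 @checked_add(i32 %a, i32 %b) {
  ; The overflow flag could come straight from the condition code
  ; set by AR rather than from separate overflow-detection code.
  %res = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
  %val = extractvalue { i32, i1 } %res, 0
  %ovf = extractvalue { i32, i1 } %res, 1
  br i1 %ovf, label %trap, label %cont

trap:
  call void @llvm.trap()
  unreachable

cont:
  ret i32 %val
}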