Missing support
===============

* The PNaCl LLVM backend expands shufflevector operations into sequences of
  insertelement and extractelement operations. For instance:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %res = shufflevector <4 x i32> %arg1,
                           <4 x i32> %arg2,
                           <4 x i32> <i32 4, i32 5, i32 0, i32 1>
      ret <4 x i32> %res
    }

  gets expanded into:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %0 = extractelement <4 x i32> %arg2, i32 0
      %1 = insertelement <4 x i32> undef, i32 %0, i32 0
      %2 = extractelement <4 x i32> %arg2, i32 1
      %3 = insertelement <4 x i32> %1, i32 %2, i32 1
      %4 = extractelement <4 x i32> %arg1, i32 0
      %5 = insertelement <4 x i32> %3, i32 %4, i32 2
      %6 = extractelement <4 x i32> %arg1, i32 1
      %7 = insertelement <4 x i32> %5, i32 %6, i32 3
      ret <4 x i32> %7
    }

  Subzero should recognize these sequences and recombine them into
  shuffle operations where appropriate.

* Add support for vector constants in the backend. The current code
  materializes the vector constants it needs (e.g. for performing icmp on
  unsigned operands) using register operations. It should instead load them
  from a constant pool when the register initialization is too complicated
  (such as in TargetX8632::makeVectorOfHighOrderBits()).

* [x86 specific] llvm-mc does not allow lea to take a mem128 memory operand
  when assembling x86-32 code. The current InstX8632Lea::emit() code uses
  Variable::asType() to convert any mem128 Variables into a compatible memory
  operand type. However, the emit code does not convert OperandX8632Mem
  operands, so if an OperandX8632Mem is passed to lea as mem128, the
  resulting code will not assemble. One way to fix this is by implementing
  OperandX8632Mem::asType().

* [x86 specific] Lower shl with <4 x i32> using some clever float conversion:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html

* [x86 specific] Add support for using aligned mov operations (movaps). This
  will require passing alignment information to loads and stores.

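The shufflevector recombination described in the first item above can be
sketched as follows. This is only a minimal illustration on a toy model, not
Subzero code: the names Source, ExtractInsertPair, ShuffleMask,
recoverShuffleMask and exampleChain are all hypothetical, and the sketch
assumes the chain has already been matched as extractelement/insertelement
pairs with constant lane indices.

```cpp
#include <array>
#include <vector>

// Toy model for illustration only; these are not Subzero's instruction
// classes.
enum class Source { Arg1, Arg2 };

// One step of the expanded form:
//   %e = extractelement <4 x i32> %From, i32 FromLane
//   %v = insertelement <4 x i32> %prev, i32 %e, i32 ToLane
struct ExtractInsertPair {
  Source From;
  int FromLane; // lane read from the source vector
  int ToLane;   // lane written in the result vector
};

const int NumLanes = 4;
typedef std::array<int, NumLanes> ShuffleMask;

// Fills Mask with the shufflevector mask equivalent to Chain and returns
// true, or returns false if some lane index is out of range or some result
// lane is never written. Mask values follow the LLVM convention: 0..3 select
// from the first operand, 4..7 select from the second operand.
bool recoverShuffleMask(const std::vector<ExtractInsertPair> &Chain,
                        ShuffleMask &Mask) {
  Mask.fill(-1);
  for (const ExtractInsertPair &P : Chain) {
    if (P.FromLane < 0 || P.FromLane >= NumLanes || P.ToLane < 0 ||
        P.ToLane >= NumLanes)
      return false;
    // A later insert into the same lane overrides an earlier one.
    Mask[P.ToLane] = P.FromLane + (P.From == Source::Arg2 ? NumLanes : 0);
  }
  for (int Lane : Mask)
    if (Lane < 0)
      return false; // lane would still be undef
  return true;
}

// The expanded @shuffle example, written in the toy model.
std::vector<ExtractInsertPair> exampleChain() {
  std::vector<ExtractInsertPair> Chain = {{Source::Arg2, 0, 0},
                                          {Source::Arg2, 1, 1},
                                          {Source::Arg1, 0, 2},
                                          {Source::Arg1, 1, 3}};
  return Chain;
}
```

For the expanded @shuffle example, the recovered mask is <4, 5, 0, 1>, which
matches the original shufflevector. A real pass would presumably also need to
check that the intermediate values have no other uses before deleting them.
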
x86 SIMD Diversification
========================

* Vector "bitwise" operations have several variant instructions: the AND
  operation can be implemented with pand, andpd, or andps. This pattern also
  holds for ANDN, OR, and XOR.

* Vector "mov" instructions can be diversified (e.g. movdqu instead of movups)
  at the cost of a possible performance penalty.

* Scalar FP arithmetic can be diversified by performing the operations with the
  vector versions of the instructions.
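
As a rough sketch of the bitwise-variant idea from the first item in this
section, the choice among equivalent instructions could be driven by a seeded
random number generator. The notes above do not say how a variant would be
selected, so the seed-driven choice is an assumption, and BitwiseOp, Variants
and pickVariant are hypothetical names, not part of Subzero.

```cpp
#include <random>

// Hypothetical helper for illustration; not part of Subzero.
enum class BitwiseOp { And, Andn, Or, Xor };

const int NumVariants = 3;

// Integer, single-precision, and double-precision forms of each operation.
// Within a row, all three produce the same 128-bit bitwise result.
const char *const Variants[4][NumVariants] = {
    {"pand", "andps", "andpd"},    // And
    {"pandn", "andnps", "andnpd"}, // Andn
    {"por", "orps", "orpd"},       // Or
    {"pxor", "xorps", "xorpd"},    // Xor
};

// Picks one of the equivalent mnemonics for Op. The same seed always yields
// the same sequence of choices, so a build stays reproducible.
const char *pickVariant(BitwiseOp Op, std::mt19937 &Rng) {
  return Variants[static_cast<int>(Op)][Rng() % NumVariants];
}
```

Seeding std::mt19937 with a per-build value would give each build its own mix
of mnemonics while keeping a given seed reproducible.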