| Daniel Sanders | 928920a | 2013-09-27 10:42:22 +0000 | [diff] [blame] | 1 | Code Generation Notes for MSA |
| 2 | ============================= |
| 3 | |
| 4 | Intrinsics are lowered to SelectionDAG nodes where possible in order to enable |
| 5 | optimisation, reduce the size of the ISel matcher, and reduce repetition in |
| 6 | the implementation. In a small number of cases, this can cause different |
| 7 | (semantically equivalent) instructions to be used in place of the requested |
| 8 | instruction, even when no optimisation has taken place. |
| 9 | |
| 10 | Instructions |
| 11 | ============ |
| 12 | |
| 13 | This section describes any quirks of instruction selection for MSA. For |
| 14 | example, two instructions might be equally valid for some given IR and one is |
| 15 | chosen in preference to the other. |
| 16 | |
| 17 | bclri.b: |
| 18 | It is not possible to emit bclri.b since andi.b covers exactly the |
| 19 | same cases. andi.b should use fractionally less power than bclri.b in |
| 20 | most hardware implementations so it is used in preference to bclri.b. |
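The overlap between the two instructions can be checked with a small model. The Python sketch below is illustrative only: the helper names `bclri_b`, `andi_b` and `andi_imm_for_bclri` are invented for this example, vectors are modelled as lists of byte values, and the semantics are assumed to be "clear bit m of every byte" for bclri.b and "AND every byte with an 8-bit immediate" for andi.b.

```python
def bclri_b(ws, m):
    """Assumed model of bclri.b: clear bit m (0..7) in every byte element."""
    return [b & ~(1 << m) & 0xFF for b in ws]


def andi_b(ws, imm8):
    """Assumed model of andi.b: AND every byte element with an 8-bit immediate."""
    return [b & imm8 for b in ws]


def andi_imm_for_bclri(m):
    """Hypothetical helper: the andi.b immediate that mimics bclri.b with bit m."""
    return ~(1 << m) & 0xFF


# Exhaustive check over every byte value and every bit index.
all_bytes = list(range(256))
for m in range(8):
    assert bclri_b(all_bytes, m) == andi_b(all_bytes, andi_imm_for_bclri(m))
```

Every bclri.b bit index corresponds to an andi.b immediate with exactly one bit cleared, which is why a separate bclri.b pattern is never needed.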
| 21 | |
| 22 | vshf.w: |
| 23 | It is not possible to emit vshf.w when the shuffle description is |
| 24 | constant since shf.w covers exactly the same cases. shf.w is used |
| 25 | instead. It is also impossible for the shuffle description to be |
| 26 | unknown at compile-time due to the definition of shufflevector in |
| 27 | LLVM IR. |
| 28 | |
| 29 | vshf.[bhwd]: |
| 30 | When the shuffle description describes a splat operation, splat.[bhwd] |
| 31 | instructions will be selected instead of vshf.[bhwd]. Unlike the ilv* |
| 32 | and pck* instructions, this is matched from MipsISD::VSHF instead of |
| 33 | a special-case MipsISD node. |
| 34 | |
| 35 | ilvl.d, pckev.d: |
| 36 | It is not possible to emit ilvl.d or pckev.d since ilvev.d covers the |
| 37 | same shuffle. ilvev.d will be emitted instead. |
| 38 | |
| 39 | ilvr.d, ilvod.d, pckod.d: |
| 40 | It is not possible to emit ilvr.d or pckod.d since ilvod.d covers the |
| 41 | same shuffle. ilvod.d will be emitted instead. |
| 42 | |
| 43 | splat.[bhwd]: |
| 44 | The intrinsic will work as expected. However, unlike other intrinsics, |
| 45 | it lowers directly to MipsISD::VSHF instead of using common IR. |
| 46 | |
| 47 | splati.w: |
| 48 | It is not possible to emit splati.w since shf.w covers the same cases. |
| 49 | shf.w will be emitted instead. |
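As a rough illustration of why shf.w subsumes splati.w, the Python sketch below models both on a 4-element vector. It assumes that shf.w selects element i of the result using the 2-bit field at bits 2i+1..2i of its 8-bit immediate, and that splati.w copies one indexed element to every lane. The function names, including `shf_imm_for_splat`, are hypothetical and exist only for this example.

```python
def shf_w(ws, imm8):
    """Assumed model of shf.w: wd[i] = ws[(imm8 >> 2*i) & 3] for i in 0..3."""
    return [ws[(imm8 >> (2 * i)) & 3] for i in range(4)]


def splati_w(ws, n):
    """Assumed model of splati.w: replicate element n across all four lanes."""
    return [ws[n]] * 4


def shf_imm_for_splat(n):
    """Hypothetical helper: repeat the 2-bit index n in all four immediate fields."""
    return n * 0x55


ws = [10, 11, 12, 13]
for n in range(4):
    assert shf_w(ws, shf_imm_for_splat(n)) == splati_w(ws, n)
```

Under these assumptions the four splat indices map to the immediates 0x00, 0x55, 0xAA and 0xFF.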
| 50 | |
| 51 | copy_s.w: |
| 52 | On MIPS32, the copy_u.d intrinsic will emit this instruction instead of |
| 53 | copy_u.w. This is semantically equivalent since the general-purpose |
| 54 | register file is 32 bits wide. |
| 55 | |
| 56 | binsri.[bhwd], binsli.[bhwd]: |
| 57 | These two operations are equivalent to each other with the operands |
| 58 | swapped and condition inverted. The compiler may use either one as |
| 59 | appropriate. |
| 60 | Furthermore, the compiler may use bsel.[bhwd] for some masks that do |
| 61 | not survive the legalization process (this is a bug and will be fixed). |
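One way to read the binsli/binsri relationship described above is sketched below in Python for a single byte element. This is an interpretation, not the compiler's implementation: it assumes binsli.b copies the most significant m+1 bits of ws into wd, that binsri.b copies the least significant m+1 bits, and that "condition inverted" means the complementary bit count. All function names are invented for this example.

```python
def binsli_b(wd, ws, m):
    """Assumed model of binsli.b on one byte: take the top m+1 bits from ws."""
    mask = (0xFF << (7 - m)) & 0xFF
    return (ws & mask) | (wd & ~mask & 0xFF)


def binsri_b(wd, ws, m):
    """Assumed model of binsri.b on one byte: take the low m+1 bits from ws."""
    mask = (1 << (m + 1)) - 1
    return (ws & mask) | (wd & ~mask & 0xFF)


# Inserting the top m+1 bits of ws into wd gives the same byte as inserting
# the low 7-m bits of wd into ws, i.e. binsri.b with operands swapped and the
# complementary count. m = 7 is excluded: it would need a zero-bit insert.
for m in range(7):
    for wd in range(256):
        for ws in range(256):
            assert binsli_b(wd, ws, m) == binsri_b(ws, wd, 6 - m)
```

In this model the full-width insert (m = 7) has no swapped counterpart, because binsri.b always copies at least one bit.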
| 62 | |
| 63 | bmnz.v, bmz.v, bsel.v: |
| 64 | These three operations differ only in the operand that is tied to the |
| 65 | result and the order of the operands. |
| 66 | It is (currently) not possible to emit bmz.v or bsel.v since bmnz.v is |
| 67 | the same operation and will be emitted instead. |
| 68 | In the future, the compiler may choose between these three instructions |
| 69 | according to register allocation. |
| 70 | These three operations are easily confused, so here is the mapping |
| 71 | between the instructions and the vselect node in one place: |
| 72 | bmz.v wd, ws, wt/i8 -> (vselect wt/i8, wd, ws) |
| 73 | bmnz.v wd, ws, wt/i8 -> (vselect wt/i8, ws, wd) |
| 74 | bsel.v wd, ws, wt/i8 -> (vselect wd, wt/i8, ws) |
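The mapping above can be sanity-checked with a bitwise model. The Python sketch below treats each 128-bit vector as an integer and assumes that vselect picks bits from its second operand where the condition bit is set and from its third operand otherwise. The instruction models follow the usual "move if (not) zero" and "bit select" descriptions; all names are illustrative, not LLVM APIs.

```python
import random

MASK128 = (1 << 128) - 1


def vselect(cond, if_set, if_clear):
    """Bitwise select: if_set where cond bits are 1, if_clear where they are 0."""
    return (cond & if_set) | (~cond & MASK128 & if_clear)


def bmz_v(wd, ws, wt):
    """Assumed model of bmz.v: copy ws bits into wd where wt bits are 0."""
    return (ws & ~wt & MASK128) | (wd & wt)


def bmnz_v(wd, ws, wt):
    """Assumed model of bmnz.v: copy ws bits into wd where wt bits are 1."""
    return (ws & wt) | (wd & ~wt & MASK128)


def bsel_v(wd, ws, wt):
    """Assumed model of bsel.v: wd is the mask; pick wt where set, ws where clear."""
    return (ws & ~wd & MASK128) | (wt & wd)


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(100):
    wd, ws, wt = (rng.getrandbits(128) for _ in range(3))
    # The three rows of the mapping.
    assert bmz_v(wd, ws, wt) == vselect(wt, wd, ws)
    assert bmnz_v(wd, ws, wt) == vselect(wt, ws, wd)
    assert bsel_v(wd, ws, wt) == vselect(wd, wt, ws)
    # Both bmz.v and bsel.v can be expressed as bmnz.v with reordered operands.
    assert bmz_v(wd, ws, wt) == bmnz_v(ws, wd, wt)
    assert bsel_v(wd, ws, wt) == bmnz_v(ws, wt, wd)
```

The last two assertions show, at the level of values, why a single instruction can stand in for all three; which register is tied to the result is not modelled here.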
| 75 | |
| 76 | bmnzi.b, bmzi.b: |
| 77 | Like their non-immediate counterparts, bmnzi.b and bmzi.b are the same |
| 78 | operation with the operands swapped. bmnzi.b will (currently) be emitted |
| 79 | for both cases. |
| 80 | |
| 81 | bseli.b: |
| 82 | Unlike the non-immediate versions, bseli.b is distinguishable from |
| 83 | bmnzi.b and bmzi.b and can be emitted. |