==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).

Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

============= ============================================
Address Space Memory Space
============= ============================================
0             Private
1             Global
2             Constant
3             Local
4             Generic (Flat)
5             Region
============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.

Assembler
=========

The AMDGPU back-end has an LLVM-MC based assembler, which is currently in
development. It supports the Southern Islands, Sea Islands, and Volcanic
Islands ISAs.

This document describes the general syntax for instructions and operands. For
more information about instructions, their semantics, and supported
combinations of operands, refer to one of the Instruction Set Architecture
manuals.

An instruction has the following syntax (register operands are
normally comma-separated, while extra operands are space-separated):

*<opcode> <register_operand0>, ... <extra_operand0> ...*

Operands
--------

The following syntax for register operands is supported:

* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled

The following extra operands are supported:

* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:

  - abs (\| \|), neg (\-)

* DPP modifiers:

  - row_shl, row_shr, row_ror, row_rol
  - row_mirror, row_half_mirror, row_bcast
  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
  - row_mask, bank_mask, bound_ctrl

* SDWA modifiers:

  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
  - abs, neg, sext
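
As a brief sketch of the register syntax listed above, the following lines
reuse the documented forms with ``s_mov_b32`` and ``s_mov_b64``. The register
numbers are arbitrary and chosen only for illustration. Assuming index
expressions are evaluated as the list suggests, each pair of lines should name
the same operands:

.. code-block:: nasm

  s_mov_b32 s1, s2               ; plain register names
  s_mov_b32 s[2-1], s[1*2]       ; the same registers written as index expressions
  s_mov_b64 s[0:1], s[2:3]       ; register pairs written as ranges
  s_mov_b64 [s0, s1], [s2, s3]   ; the same pairs written as register lists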

DS Instruction Examples
-----------------------

.. code-block:: nasm

  ds_add_u32 v2, v4 offset:16
  ds_write_src2_b64 v2 offset0:4 offset1:8
  ds_cmpst_f32 v2, v4, v6
  ds_min_rtn_f64 v[8:9], v2, v[4:5]

For a full list of supported instructions, refer to "LDS/GDS instructions" in the ISA Manual.

FLAT Instruction Examples
--------------------------

.. code-block:: nasm

  flat_load_dword v1, v[3:4]
  flat_store_dwordx3 v[3:4], v[5:7]
  flat_atomic_swap v1, v[3:4], v5 glc
  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc

For a full list of supported instructions, refer to "FLAT instructions" in the ISA Manual.

MUBUF Instruction Examples
---------------------------

.. code-block:: nasm

  buffer_load_dword v1, off, s[4:7], s1
  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
  buffer_store_format_xy v[1:2], off, s[4:7], s1
  buffer_wbinvl1
  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For a full list of supported instructions, refer to "MUBUF Instructions" in the ISA Manual.

SMRD/SMEM Instruction Examples
-------------------------------

.. code-block:: nasm

  s_load_dword s1, s[2:3], 0xfc
  s_load_dwordx8 s[8:15], s[2:3], s4
  s_load_dwordx16 s[88:103], s[2:3], s4
  s_dcache_inv_vol
  s_memtime s[4:5]

For a full list of supported instructions, refer to "Scalar Memory Operations" in the ISA Manual.

SOP1 Instruction Examples
--------------------------

.. code-block:: nasm

  s_mov_b32 s1, s2
  s_mov_b64 s[0:1], 0x80000000
  s_cmov_b32 s1, 200
  s_wqm_b64 s[2:3], s[4:5]
  s_bcnt0_i32_b64 s1, s[2:3]
  s_swappc_b64 s[2:3], s[4:5]
  s_cbranch_join s[4:5]

For a full list of supported instructions, refer to "SOP1 Instructions" in the ISA Manual.

SOP2 Instruction Examples
-------------------------

.. code-block:: nasm

  s_add_u32 s1, s2, s3
  s_and_b64 s[2:3], s[4:5], s[6:7]
  s_cselect_b32 s1, s2, s3
  s_andn2_b32 s2, s4, s6
  s_lshr_b64 s[2:3], s[4:5], s6
  s_ashr_i32 s2, s4, s6
  s_bfm_b64 s[2:3], s4, s6
  s_bfe_i64 s[2:3], s[4:5], s6
  s_cbranch_g_fork s[4:5], s[6:7]

For a full list of supported instructions, refer to "SOP2 Instructions" in the ISA Manual.

SOPC Instruction Examples
--------------------------

.. code-block:: nasm

  s_cmp_eq_i32 s1, s2
  s_bitcmp1_b32 s1, s2
  s_bitcmp0_b64 s[2:3], s4
  s_setvskip s3, s5

For a full list of supported instructions, refer to "SOPC Instructions" in the ISA Manual.

SOPP Instruction Examples
--------------------------

.. code-block:: nasm

  s_barrier
  s_nop 2
  s_endpgm
  s_waitcnt 0 ; Wait for all counters to be 0
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
  s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1.
  s_sethalt 9
  s_sleep 10
  s_sendmsg 0x1
  s_sendmsg sendmsg(MSG_INTERRUPT)
  s_trap 1

For a full list of supported instructions, refer to "SOPP Instructions" in the ISA Manual.

Unless otherwise mentioned, little verification is performed on the operands
of SOPP instructions, so it is up to the programmer to be familiar with the
range of acceptable values.

Vector ALU Instruction Examples
-------------------------------

For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler automatically uses the optimal encoding based on the
instruction's operands. To force a specific encoding, add a suffix to the
opcode of the instruction:

* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA

VOP1/VOP2/VOP3/VOPC examples:

.. code-block:: nasm

  v_mov_b32 v1, v2
  v_mov_b32_e32 v1, v2
  v_nop
  v_cvt_f64_i32_e32 v[1:2], v2
  v_floor_f32_e32 v1, v2
  v_bfrev_b32_e32 v1, v2
  v_add_f32_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, 3
  v_mul_i32_i24_e32 v1, -3, v3
  v_mul_i32_i24_e32 v1, -100, v3
  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
  v_max_f16_e32 v1, v2, v3

VOP_DPP examples:

.. code-block:: nasm

  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_mov_b32 v0, v0 wave_shl:1
  v_mov_b32 v0, v0 row_mirror
  v_mov_b32 v0, v0 row_bcast:31
  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

VOP_SDWA examples:

.. code-block:: nasm

  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

For a full list of supported instructions, refer to "Vector ALU instructions" in the ISA Manual.

HSA Code Object Directives
--------------------------

The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

By default, the assembler will derive the ISA version, *vendor*, and *arch*
from the value of the -mcpu option that is passed to the assembler.
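
For illustration, the directive can also be written with every field given
explicitly instead of relying on -mcpu. The version numbers below (7, 0, 0)
are placeholder values chosen for this sketch, not a recommendation for any
particular GPU:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"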

.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^

This directive specifies that the symbol with the given name is a kernel entry
point (label) and that the object should contain a corresponding symbol of type
STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
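
As a small worked example of the power-of-two rule, the sketch below overrides
two of the alignment keys. The values 5 and 6 are arbitrary and chosen only to
show the arithmetic. Every key that is left out keeps its default:

.. code-block:: none

   ; kernarg_segment_alignment = 5 requests an alignment of 2^5 = 32.
   ; private_segment_alignment = 6 requests an alignment of 2^6 = 64.
   ; group_segment_alignment is omitted, so it keeps its default of 4 (2^4 = 16).
   .amd_kernel_code_t
      kernarg_segment_alignment = 5
      private_segment_alignment = 6
   .end_amd_kernel_code_t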

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and
test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size hello_world, .Lfunc_end0-hello_world