==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family through the current Volcanic Islands (GCN Gen 3).


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, look in test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA. Sea Islands and Volcanic Islands
are also supported, but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
-----------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

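As a minimal sketch of the integer-only operand rule, the following SOPP
instructions each take a single literal integer. The instructions and values
chosen here are illustrative assumptions rather than a list taken from this
guide; because the assembler does not check operand ranges, the values must
be ones the hardware accepts.

.. code-block:: nasm

  // Each operand is a plain integer; the assembler does not range-check it.
  s_nop 0
  s_sleep 10
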
s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

  // Wait for all counters to be 0
  s_waitcnt 0

  // Equivalent to s_waitcnt 0. Counter names can also be delimited by
  // '&' or ','.
  s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

  // Wait for vmcnt counter to be 1.
  s_waitcnt vmcnt(1)

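The delimiter forms mentioned in the comment above can be sketched as
follows. This assumes, as the text states, that '&' and ',' are
interchangeable with whitespace between counter names.

.. code-block:: nasm

  // Both lines request the same wait as s_waitcnt 0.
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)
  s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0)
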
VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

  v_mul_i32_i24 v1, v2, v3
  v_mul_i32_i24_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, v3

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. This value will be stored
in an entry of the .note section.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.

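For illustration, the two forms of the directive might look like the sketch
below. The version numbers 7,0,0 are an assumed example value, not something
this guide prescribes; use the values that match your target. A real program
would contain only one of the two forms.

.. code-block:: nasm

  // Explicit form: major, minor, stepping, vendor, arch.
  .hsa_code_object_isa 7,0,0,"AMD","AMDGPU"

  // Implicit form: ISA version, vendor, and arch are derived from -mcpu.
  .hsa_code_object_isa
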
.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

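To make the power-of-two convention concrete, here is a fragment that sets the
three alignment keys explicitly. The values are illustrative assumptions, and
the fragment is not a complete kernel: it would still need to follow a function
label.

.. code-block:: nasm

  .amd_kernel_code_t
    // 4 means an alignment of 2^4 = 16 (same as the default).
    kernarg_segment_alignment = 4
    // 5 means an alignment of 2^5 = 32.
    group_segment_alignment = 5
    // 6 means an alignment of 2^6 = 64.
    private_segment_alignment = 6
  .end_amd_kernel_code_t
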
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

  .hsa_code_object_version 1,0
  .hsa_code_object_isa

  .hsatext
  .globl hello_world
  .p2align 8
  .amdgpu_hsa_kernel hello_world

  hello_world:

    .amd_kernel_code_t
      enable_sgpr_kernarg_segment_ptr = 1
      is_ptr64 = 1
      compute_pgm_rsrc1_vgprs = 0
      compute_pgm_rsrc1_sgprs = 0
      compute_pgm_rsrc2_user_sgpr = 2
      kernarg_segment_byte_size = 8
      wavefront_sgpr_count = 2
      workitem_vgpr_count = 3
    .end_amd_kernel_code_t

    // Load the 8-byte kernel argument (a pointer) into s[0:1].
    s_load_dwordx2 s[0:1], s[0:1] 0x0
    v_mov_b32 v0, 3.14159
    // Wait for the scalar load to complete before using s[0:1].
    s_waitcnt lgkmcnt(0)
    v_mov_b32 v1, s0
    v_mov_b32 v2, s1
    // Store v0 to the address held in v[1:2].
    flat_store_dword v[1:2], v0
    s_endpgm
  .Lfunc_end0:
    .size hello_world, .Lfunc_end0-hello_world