|  | ============================== | 
|  | User Guide for AMDGPU Back-end | 
|  | ============================== | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600 | 
|  | family up to the current Volcanic Islands (GCN Gen 3) family. | 
|  |  | 
|  |  | 
|  | Assembler | 
|  | ========= | 
|  |  | 
|  | The assembler is currently considered experimental. | 
|  |  | 
|  | For syntax examples, see test/MC/AMDGPU. | 
|  |  | 
|  | Below are some of the currently supported features (modulo bugs).  These | 
|  | all apply to the Southern Islands ISA.  Sea Islands and Volcanic Islands | 
|  | are also supported, but may be missing some instructions and have more bugs: | 
|  |  | 
|  | DS Instructions | 
|  | --------------- | 
|  | All DS instructions are supported. | 
|  |  | 
|  | FLAT Instructions | 
|  | ------------------ | 
|  | These instructions are only present in the Sea Islands and Volcanic Islands | 
|  | instruction sets.  All FLAT instructions are supported for these architectures. | 
|  |  | 
|  | MUBUF Instructions | 
|  | ------------------ | 
|  | All non-atomic MUBUF instructions are supported. | 
|  |  | 
|  | SMRD Instructions | 
|  | ----------------- | 
|  | Only the s_load_dword* SMRD instructions are supported. | 
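|  |  | 
|  | As a brief illustration, the lines below use three members of the | 
|  | s_load_dword* family.  The register numbers and offsets are arbitrary | 
|  | placeholders chosen for this sketch, not values taken from the test suite: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   s_load_dword s1, s[2:3], 0x1 | 
|  |   s_load_dwordx2 s[0:1], s[2:3], 0x0 | 
|  |   s_load_dwordx4 s[4:7], s[2:3], 0x0 | 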
|  |  | 
|  | SOP1 Instructions | 
|  | ----------------- | 
|  | All SOP1 instructions are supported. | 
|  |  | 
|  | SOP2 Instructions | 
|  | ----------------- | 
|  | All SOP2 instructions are supported. | 
|  |  | 
|  | SOPC Instructions | 
|  | ----------------- | 
|  | All SOPC instructions are supported. | 
|  |  | 
|  | SOPP Instructions | 
|  | ----------------- | 
|  |  | 
|  | Unless otherwise mentioned, all SOPP instructions that have one or more | 
|  | operands accept integer operands only.  No verification is performed | 
|  | on the operands, so it is up to the programmer to be familiar with the | 
|  | range of acceptable values. | 
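|  |  | 
|  | For instance, the lines below pass plain integer literals to SOPP | 
|  | instructions.  The values are illustrative only.  Because the assembler | 
|  | does not range-check them, they should be checked against the ISA manual | 
|  | for the target: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   // Each operand is a bare integer; no range check is applied. | 
|  |   s_nop 0 | 
|  |   s_sleep 1 | 
|  |   s_setprio 1 | 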
|  |  | 
|  | s_waitcnt | 
|  | ^^^^^^^^^ | 
|  |  | 
|  | s_waitcnt accepts named arguments to specify which memory counter(s) to | 
|  | wait for. | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   // Wait for all counters to be 0 | 
|  |   s_waitcnt 0 | 
|  |  | 
|  |   // Equivalent to s_waitcnt 0.  Counter names can also be delimited by | 
|  |   // '&' or ','. | 
|  |   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) | 
|  |  | 
|  |   // Wait for vmcnt counter to be 1. | 
|  |   s_waitcnt vmcnt(1) | 
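|  |  | 
|  | The alternative delimiters mentioned in the comment above can be written | 
|  | as in the following sketch.  Both lines are intended to be equivalent to | 
|  | the space-separated form: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   // Counters delimited by '&'. | 
|  |   s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) | 
|  |  | 
|  |   // Counters delimited by ','. | 
|  |   s_waitcnt vmcnt(0), expcnt(0), lgkmcnt(0) | 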
|  |  | 
|  | VOP1, VOP2, VOP3, VOPC Instructions | 
|  | ----------------------------------- | 
|  |  | 
|  | All 32-bit and 64-bit encodings should work. | 
|  |  | 
|  | The assembler will automatically detect which encoding size to use for | 
|  | VOP1, VOP2, and VOPC instructions based on the operands.  If you want to force | 
|  | a specific encoding size, you can add an _e32 (for 32-bit encoding) or | 
|  | _e64 (for 64-bit encoding) suffix to the instruction.  Most, but not all, | 
|  | instructions support an explicit suffix.  These are all valid assembly | 
|  | strings: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   v_mul_i32_i24 v1, v2, v3 | 
|  |   v_mul_i32_i24_e32 v1, v2, v3 | 
|  |   v_mul_i32_i24_e64 v1, v2, v3 | 
|  |  | 
|  | Assembler Directives | 
|  | -------------------- | 
|  |  | 
|  | .hsa_code_object_version major, minor | 
|  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | *major* and *minor* are integers that specify the version of the HSA code | 
|  | object that will be generated by the assembler.  This value will be stored | 
|  | in an entry of the .note section. | 
|  |  | 
|  | .hsa_code_object_isa [major, minor, stepping, vendor, arch] | 
|  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | *major*, *minor*, and *stepping* are all integers that describe the instruction | 
|  | set architecture (ISA) version of the assembly program. | 
|  |  | 
|  | *vendor* and *arch* are quoted strings.  *vendor* should always be equal to | 
|  | "AMD" and *arch* should always be equal to "AMDGPU". | 
|  |  | 
|  | If no arguments are specified, then the assembler will derive the ISA version, | 
|  | *vendor*, and *arch* from the value of the -mcpu option that is passed to the | 
|  | assembler. | 
|  |  | 
|  | ISA version, *vendor*, and *arch* will all be stored in a single entry of the | 
|  | .note section. | 
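|  |  | 
|  | The two forms of the directive can be sketched as follows.  The version | 
|  | numbers 7,0,0 are an example choice for this illustration and would need | 
|  | to match the target actually being assembled: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   // Derive the ISA version, vendor, and arch from -mcpu. | 
|  |   .hsa_code_object_isa | 
|  |  | 
|  |   // Spell out major, minor, stepping, vendor, and arch explicitly. | 
|  |   .hsa_code_object_isa 7,0,0,"AMD","AMDGPU" | 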
|  |  | 
|  | .amd_kernel_code_t | 
|  | ^^^^^^^^^^^^^^^^^^ | 
|  |  | 
|  | This directive marks the beginning of a list of key / value pairs that are used | 
|  | to specify the amd_kernel_code_t object that will be emitted by the assembler. | 
|  | The list must be terminated by the *.end_amd_kernel_code_t* directive.  For | 
|  | any amd_kernel_code_t values that are unspecified, a default value will be | 
|  | used.  The default value for all keys is 0, with the following exceptions: | 
|  |  | 
|  | - *kernel_code_version_major* defaults to 1. | 
|  | - *machine_kind* defaults to 1. | 
|  | - *machine_version_major*, *machine_version_minor*, and | 
|  |   *machine_version_stepping* are derived from the value of the -mcpu option | 
|  |   that is passed to the assembler. | 
|  | - *kernel_code_entry_byte_offset* defaults to 256. | 
|  | - *wavefront_size* defaults to 6. | 
|  | - *kernarg_segment_alignment*, *group_segment_alignment*, and | 
|  |   *private_segment_alignment* default to 4.  Note that alignments are specified | 
|  |   as a power of two, so a value of **n** means an alignment of 2^ **n**. | 
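|  |  | 
|  | To make the power-of-two encoding concrete, the fragment below sets two of | 
|  | the alignment keys explicitly.  The values are illustrative: 5 requests an | 
|  | alignment of 2^5 = 32, and 4 requests 2^4 = 16, which is the same as the | 
|  | default.  Keys that are not listed keep their defaults: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   // kernarg segment: 2^5 = 32; group segment: 2^4 = 16 (the default). | 
|  |   .amd_kernel_code_t | 
|  |     kernarg_segment_alignment = 5 | 
|  |     group_segment_alignment = 4 | 
|  |   .end_amd_kernel_code_t | 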
|  |  | 
|  | The *.amd_kernel_code_t* directive must be placed immediately after the | 
|  | function label and before any instructions. | 
|  |  | 
|  | For a full list of amd_kernel_code_t keys, see the examples in | 
|  | test/CodeGen/AMDGPU/hsa.s.  For an explanation of the meanings of the different | 
|  | keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h. | 
|  |  | 
|  | Here is an example of a minimal amd_kernel_code_t specification: | 
|  |  | 
|  | .. code-block:: nasm | 
|  |  | 
|  |   .hsa_code_object_version 1,0 | 
|  |   .hsa_code_object_isa | 
|  |  | 
|  |   .text | 
|  |  | 
|  |   hello_world: | 
|  |  | 
|  |     .amd_kernel_code_t | 
|  |       enable_sgpr_kernarg_segment_ptr = 1 | 
|  |       is_ptr64 = 1 | 
|  |       compute_pgm_rsrc1_vgprs = 0 | 
|  |       compute_pgm_rsrc1_sgprs = 0 | 
|  |       compute_pgm_rsrc2_user_sgpr = 2 | 
|  |       kernarg_segment_byte_size = 8 | 
|  |       wavefront_sgpr_count = 2 | 
|  |       workitem_vgpr_count = 3 | 
|  |     .end_amd_kernel_code_t | 
|  |  | 
|  |     s_load_dwordx2 s[0:1], s[0:1], 0x0 | 
|  |     v_mov_b32 v0, 3.14159 | 
|  |     s_waitcnt lgkmcnt(0) | 
|  |     v_mov_b32 v1, s0 | 
|  |     v_mov_b32 v2, s1 | 
|  |     flat_store_dword v0, v[1:2] | 
|  |     s_endpgm |