==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family through the current Volcanic Islands (GCN Gen 3).

Refer to the `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

  ============= ============================================
  Address Space Memory Space
  ============= ============================================
  0             Private
  1             Global
  2             Constant
  3             Local
  4             Generic (Flat)
  5             Region
  ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.

Trap Handler ABI
----------------
The OS element of the target triple controls the trap handler behavior.

HSA OS
^^^^^^
For code objects generated by the AMDGPU back-end for the HSA OS, the runtime
installs a trap handler that supports the s_trap instruction with the following
usage:

  +--------------+-------------+-------------------+----------------------------+
  |Usage         |Code Sequence|Trap Handler Inputs|Description                 |
  +==============+=============+===================+============================+
  |reserved      |s_trap 0x00  |                   |Reserved by hardware.       |
  +--------------+-------------+-------------------+----------------------------+
  |HSA debugtrap |s_trap 0x01  |SGPR0-1: queue_ptr |Reserved for HSA debugtrap  |
  |(arg)         |             |VGPR0: arg         |intrinsic (not implemented).|
  +--------------+-------------+-------------------+----------------------------+
  |llvm.trap     |s_trap 0x02  |SGPR0-1: queue_ptr |Causes dispatch to be       |
  |              |             |                   |terminated and its          |
  |              |             |                   |associated queue put into   |
  |              |             |                   |the error state.            |
  +--------------+-------------+-------------------+----------------------------+
  |llvm.debugtrap|s_trap 0x03  |SGPR0-1: queue_ptr |If debugger not installed   |
  |              |             |                   |handled same as llvm.trap.  |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0x07  |                   |Reserved for debugger       |
  |breakpoint    |             |                   |breakpoints.                |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0x08  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0xfe  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0xff  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+

Non-HSA OS
^^^^^^^^^^
For code objects generated by the AMDGPU back-end for a non-HSA OS, the runtime
does not install a trap handler. The llvm.trap and llvm.debugtrap intrinsics
are handled as follows:

  =============== ============= ===============================================
  Usage           Code Sequence Description
  =============== ============= ===============================================
  llvm.trap       s_endpgm      Causes wavefront to be terminated.
  llvm.debugtrap  s_nop         No operation. Compiler warning generated that
                                there is no trap handler installed.
  =============== ============= ===============================================
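
As a rough illustration of how the OS element selects between the two tables,
the following Python sketch looks up the code sequence for an intrinsic. The
function name ``trap_code_sequence`` and the ``is_hsa`` flag are hypothetical
and not part of LLVM; the strings are copied from the tables in this section.

.. code-block:: python

  # Code sequences copied from the HSA and non-HSA tables in this section.
  HSA_SEQUENCES = {
      "llvm.trap": "s_trap 0x02",
      "llvm.debugtrap": "s_trap 0x03",
  }
  NON_HSA_SEQUENCES = {
      "llvm.trap": "s_endpgm",
      "llvm.debugtrap": "s_nop",
  }


  def trap_code_sequence(intrinsic, is_hsa):
      """Return the documented code sequence for a trap intrinsic.

      Hypothetical helper for illustration only; raises KeyError for
      intrinsics that the tables do not list.
      """
      table = HSA_SEQUENCES if is_hsa else NON_HSA_SEQUENCES
      return table[intrinsic]


  print(trap_code_sequence("llvm.trap", is_hsa=True))
  print(trap_code_sequence("llvm.debugtrap", is_hsa=False))
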

Assembler
=========

The AMDGPU back-end has an LLVM-MC based assembler which is currently in
development. It supports the Southern Islands, Sea Islands, and Volcanic
Islands ISAs.

This document describes the general syntax for instructions and operands. For
more information about instructions, their semantics, and supported
combinations of operands, refer to one of the Instruction Set Architecture
manuals.

An instruction has the following syntax (register operands are
normally comma-separated while extra operands are space-separated):

*<opcode> <register_operand0>, ... <extra_operand0> ...*


Operands
--------

The following syntax for register operands is supported:

* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled
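
The bracketed forms above can be read as inclusive index ranges. As a minimal
sketch (not the assembler's actual parser), the following Python helper expands
a subset of the register operand syntax into individual register names. The
names ``expand_register`` and ``eval_index`` are hypothetical, and the set of
operators accepted in index expressions (``+``, ``-``, ``*``) is an assumption;
the list above only shows ``*`` and ``-``. Register lists, special registers,
and 'off' are not handled.

.. code-block:: python

  import ast
  import operator
  import re

  # Operators allowed in an index expression (assumption, see lead-in).
  _OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


  def eval_index(expr):
      """Evaluate a constant integer expression such as "2*2" or "1-1"."""
      def walk(node):
          if isinstance(node, ast.Constant) and type(node.value) is int:
              return node.value
          if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
              return _OPS[type(node.op)](walk(node.left), walk(node.right))
          raise ValueError(f"unsupported index expression: {expr!r}")
      return walk(ast.parse(expr.strip(), mode="eval").body)


  def expand_register(operand):
      """Return the individual registers named by one s/v/ttmp operand."""
      # Plain form: s0, v12, ttmp3
      if re.fullmatch(r"(s|v|ttmp)\d+", operand):
          return [operand]
      # Bracketed form: s[0], v[2*2], s[2:3], s[1-1:2-1]
      m = re.fullmatch(r"(s|v|ttmp)\[([^\]:]+)(?::([^\]]+))?\]", operand)
      if not m:
          raise ValueError(f"unsupported register operand: {operand!r}")
      prefix, lo_text, hi_text = m.groups()
      lo = eval_index(lo_text)
      hi = eval_index(hi_text) if hi_text is not None else lo
      if hi < lo:
          raise ValueError(f"empty register range: {operand!r}")
      return [f"{prefix}{i}" for i in range(lo, hi + 1)]


  print(expand_register("s[2:3]"))
  print(expand_register("v[2*2]"))
  print(expand_register("s[1-1:2-1]"))

With the operands from the list above, ``s[2:3]`` expands to ``s2`` and ``s3``,
``v[2*2]`` to ``v4``, and ``s[1-1:2-1]`` to ``s0`` and ``s1``.
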

The following extra operands are supported:

* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:

  - abs (\| \|), neg (\-)

* DPP modifiers:

  - row_shl, row_shr, row_ror, row_rol
  - row_mirror, row_half_mirror, row_bcast
  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
  - row_mask, bank_mask, bound_ctrl

* SDWA modifiers:

  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
  - abs, neg, sext

DS Instructions Examples
------------------------

.. code-block:: nasm

  ds_add_u32 v2, v4 offset:16
  ds_write_src2_b64 v2 offset0:4 offset1:8
  ds_cmpst_f32 v2, v4, v6
  ds_min_rtn_f64 v[8:9], v2, v[4:5]

For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA manual.

FLAT Instruction Examples
-------------------------

.. code-block:: nasm

  flat_load_dword v1, v[3:4]
  flat_store_dwordx3 v[3:4], v[5:7]
  flat_atomic_swap v1, v[3:4], v5 glc
  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc

For the full list of supported instructions, refer to "FLAT instructions" in the ISA manual.

MUBUF Instruction Examples
--------------------------

.. code-block:: nasm

  buffer_load_dword v1, off, s[4:7], s1
  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
  buffer_store_format_xy v[1:2], off, s[4:7], s1
  buffer_wbinvl1
  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA manual.

SMRD/SMEM Instruction Examples
------------------------------

.. code-block:: nasm

  s_load_dword s1, s[2:3], 0xfc
  s_load_dwordx8 s[8:15], s[2:3], s4
  s_load_dwordx16 s[88:103], s[2:3], s4
  s_dcache_inv_vol
  s_memtime s[4:5]

For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA manual.

SOP1 Instruction Examples
-------------------------

.. code-block:: nasm

  s_mov_b32 s1, s2
  s_mov_b64 s[0:1], 0x80000000
  s_cmov_b32 s1, 200
  s_wqm_b64 s[2:3], s[4:5]
  s_bcnt0_i32_b64 s1, s[2:3]
  s_swappc_b64 s[2:3], s[4:5]
  s_cbranch_join s[4:5]

For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA manual.

SOP2 Instruction Examples
-------------------------

.. code-block:: nasm

  s_add_u32 s1, s2, s3
  s_and_b64 s[2:3], s[4:5], s[6:7]
  s_cselect_b32 s1, s2, s3
  s_andn2_b32 s2, s4, s6
  s_lshr_b64 s[2:3], s[4:5], s6
  s_ashr_i32 s2, s4, s6
  s_bfm_b64 s[2:3], s4, s6
  s_bfe_i64 s[2:3], s[4:5], s6
  s_cbranch_g_fork s[4:5], s[6:7]

For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA manual.

SOPC Instruction Examples
-------------------------

.. code-block:: nasm

  s_cmp_eq_i32 s1, s2
  s_bitcmp1_b32 s1, s2
  s_bitcmp0_b64 s[2:3], s4
  s_setvskip s3, s5

For the full list of supported instructions, refer to "SOPC Instructions" in the ISA manual.

SOPP Instruction Examples
-------------------------

.. code-block:: nasm

  s_barrier
  s_nop 2
  s_endpgm
  s_waitcnt 0                                 ; Wait for all counters to be 0
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
  s_waitcnt vmcnt(1)                          ; Wait for vmcnt counter to be 1.
  s_sethalt 9
  s_sleep 10
  s_sendmsg 0x1
  s_sendmsg sendmsg(MSG_INTERRUPT)
  s_trap 1

For the full list of supported instructions, refer to "SOPP Instructions" in the ISA manual.

Unless otherwise mentioned, little verification is performed on the operands
of SOPP Instructions, so it is up to the programmer to be familiar with the
range of acceptable values.

Vector ALU Instruction Examples
-------------------------------

For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler automatically uses the optimal encoding based on the operands.
To force a specific encoding, add a suffix to the opcode of the instruction:

* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA

VOP1/VOP2/VOP3/VOPC examples:

.. code-block:: nasm

  v_mov_b32 v1, v2
  v_mov_b32_e32 v1, v2
  v_nop
  v_cvt_f64_i32_e32 v[1:2], v2
  v_floor_f32_e32 v1, v2
  v_bfrev_b32_e32 v1, v2
  v_add_f32_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, 3
  v_mul_i32_i24_e32 v1, -3, v3
  v_mul_i32_i24_e32 v1, -100, v3
  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
  v_max_f16_e32 v1, v2, v3

VOP_DPP examples:

.. code-block:: nasm

  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_mov_b32 v0, v0 wave_shl:1
  v_mov_b32 v0, v0 row_mirror
  v_mov_b32 v0, v0 row_bcast:31
  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

VOP_SDWA examples:

.. code-block:: nasm

  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA manual.

HSA Code Object Directives
--------------------------

The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

By default, the assembler will derive the ISA version, *vendor*, and *arch*
from the value of the -mcpu option that is passed to the assembler.

.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^

This directive specifies that the symbol with the given name is a kernel entry
point (label) and that the object should contain a corresponding symbol of type
STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
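
The default rules above can be sketched in a few lines of Python. This is only
an illustration of the documented behavior, not code from the assembler: the
names ``DEFAULTS``, ``MCPU_DERIVED``, ``resolve_key``, and
``alignment_from_log2`` are hypothetical. The *machine_version_\** keys are
left out of the fixed defaults because they depend on the -mcpu option.

.. code-block:: python

  # Fixed defaults from the list above; every other key defaults to 0.
  DEFAULTS = {
      "kernel_code_version_major": 1,
      "machine_kind": 1,
      "kernel_code_entry_byte_offset": 256,
      "wavefront_size": 6,
      "kernarg_segment_alignment": 4,
      "group_segment_alignment": 4,
      "private_segment_alignment": 4,
  }

  # Keys whose default comes from -mcpu rather than from a fixed value.
  MCPU_DERIVED = {
      "machine_version_major",
      "machine_version_minor",
      "machine_version_stepping",
  }


  def resolve_key(key, specified):
      """Return the value of key, falling back to the documented default."""
      if key in specified:
          return specified[key]
      if key in MCPU_DERIVED:
          raise KeyError(f"{key} is derived from -mcpu, not a fixed default")
      return DEFAULTS.get(key, 0)


  def alignment_from_log2(n):
      """Alignments are stored as a power of two: n means 2**n."""
      return 1 << n


  specified = {"enable_sgpr_kernarg_segment_ptr": 1, "is_ptr64": 1}
  print(resolve_key("is_ptr64", specified))             # 1
  print(resolve_key("workitem_vgpr_count", specified))  # 0
  print(alignment_from_log2(
      resolve_key("kernarg_segment_alignment", specified)))  # 16
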

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and
test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

  .hsa_code_object_version 1,0
  .hsa_code_object_isa

  .hsatext
  .globl hello_world
  .p2align 8
  .amdgpu_hsa_kernel hello_world

  hello_world:

    .amd_kernel_code_t
      enable_sgpr_kernarg_segment_ptr = 1
      is_ptr64 = 1
      compute_pgm_rsrc1_vgprs = 0
      compute_pgm_rsrc1_sgprs = 0
      compute_pgm_rsrc2_user_sgpr = 2
      kernarg_segment_byte_size = 8
      wavefront_sgpr_count = 2
      workitem_vgpr_count = 3
    .end_amd_kernel_code_t

    s_load_dwordx2 s[0:1], s[0:1] 0x0
    v_mov_b32 v0, 3.14159
    s_waitcnt lgkmcnt(0)
    v_mov_b32 v1, s0
    v_mov_b32 v2, s1
    flat_store_dword v[1:2], v0
    s_endpgm
  .Lfunc_end0:
    .size hello_world, .Lfunc_end0-hello_world
Tom Stellardb8a91bb2016-02-22 18:36:00 +0000402 .size hello_world, .Lfunc_end0-hello_world