==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family through the current Volcanic Islands (GCN Gen 3).

Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

  ================== =================== ==============
  LLVM Address Space DWARF Address Space Memory Space
  ================== =================== ==============
  0                  1                   Private
  1                  N/A                 Global
  2                  N/A                 Constant
  3                  2                   Local
  4                  N/A                 Generic (Flat)
  5                  N/A                 Region
  ================== =================== ==============

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.

LLVM Address Space is used throughout LLVM (for example, in LLVM IR). DWARF
Address Space is emitted in DWARF and is used by tools such as debuggers and
profilers.

Trap Handler ABI
----------------

The OS element of the target triple controls the trap handler behavior.

HSA OS
^^^^^^

For code objects generated by the AMDGPU back-end for the HSA OS, the runtime
installs a trap handler that supports the s_trap instruction with the following
usage:

  +--------------+-------------+-------------------+----------------------------+
  |Usage         |Code Sequence|Trap Handler Inputs|Description                 |
  +==============+=============+===================+============================+
  |reserved      |s_trap 0x00  |                   |Reserved by hardware.       |
  +--------------+-------------+-------------------+----------------------------+
  |HSA debugtrap |s_trap 0x01  |SGPR0-1: queue_ptr |Reserved for HSA debugtrap  |
  |(arg)         |             |VGPR0: arg         |intrinsic (not implemented).|
  +--------------+-------------+-------------------+----------------------------+
  |llvm.trap     |s_trap 0x02  |SGPR0-1: queue_ptr |Causes dispatch to be       |
  |              |             |                   |terminated and its          |
  |              |             |                   |associated queue put into   |
  |              |             |                   |the error state.            |
  +--------------+-------------+-------------------+----------------------------+
  |llvm.debugtrap|s_trap 0x03  |SGPR0-1: queue_ptr |If no debugger is installed,|
  |              |             |                   |handled the same as         |
  |              |             |                   |llvm.trap.                  |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0x07  |                   |Reserved for debugger       |
  |breakpoint    |             |                   |breakpoints.                |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0x08  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0xfe  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+
  |debugger      |s_trap 0xff  |                   |Reserved for debugger.      |
  +--------------+-------------+-------------------+----------------------------+

Non-HSA OS
^^^^^^^^^^

For code objects generated by the AMDGPU back-end for a non-HSA OS, the runtime
does not install a trap handler. The llvm.trap and llvm.debugtrap intrinsics are
handled as follows:

  =============== ============= ===================================================================
  Usage           Code Sequence Description
  =============== ============= ===================================================================
  llvm.trap       s_endpgm      Causes wavefront to be terminated.
  llvm.debugtrap  Nothing       Compiler warning generated that there is no trap handler installed.
  =============== ============= ===================================================================

Assembler
=========

The AMDGPU back-end has an LLVM-MC-based assembler, which is currently in
development. It supports the Southern Islands, Sea Islands, and Volcanic
Islands ISAs.

This document describes the general syntax for instructions and operands. For
more information about instructions, their semantics, and supported combinations
of operands, refer to one of the Instruction Set Architecture manuals.

An instruction has the following syntax (register operands are
normally comma-separated, while extra operands are space-separated):

*<opcode> <register_operand0>, ... <extra_operand0> ...*


Operands
--------

The following syntax for register operands is supported:

* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled
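
The single-register and range forms listed above can be illustrated with a
small Python sketch. It is not taken from the assembler: ``expand_register`` is
a hypothetical helper, and it deliberately ignores special registers, register
lists, and index expressions such as ``v[2*2]``.

.. code-block:: python

  import re

  # Hypothetical helper, not part of LLVM: expands a register operand written
  # in the s0 / s[0] / s[2:3] forms into the individual registers it names.
  _REG = re.compile(r"^(s|v|ttmp)(?:(\d+)|\[(\d+)(?::(\d+))?\])$")


  def expand_register(operand):
      """Return the registers named by operands such as s0, v[0] or ttmp[4:7]."""
      match = _REG.match(operand)
      if match is None:
          raise ValueError("unsupported register operand: " + operand)
      kind, plain, first, last = match.groups()
      if plain is not None:                  # s0, v12, ttmp3
          first = last = int(plain)
      else:                                  # s[0] or s[2:3]
          first = int(first)
          last = first if last is None else int(last)
      if last < first:
          raise ValueError("empty register range: " + operand)
      return [kind + str(i) for i in range(first, last + 1)]


  print(expand_register("s[2:3]"))     # ['s2', 's3']
  print(expand_register("ttmp[4:7]"))  # ['ttmp4', 'ttmp5', 'ttmp6', 'ttmp7']

In this reading, a range ``[a:b]`` is inclusive on both ends, so ``s[8:15]``
names eight consecutive SGPRs.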

The following extra operands are supported:

* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:

  - abs (\| \|), neg (\-)

* DPP modifiers:

  - row_shl, row_shr, row_ror, row_rol
  - row_mirror, row_half_mirror, row_bcast
  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
  - row_mask, bank_mask, bound_ctrl

* SDWA modifiers:

  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
  - abs, neg, sext

DS Instruction Examples
-----------------------

.. code-block:: nasm

  ds_add_u32 v2, v4 offset:16
  ds_write_src2_b64 v2 offset0:4 offset1:8
  ds_cmpst_f32 v2, v4, v6
  ds_min_rtn_f64 v[8:9], v2, v[4:5]


For a full list of supported instructions, refer to "LDS/GDS instructions" in the ISA Manual.

FLAT Instruction Examples
-------------------------

.. code-block:: nasm

  flat_load_dword v1, v[3:4]
  flat_store_dwordx3 v[3:4], v[5:7]
  flat_atomic_swap v1, v[3:4], v5 glc
  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc

For a full list of supported instructions, refer to "FLAT instructions" in the ISA Manual.

MUBUF Instruction Examples
--------------------------

.. code-block:: nasm

  buffer_load_dword v1, off, s[4:7], s1
  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
  buffer_store_format_xy v[1:2], off, s[4:7], s1
  buffer_wbinvl1
  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For a full list of supported instructions, refer to "MUBUF Instructions" in the ISA Manual.

SMRD/SMEM Instruction Examples
------------------------------

.. code-block:: nasm

  s_load_dword s1, s[2:3], 0xfc
  s_load_dwordx8 s[8:15], s[2:3], s4
  s_load_dwordx16 s[88:103], s[2:3], s4
  s_dcache_inv_vol
  s_memtime s[4:5]

For a full list of supported instructions, refer to "Scalar Memory Operations" in the ISA Manual.

SOP1 Instruction Examples
-------------------------

.. code-block:: nasm

  s_mov_b32 s1, s2
  s_mov_b64 s[0:1], 0x80000000
  s_cmov_b32 s1, 200
  s_wqm_b64 s[2:3], s[4:5]
  s_bcnt0_i32_b64 s1, s[2:3]
  s_swappc_b64 s[2:3], s[4:5]
  s_cbranch_join s[4:5]

For a full list of supported instructions, refer to "SOP1 Instructions" in the ISA Manual.

SOP2 Instruction Examples
-------------------------

.. code-block:: nasm

  s_add_u32 s1, s2, s3
  s_and_b64 s[2:3], s[4:5], s[6:7]
  s_cselect_b32 s1, s2, s3
  s_andn2_b32 s2, s4, s6
  s_lshr_b64 s[2:3], s[4:5], s6
  s_ashr_i32 s2, s4, s6
  s_bfm_b64 s[2:3], s4, s6
  s_bfe_i64 s[2:3], s[4:5], s6
  s_cbranch_g_fork s[4:5], s[6:7]

For a full list of supported instructions, refer to "SOP2 Instructions" in the ISA Manual.

SOPC Instruction Examples
-------------------------

.. code-block:: nasm

  s_cmp_eq_i32 s1, s2
  s_bitcmp1_b32 s1, s2
  s_bitcmp0_b64 s[2:3], s4
  s_setvskip s3, s5

For a full list of supported instructions, refer to "SOPC Instructions" in the ISA Manual.

SOPP Instruction Examples
-------------------------

.. code-block:: nasm

  s_barrier
  s_nop 2
  s_endpgm
  s_waitcnt 0                                 ; Wait for all counters to be 0
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
  s_waitcnt vmcnt(1)                          ; Wait for vmcnt counter to be 1.
  s_sethalt 9
  s_sleep 10
  s_sendmsg 0x1
  s_sendmsg sendmsg(MSG_INTERRUPT)
  s_trap 1

For a full list of supported instructions, refer to "SOPP Instructions" in the ISA Manual.

Unless otherwise mentioned, little verification is performed on the operands
of SOPP Instructions, so it is up to the programmer to be familiar with the
range of acceptable values.
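
To show why ``s_waitcnt 0`` and the three-counter form in the examples above
are equivalent, the following Python sketch packs counter values into a single
immediate. The field layout in ``FIELDS`` is an assumption for the Southern
Islands through Volcanic Islands encodings and is not stated in this guide;
check the ISA manual for the target before relying on it. ``encode_waitcnt`` is
a hypothetical name.

.. code-block:: python

  # Assumed field layout of the s_waitcnt immediate (not stated in this guide;
  # check the ISA manual for the target): (shift, width) per counter.
  FIELDS = {"vmcnt": (0, 4), "expcnt": (4, 3), "lgkmcnt": (8, 4)}


  def encode_waitcnt(**counters):
      """Pack counter values into one integer; omitted counters are not waited on."""
      unknown = set(counters) - set(FIELDS)
      if unknown:
          raise ValueError("unknown counter(s): " + ", ".join(sorted(unknown)))
      value = 0
      for name, (shift, width) in FIELDS.items():
          maximum = (1 << width) - 1
          count = counters.get(name, maximum)   # maximum means "do not wait"
          if not 0 <= count <= maximum:
              raise ValueError(name + " out of range")
          value |= count << shift
      return value


  print(hex(encode_waitcnt(vmcnt=0, expcnt=0, lgkmcnt=0)))  # 0x0
  print(hex(encode_waitcnt(vmcnt=1)))                       # 0xf71

Under this assumed layout, naming only ``vmcnt`` leaves the other fields at
their maximum value, which is why ``s_waitcnt vmcnt(1)`` does not also wait on
``expcnt`` or ``lgkmcnt``.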

Vector ALU Instruction Examples
-------------------------------

For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler automatically selects the optimal encoding based on the operands.
To force a specific encoding, add a suffix to the opcode of the instruction:

* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA
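
The suffix rule can be summarized with a short Python sketch. It only mirrors
the list above; ``split_forced_encoding`` is a hypothetical helper and does not
reflect how the assembler is implemented.

.. code-block:: python

  # Suffixes and the encodings they force, as listed above.
  SUFFIXES = {
      "_e32": "VOP1/VOP2/VOPC (32-bit)",
      "_e64": "VOP3 (64-bit)",
      "_dpp": "VOP_DPP",
      "_sdwa": "VOP_SDWA",
  }


  def split_forced_encoding(mnemonic):
      """Return (base opcode, forced encoding or None) for a VALU mnemonic."""
      for suffix, encoding in SUFFIXES.items():
          if mnemonic.endswith(suffix):
              return mnemonic[: -len(suffix)], encoding
      return mnemonic, None   # no suffix: the assembler picks the encoding


  print(split_forced_encoding("v_mul_i32_i24_e64"))  # ('v_mul_i32_i24', 'VOP3 (64-bit)')
  print(split_forced_encoding("v_mov_b32"))          # ('v_mov_b32', None)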

VOP1/VOP2/VOP3/VOPC examples:

.. code-block:: nasm

  v_mov_b32 v1, v2
  v_mov_b32_e32 v1, v2
  v_nop
  v_cvt_f64_i32_e32 v[1:2], v2
  v_floor_f32_e32 v1, v2
  v_bfrev_b32_e32 v1, v2
  v_add_f32_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, 3
  v_mul_i32_i24_e32 v1, -3, v3
  v_mul_i32_i24_e32 v1, -100, v3
  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
  v_max_f16_e32 v1, v2, v3

VOP_DPP examples:

.. code-block:: nasm

  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_mov_b32 v0, v0 wave_shl:1
  v_mov_b32 v0, v0 row_mirror
  v_mov_b32 v0, v0 row_bcast:31
  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

VOP_SDWA examples:

.. code-block:: nasm

  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

For a full list of supported instructions, refer to "Vector ALU instructions" in the ISA Manual.

HSA Code Object Directives
--------------------------

The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

By default, the assembler will derive the ISA version, *vendor*, and *arch*
from the value of the -mcpu option that is passed to the assembler.

.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^

This directive specifies that the symbol with the given name is a kernel entry
point (label) and that the object should contain a corresponding symbol of type
STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. Any
amd_kernel_code_t value that is left unspecified takes a default value. The
default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
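
As a worked example of the power-of-two convention, the sketch below converts
an alignment field into the alignment it denotes. ``alignment_from_exponent``
is a hypothetical helper, and the unit is assumed to be bytes, which this guide
does not state explicitly.

.. code-block:: python

  def alignment_from_exponent(n):
      """Convert a power-of-two alignment field to the alignment it denotes."""
      if n < 0:
          raise ValueError("alignment exponent must be non-negative")
      return 1 << n   # 2 ** n


  # The default value 4 for the three segment alignment keys means 2^4.
  print(alignment_from_exponent(4))  # 16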

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and
test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size hello_world, .Lfunc_end0-hello_world
Tom Stellardb8a91bb2016-02-22 18:36:00 +0000405 .size hello_world, .Lfunc_end0-hello_world