Blame - llvm/docs/AMDGPUUsage.rst - toolchain/llvm-project

==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, from the R600
family through the current Volcanic Islands (GCN Gen 3) family.

Refer to the `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

   ============= ============================================
   Address Space Memory Space
   ============= ============================================
   0             Private
   1             Global
   2             Constant
   3             Local
   4             Generic (Flat)
   5             Region
   ============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.
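
For tooling that needs to print or check address spaces, the table can be written as a small lookup. The following is a minimal Python sketch and not part of LLVM; the names ``ADDRESS_SPACES`` and ``memory_space_name`` are hypothetical and exist only for this illustration.

```python
# Address space numbers from the table above, keyed to memory space names.
# ADDRESS_SPACES and memory_space_name are hypothetical names for illustration.
ADDRESS_SPACES = {
    0: "Private",
    1: "Global",
    2: "Constant",
    3: "Local",
    4: "Generic (Flat)",
    5: "Region",
}


def memory_space_name(address_space: int) -> str:
    """Return the memory space name for an address space number."""
    try:
        return ADDRESS_SPACES[address_space]
    except KeyError:
        raise ValueError(f"unknown AMDGPU address space: {address_space}") from None
```

For example, ``memory_space_name(3)`` yields ``"Local"``, and a number outside the table raises ``ValueError``.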


Assembler
=========

The AMDGPU back-end has an LLVM-MC based assembler, which is currently in
development. It supports the Southern Islands, Sea Islands, and Volcanic
Islands ISAs.

This document describes the general syntax for instructions and operands. For
more information about instructions, their semantics, and supported
combinations of operands, refer to one of the Instruction Set Architecture
manuals.

An instruction has the following syntax (register operands are
normally comma-separated while extra operands are space-separated):

<opcode> <register_operand0>, ... <extra_operand0> ...


Operands
--------

The following syntax for register operands is supported:

* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled
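
To make the single-register and range forms above concrete, here is a minimal Python sketch that splits an operand into its register kind, first index, and register count. It is not the assembler's parser: ``parse_register`` is a hypothetical helper, and it deliberately handles only plain integer indices, leaving out index expressions, register lists, and special registers.

```python
import re
from typing import Tuple

# Hypothetical helper for illustration only. It accepts s0, s[0], s[2:3] and
# the v/ttmp equivalents, and rejects everything else.
_REGISTER = re.compile(
    r"^(?P<kind>s|v|ttmp)"
    r"(?:(?P<single>\d+)|\[(?P<first>\d+)(?::(?P<last>\d+))?\])$"
)


def parse_register(operand: str) -> Tuple[str, int, int]:
    """Return (kind, first_index, register_count) for an operand string."""
    match = _REGISTER.match(operand.strip())
    if match is None:
        raise ValueError(f"unsupported register operand: {operand!r}")
    if match["single"] is not None:
        first = last = int(match["single"])
    else:
        first = int(match["first"])
        last = int(match["last"]) if match["last"] is not None else first
    if last < first:
        raise ValueError(f"empty register range: {operand!r}")
    return match["kind"], first, last - first + 1
```

With this sketch, ``parse_register("s[2:3]")`` describes a pair starting at s2, and ``parse_register("ttmp[4:7]")`` describes a quad starting at ttmp4.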

The following extra operands are supported:

* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:

  - abs (\\| \\|), neg (\-)

* DPP modifiers:

  - row_shl, row_shr, row_ror, row_rol
  - row_mirror, row_half_mirror, row_bcast
  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
  - row_mask, bank_mask, bound_ctrl

* SDWA modifiers:

  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
  - abs, neg, sext

DS Instruction Examples
-----------------------

.. code-block:: nasm

  ds_add_u32 v2, v4 offset:16
  ds_write_src2_b64 v2 offset0:4 offset1:8
  ds_cmpst_f32 v2, v4, v6
  ds_min_rtn_f64 v[8:9], v2, v[4:5]


For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA manual.

FLAT Instruction Examples
-------------------------

.. code-block:: nasm

  flat_load_dword v1, v[3:4]
  flat_store_dwordx3 v[3:4], v[5:7]
  flat_atomic_swap v1, v[3:4], v5 glc
  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc

For the full list of supported instructions, refer to "FLAT instructions" in the ISA manual.

MUBUF Instruction Examples
--------------------------

.. code-block:: nasm

  buffer_load_dword v1, off, s[4:7], s1
  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
  buffer_store_format_xy v[1:2], off, s[4:7], s1
  buffer_wbinvl1
  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA manual.

SMRD/SMEM Instruction Examples
------------------------------

.. code-block:: nasm

  s_load_dword s1, s[2:3], 0xfc
  s_load_dwordx8 s[8:15], s[2:3], s4
  s_load_dwordx16 s[88:103], s[2:3], s4
  s_dcache_inv_vol
  s_memtime s[4:5]

For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA manual.

SOP1 Instruction Examples
-------------------------

.. code-block:: nasm

  s_mov_b32 s1, s2
  s_mov_b64 s[0:1], 0x80000000
  s_cmov_b32 s1, 200
  s_wqm_b64 s[2:3], s[4:5]
  s_bcnt0_i32_b64 s1, s[2:3]
  s_swappc_b64 s[2:3], s[4:5]
  s_cbranch_join s[4:5]

For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA manual.

SOP2 Instruction Examples
-------------------------

.. code-block:: nasm

  s_add_u32 s1, s2, s3
  s_and_b64 s[2:3], s[4:5], s[6:7]
  s_cselect_b32 s1, s2, s3
  s_andn2_b32 s2, s4, s6
  s_lshr_b64 s[2:3], s[4:5], s6
  s_ashr_i32 s2, s4, s6
  s_bfm_b64 s[2:3], s4, s6
  s_bfe_i64 s[2:3], s[4:5], s6
  s_cbranch_g_fork s[4:5], s[6:7]

For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA manual.

SOPC Instruction Examples
-------------------------

.. code-block:: nasm

  s_cmp_eq_i32 s1, s2
  s_bitcmp1_b32 s1, s2
  s_bitcmp0_b64 s[2:3], s4
  s_setvskip s3, s5

For the full list of supported instructions, refer to "SOPC Instructions" in the ISA manual.

SOPP Instruction Examples
-------------------------

.. code-block:: nasm

  s_barrier
  s_nop 2
  s_endpgm
  s_waitcnt 0 ; Wait for all counters to be 0
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above
  s_waitcnt vmcnt(1) ; Wait until the vmcnt counter is at most 1.
  s_sethalt 9
  s_sleep 10
  s_sendmsg 0x1
  s_sendmsg sendmsg(MSG_INTERRUPT)
  s_trap 1

For the full list of supported instructions, refer to "SOPP Instructions" in the ISA manual.

Unless otherwise mentioned, little verification is performed on the operands
of SOPP instructions, so it is up to the programmer to be familiar with the
range of acceptable values.
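
The two equivalent ``s_waitcnt`` forms above can be illustrated by packing counter values into a single immediate. The Python sketch below is hedged: the field layout (vmcnt in bits 3:0, expcnt in bits 6:4, lgkmcnt in bits 11:8) and the rule that unnamed counters keep their maximum value are assumptions about the Southern Islands through Volcanic Islands targets and should be checked against the ISA manual. ``WAITCNT_FIELDS`` and ``encode_waitcnt`` are hypothetical names.

```python
# Assumed field layout of the s_waitcnt immediate: (shift, width) per counter.
# Verify against the ISA manual for the target before relying on it.
WAITCNT_FIELDS = {
    "vmcnt": (0, 4),
    "expcnt": (4, 3),
    "lgkmcnt": (8, 4),
}


def encode_waitcnt(**counters: int) -> int:
    """Combine counter values into one immediate.

    Counters that are not named keep their maximum value, which in this
    sketch stands for "do not wait on this counter".
    """
    immediate = 0
    for name, (shift, width) in WAITCNT_FIELDS.items():
        maximum = (1 << width) - 1
        value = counters.pop(name, maximum)
        if not 0 <= value <= maximum:
            raise ValueError(f"{name}({value}) is outside 0..{maximum}")
        immediate |= value << shift
    if counters:
        raise ValueError(f"unknown counters: {sorted(counters)}")
    return immediate
```

Under these assumptions, naming all three counters with value 0 produces the immediate 0, matching ``s_waitcnt 0``, while naming only ``vmcnt=1`` leaves the other two fields at their maximum.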

Vector ALU Instruction Examples
-------------------------------

For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler automatically selects the optimal encoding based on the
instruction's operands. To force a specific encoding, add a suffix to the
opcode of the instruction:

* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA
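
As a small illustration of the suffix convention, the following Python sketch separates a mnemonic into its base opcode and the encoding that the suffix forces. It mirrors only the list above and is not how the assembler is implemented; ``ENCODING_SUFFIXES`` and ``split_encoding_suffix`` are hypothetical names.

```python
from typing import Optional, Tuple

# Suffixes from the list above, mapped to the encoding each one forces.
ENCODING_SUFFIXES = {
    "_e32": "VOP1/VOP2/VOPC (32-bit)",
    "_e64": "VOP3 (64-bit)",
    "_dpp": "VOP_DPP",
    "_sdwa": "VOP_SDWA",
}


def split_encoding_suffix(mnemonic: str) -> Tuple[str, Optional[str]]:
    """Return (base_opcode, forced_encoding).

    The encoding is None when the mnemonic carries no suffix, in which case
    the choice of encoding is left to the assembler.
    """
    for suffix, encoding in ENCODING_SUFFIXES.items():
        if mnemonic.endswith(suffix):
            return mnemonic[: -len(suffix)], encoding
    return mnemonic, None
```

For instance, ``split_encoding_suffix("v_mov_b32_e32")`` returns the base opcode ``v_mov_b32`` together with the 32-bit encoding label, while ``split_encoding_suffix("v_mov_b32")`` returns ``None`` for the encoding.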

VOP1/VOP2/VOP3/VOPC examples:

.. code-block:: nasm

  v_mov_b32 v1, v2
  v_mov_b32_e32 v1, v2
  v_nop
  v_cvt_f64_i32_e32 v[1:2], v2
  v_floor_f32_e32 v1, v2
  v_bfrev_b32_e32 v1, v2
  v_add_f32_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, 3
  v_mul_i32_i24_e32 v1, -3, v3
  v_mul_i32_i24_e32 v1, -100, v3
  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
  v_max_f16_e32 v1, v2, v3

VOP_DPP examples:

.. code-block:: nasm

  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_mov_b32 v0, v0 wave_shl:1
  v_mov_b32 v0, v0 row_mirror
  v_mov_b32 v0, v0 row_bcast:31
  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

VOP_SDWA examples:

.. code-block:: nasm

  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA manual.

HSA Code Object Directives
--------------------------

The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

major and minor are integers that specify the version of the HSA code
object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

major, minor, and stepping are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

vendor and arch are quoted strings. vendor should always be equal to
"AMD" and arch should always be equal to "AMDGPU".

By default, the assembler will derive the ISA version, vendor, and arch
from the value of the -mcpu option that is passed to the assembler.

.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^

This directive specifies that the symbol with the given name is a kernel entry
point (label) and that the object should contain a corresponding symbol of type
STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the .end_amd_kernel_code_t directive. Any
amd_kernel_code_t value that is left unspecified takes a default value.
The default value for all keys is 0, with the following exceptions:

- kernel_code_version_major defaults to 1.
- machine_kind defaults to 1.
- machine_version_major, machine_version_minor, and
  machine_version_stepping are derived from the value of the -mcpu option
  that is passed to the assembler.
- kernel_code_entry_byte_offset defaults to 256.
- wavefront_size defaults to 6.
- kernarg_segment_alignment, group_segment_alignment, and
  private_segment_alignment default to 4. Note that alignments are specified
  as a power of two, so a value of n means an alignment of 2^n.
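
The default rules above, including the power-of-two encoding of alignments, can be sketched in a few lines of Python. This is an illustration of the rules as listed, not the assembler's code; ``NONZERO_DEFAULTS``, ``effective_value``, and ``decode_alignment`` are hypothetical names, and the machine_version_* keys are left out because they depend on -mcpu.

```python
# Non-zero defaults listed above. The machine_version_* keys are omitted
# because they are derived from -mcpu rather than fixed.
NONZERO_DEFAULTS = {
    "kernel_code_version_major": 1,
    "machine_kind": 1,
    "kernel_code_entry_byte_offset": 256,
    "wavefront_size": 6,
    "kernarg_segment_alignment": 4,
    "group_segment_alignment": 4,
    "private_segment_alignment": 4,
}


def effective_value(key: str, specified: dict) -> int:
    """Return the specified value for key, else its default (0 unless listed)."""
    return specified.get(key, NONZERO_DEFAULTS.get(key, 0))


def decode_alignment(encoded: int) -> int:
    """Decode an alignment stored as a power of two: n means 2**n."""
    return 1 << encoded
```

With no keys specified, ``decode_alignment(effective_value("kernarg_segment_alignment", {}))`` evaluates to 16, since the default encoded value 4 means 2^4.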

The .amd_kernel_code_t directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and
test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .hsatext
   .globl hello_world
   .p2align 8
   .amdgpu_hsa_kernel hello_world

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      s_waitcnt lgkmcnt(0)
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      flat_store_dword v[1:2], v0
      s_endpgm
   .Lfunc_end0:
      .size hello_world, .Lfunc_end0-hello_world
Tom Stellard	b8a91bb	2016-02-22 18:36:00 +0000	[diff] [blame]	350	flat_store_dword v[1:2], v0
Tom Stellard	ff7416b	2015-06-26 21:58:31 +0000	[diff] [blame]	351	s_endpgm
Sylvestre Ledru	a7de982	2016-02-23 11:17:27 +0000	[diff] [blame]	352	.Lfunc_end0:
Tom Stellard	b8a91bb	2016-02-22 18:36:00 +0000	[diff] [blame]	353	.size hello_world, .Lfunc_end0-hello_world