[AMDGPU][MC][DOC] Updated AMD GPU assembler description

Stage 2: added detailed description of operands

See bug 36572: https://bugs.llvm.org/show_bug.cgi?id=36572

llvm-svn: 349368

commit: 47eb63684d22c12b9defb99c394dbd10e26fefff
author: Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> Mon Dec 17 17:38:11 2018 +0000
committer: Dmitry Preobrazhensky <dmitry.preobrazhensky@amd.com> Mon Dec 17 17:38:11 2018 +0000
tree: f497778b8180320c30d0dda8125e049f5a7d1916
parent: f700c8b25378acf912982788340f4ab10d8a15ec
diff --git a/llvm/docs/AMDGPUModifierSyntax.rst b/llvm/docs/AMDGPUModifierSyntax.rst
new file mode 100644
index 0000000..bc2ddd0
--- /dev/null
+++ b/llvm/docs/AMDGPUModifierSyntax.rst

@@ -0,0 +1,1248 @@
+======================================
+Syntax of AMDGPU Instruction Modifiers
+======================================
+
+.. contents::
+   :local:
+
+Conventions
+===========
+
+The following notation is used throughout this document:
+
+    =================== =============================================================
+    Notation            Description
+    =================== =============================================================
+    {0..N}              Any integer value in the range from 0 to N (inclusive).
+    <x>                 Syntax and meaning of *x* is explained elsewhere.
+    =================== =============================================================
+
+.. _amdgpu_syn_modifiers:
+
+Modifiers
+=========
+
+DS Modifiers
+------------
+
+.. _amdgpu_synid_ds_offset8:
+
+ds_offset8
+~~~~~~~~~~
+
+Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
+
+Used with DS instructions which have 2 addresses.
+
+    =================== =====================================================
+    Syntax              Description
+    =================== =====================================================
+    offset:{0..0xFF}    Specifies an unsigned 8-bit offset as a positive
+                        :ref:`integer number <amdgpu_synid_integer_number>`.
+    =================== =====================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:255
+  offset:0xff
+
+.. _amdgpu_synid_ds_offset16:
+
+ds_offset16
+~~~~~~~~~~~
+
+Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
+
+Used with DS instructions which have 1 address.
+
+    ==================== ======================================================
+    Syntax               Description
+    ==================== ======================================================
+    offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
+                         :ref:`integer number <amdgpu_synid_integer_number>`.
+    ==================== ======================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:65535
+  offset:0xffff
+
+.. _amdgpu_synid_sw_offset16:
+
+sw_offset16
+~~~~~~~~~~~
+
+This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
+It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
+
+See AMD documentation for more information.
+
+    ======================================================= ===========================================================
+    Syntax                                                  Description
+    ======================================================= ===========================================================
+    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
+    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.
+
+                                                            Each number is a lane *id*.
+    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.
+
+                                                            The pattern converts a 5-bit lane *id* to another
+                                                            lane *id* with which the lane interacts.
+
+                                                            *mask* is a 5 character sequence which
+                                                            specifies how to transform the bits of the
+                                                            lane *id*. 
+
+                                                            The following characters are allowed:
+
+                                                            * "0" - set bit to 0.
+
+                                                            * "1" - set bit to 1.
+
+                                                            * "p" - preserve bit.
+
+                                                            * "i" - inverse bit.
+
+    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.
+
+                                                            Broadcasts the value of any particular lane to
+                                                            all lanes in its group.
+
+                                                            The first numeric parameter is a group
+                                                            size and must be equal to 2, 4, 8, 16 or 32.
+
+                                                            The second numeric parameter is an index of the
+                                                            lane being broadcast.
+
+                                                            The index must not exceed group size.
+    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.
+
+                                                            Swaps the neighboring groups of
+                                                            1, 2, 4, 8 or 16 lanes.
+    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.
+
+                                                            Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
+    ======================================================= ===========================================================
+
+Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:255
+  offset:0xffff
+  offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
+  offset:swizzle(BITMASK_PERM, "01pi0")
+  offset:swizzle(BROADCAST, 2, 0)
+  offset:swizzle(SWAP, 8)
+  offset:swizzle(REVERSE, 30 + 2)
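+
+The lane mappings described in the table can be modelled in a few lines of Python.
+This is a minimal sketch, not the hardware or assembler implementation. The function
+names are hypothetical, lane ids are assumed to be zero-based, the leftmost *mask*
+character is assumed to control the most significant bit of the 5-bit lane *id*, and
+a broadcast index is assumed to be valid only in the range 0..N-1 for a group of N lanes.
+

```python
def bitmask_perm(lane_id, mask):
    """Transform a 5-bit lane id using a 5-character mask of 0/1/p/i."""
    if len(mask) != 5 or any(c not in "01pi" for c in mask):
        raise ValueError("mask must be 5 characters from '0', '1', 'p', 'i'")
    result = 0
    for pos, ch in enumerate(mask):
        bit = 4 - pos  # assumption: leftmost character is the top bit
        old = (lane_id >> bit) & 1
        if ch == "0":
            new = 0          # set bit to 0
        elif ch == "1":
            new = 1          # set bit to 1
        elif ch == "p":
            new = old        # preserve bit
        else:
            new = old ^ 1    # "i": inverse bit
        result |= new << bit
    return result


def _check_group(size, allowed):
    if size not in allowed:
        raise ValueError(f"group size must be one of {allowed}")


def broadcast(lane_id, group_size, index):
    """Every lane reads the lane at `index` within its own group."""
    _check_group(group_size, (2, 4, 8, 16, 32))
    if not 0 <= index < group_size:
        raise ValueError("index must address a lane inside the group")
    return (lane_id // group_size) * group_size + index


def swap(lane_id, group_size):
    """Neighboring groups of `group_size` lanes exchange places."""
    _check_group(group_size, (1, 2, 4, 8, 16))
    return lane_id ^ group_size


def reverse(lane_id, group_size):
    """Lane order is reversed inside each group of `group_size` lanes."""
    _check_group(group_size, (2, 4, 8, 16, 32))
    return lane_id ^ (group_size - 1)
```

+
+Each function returns the lane *id* that a given lane interacts with, so
+``reverse(5, 4)`` names the partner of lane 5 when groups of 4 lanes are reversed.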
+
+.. _amdgpu_synid_gds:
+
+gds
+~~~
+
+Specifies whether to use GDS or LDS memory (LDS is the default).
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    gds                                      Use GDS memory.
+    ======================================== ================================================
+
+
+EXP Modifiers
+-------------
+
+.. _amdgpu_synid_done:
+
+done
+~~~~
+
+Specifies if this is the last export from the shader to the target. By default, the current
+instruction does not finish an export sequence.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    done                                     Indicates the last export operation.
+    ======================================== ================================================
+
+.. _amdgpu_synid_compr:
+
+compr
+~~~~~
+
+Indicates if the data are compressed (data are not compressed by default).
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    compr                                    Data are compressed.
+    ======================================== ================================================
+
+.. _amdgpu_synid_vm:
+
+vm
+~~
+
+Specifies the valid mask flag state (off by default).
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    vm                                       Set valid mask flag.
+    ======================================== ================================================
+
+FLAT Modifiers
+--------------
+
+.. _amdgpu_synid_flat_offset12:
+
+flat_offset12
+~~~~~~~~~~~~~
+
+Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+
+Cannot be used with *global/scratch* opcodes. GFX9 only.
+
+    ================= ======================================================
+    Syntax            Description
+    ================= ======================================================
+    offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
+                      :ref:`integer number <amdgpu_synid_integer_number>`.
+    ================= ======================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:4095
+  offset:0xff
+
+.. _amdgpu_synid_flat_offset13:
+
+flat_offset13
+~~~~~~~~~~~~~
+
+Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
+
+Can be used with *global/scratch* opcodes only. GFX9 only.
+
+    ============================ =======================================================
+    Syntax                       Description
+    ============================ =======================================================
+    offset:{-4096..+4095}        Specifies a 13-bit signed offset as an
+                                 :ref:`integer number <amdgpu_synid_integer_number>`.
+    ============================ =======================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:-4000
+  offset:0x10
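+
+The ranges accepted by *flat_offset12* and *flat_offset13* follow directly from their
+bit widths and signedness. A minimal sketch of that arithmetic is shown here; the helper
+names ``offset_range`` and ``fits`` are hypothetical and are not part of the assembler.
+

```python
def offset_range(bits, signed):
    """Return the inclusive (min, max) range of an immediate offset."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, bits, signed):
    """Check whether `value` is representable as such an offset."""
    lo, hi = offset_range(bits, signed)
    return lo <= value <= hi
```

+
+With these helpers, ``offset_range(13, True)`` reproduces the ``{-4096..+4095}`` range
+of the table above, and ``offset_range(12, False)`` reproduces ``{0..4095}``.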
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+nv
+~~
+
+See a description :ref:`here<amdgpu_synid_nv>`.
+
+MIMG Modifiers
+--------------
+
+.. _amdgpu_synid_dmask:
+
+dmask
+~~~~~
+
+Specifies which channels (image components) are used by the operation. By default, no channels
+are used.
+
+    =============== =====================================================
+    Syntax          Description
+    =============== =====================================================
+    dmask:{0..15}   Specifies image channels as a positive
+                    :ref:`integer number <amdgpu_synid_integer_number>`.
+
+                    Each bit corresponds to one of 4 image
+                    components (RGBA).
+
+                    If the specified bit value is 0, the component
+                    is not used; a value of 1 means that the
+                    component is used.
+    =============== =====================================================
+
+This modifier has some limitations depending on instruction kind:
+
+    =================================================== ========================
+    Instruction Kind                                    Valid dmask Values
+    =================================================== ========================
+    32-bit atomic *cmpswap*                             0x3
+    32-bit atomic instructions except for *cmpswap*     0x1
+    64-bit atomic *cmpswap*                             0xF
+    64-bit atomic instructions except for *cmpswap*     0x3
+    *gather4*                                           0x1, 0x2, 0x4, 0x8
+    Other instructions                                  any value
+    =================================================== ========================
+
+Examples:
+
+.. code-block:: nasm
+
+  dmask:0xf
+  dmask:0b1111
+  dmask:3
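+
+The bit-per-channel encoding and the per-instruction restrictions can be illustrated
+with a short Python sketch. It rests on two assumptions that the text above does not
+state: bit 0 is taken to select R and bit 3 to select A, and the instruction kind
+labels used as dictionary keys are hypothetical names chosen for this example.
+

```python
CHANNELS = "RGBA"  # assumption: bit 0 selects R, bit 3 selects A


def dmask_channels(dmask):
    """List the image components enabled by a dmask value."""
    if not 0 <= dmask <= 15:
        raise ValueError("dmask must be in the range 0..15")
    return [ch for bit, ch in enumerate(CHANNELS) if dmask & (1 << bit)]


# Valid dmask values per instruction kind, copied from the table above.
# The keys are hypothetical labels, not assembler mnemonics.
VALID_DMASK = {
    "atomic32_cmpswap": {0x3},
    "atomic32": {0x1},
    "atomic64_cmpswap": {0xF},
    "atomic64": {0x3},
    "gather4": {0x1, 0x2, 0x4, 0x8},
}


def dmask_is_valid(kind, dmask):
    """Check a dmask value against the restrictions of an instruction kind."""
    allowed = VALID_DMASK.get(kind)  # kinds not listed accept any value
    return 0 <= dmask <= 15 and (allowed is None or dmask in allowed)
```

+
+For example, ``dmask_channels(3)`` lists the two components selected by ``dmask:3``.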
+
+.. _amdgpu_synid_unorm:
+
+unorm
+~~~~~
+
+Specifies whether the address is normalized or not (the address is normalized by default).
+
+    ======================== ========================================
+    Syntax                   Description
+    ======================== ========================================
+    unorm                    Force the address to be unnormalized.
+    ======================== ========================================
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+.. _amdgpu_synid_r128:
+
+r128
+~~~~
+
+Specifies texture resource size. The default size is 256 bits.
+
+GFX7 and GFX8 only.
+
+    =================== ================================================
+    Syntax              Description
+    =================== ================================================
+    r128                Specifies a 128-bit texture resource size.
+    =================== ================================================
+
+.. WARNING:: Using this modifier should decrease *rsrc* register size from 8 to 4 dwords, but the assembler does not currently support this feature.
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+.. _amdgpu_synid_lwe:
+
+lwe
+~~~
+
+Specifies LOD warning status (LOD warning is disabled by default).
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    lwe                                      Enables LOD warning.
+    ======================================== ================================================
+
+.. _amdgpu_synid_da:
+
+da
+~~
+
+Specifies if an array index must be sent to TA. By default, the array index is not sent.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    da                                       Send an array index to TA.
+    ======================================== ================================================
+
+.. _amdgpu_synid_d16:
+
+d16
+~~~
+
+Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    d16                                      Enables 16-bit data mode.
+
+                                             On loads, convert data in memory to 16-bit
+                                             format before storing it in VGPRs.
+
+                                             For stores, convert 16-bit data in VGPRs to
+                                             32 bits before going to memory.
+
+                                             Note that GFX8.0 does not support data packing.
+                                             Each 16-bit data element occupies 1 VGPR.
+
+                                             GFX8.1 and GFX9 support data packing.
+                                             Each pair of 16-bit data elements 
+                                             occupies 1 VGPR.
+    ======================================== ================================================
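+
+The difference between packed and unpacked *d16* data comes down to how many VGPRs a
+given number of 16-bit elements occupies. The following sketch shows the arithmetic;
+the function name is hypothetical, and rounding an odd element count up to a whole
+VGPR in packed mode is an assumption.
+

```python
def d16_vgpr_count(num_elements, packed):
    """Number of VGPRs holding `num_elements` 16-bit data elements."""
    if num_elements < 0:
        raise ValueError("element count must not be negative")
    if packed:
        # Each pair of 16-bit elements shares one VGPR.
        # Assumption: a leftover element still needs a VGPR of its own.
        return (num_elements + 1) // 2
    # Without packing, each 16-bit element occupies one VGPR.
    return num_elements
```

+
+Four 16-bit elements therefore need 4 VGPRs without packing and 2 VGPRs with packing.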
+
+.. _amdgpu_synid_a16:
+
+a16
+~~~
+
+Specifies the size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    a16                                      Enables 16-bit image address components.
+    ======================================== ================================================
+
+Miscellaneous Modifiers
+-----------------------
+
+.. _amdgpu_synid_glc:
+
+glc
+~~~
+
+This modifier has different meanings for loads, stores, and atomic operations.
+The default value is off (0).
+
+See AMD documentation for details.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    glc                                      Set glc bit to 1.
+    ======================================== ================================================
+
+.. _amdgpu_synid_slc:
+
+slc
+~~~
+
+Specifies cache policy. The default value is off (0).
+
+See AMD documentation for details.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    slc                                      Set slc bit to 1.
+    ======================================== ================================================
+
+.. _amdgpu_synid_tfe:
+
+tfe
+~~~
+
+Controls access to partially resident textures. The default value is off (0).
+
+See AMD documentation for details.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    tfe                                      Set tfe bit to 1.
+    ======================================== ================================================
+
+.. _amdgpu_synid_nv:
+
+nv
+~~
+
+Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.
+
+GFX9 only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    nv                                       Indicates that the instruction operates on
+                                             non-volatile memory.
+    ======================================== ================================================
+
+MUBUF/MTBUF Modifiers
+---------------------
+
+.. _amdgpu_synid_idxen:
+
+idxen
+~~~~~
+
+Specifies whether address components include an index. By default, no components are used.
+
+Can be used together with :ref:`offen<amdgpu_synid_offen>`.
+
+Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    idxen                                    Address components include an index.
+    ======================================== ================================================
+
+.. _amdgpu_synid_offen:
+
+offen
+~~~~~
+
+Specifies whether address components include an offset. By default, no components are used.
+
+Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
+
+Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    offen                                    Address components include an offset.
+    ======================================== ================================================
+
+.. _amdgpu_synid_addr64:
+
+addr64
+~~~~~~
+
+Specifies whether a 64-bit address is used. By default, no address is used.
+
+GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
+:ref:`idxen<amdgpu_synid_idxen>` modifiers.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    addr64                                   A 64-bit address is used.
+    ======================================== ================================================
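+
+The compatibility rules for *idxen*, *offen* and *addr64* stated above can be collected
+into a small checker. This is only a sketch of those rules: the function name and the
+integer ``gfx`` parameter (7, 8 or 9) are hypothetical, and the messages are not the
+diagnostics produced by the assembler.
+

```python
ADDRESSING_MODIFIERS = {"idxen", "offen", "addr64"}


def mubuf_addressing_errors(modifiers, gfx):
    """Return a list of problems with a set of address modifiers.

    `gfx` is the GFX generation as an integer, for example 7 for GFX7.
    An empty list means the combination respects the rules above.
    """
    mods = set(modifiers)
    errors = []
    for name in sorted(mods - ADDRESSING_MODIFIERS):
        errors.append(f"unknown addressing modifier: {name}")
    if "addr64" in mods:
        if gfx != 7:
            errors.append("addr64 is GFX7 only")
        for other in sorted(mods & {"idxen", "offen"}):
            errors.append(f"addr64 cannot be used with {other}")
    return errors
```

+
+Combining *idxen* with *offen* yields no errors, while combining *addr64* with either
+of them is reported.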
+
+.. _amdgpu_synid_buf_offset12:
+
+buf_offset12
+~~~~~~~~~~~~
+
+Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+
+    =============================== ======================================================
+    Syntax                          Description
+    =============================== ======================================================
+    offset:{0..0xFFF}               Specifies a 12-bit unsigned offset as a positive
+                                    :ref:`integer number <amdgpu_synid_integer_number>`.
+    =============================== ======================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  offset:0
+  offset:0x10
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+slc
+~~~
+
+See a description :ref:`here<amdgpu_synid_slc>`.
+
+.. _amdgpu_synid_lds:
+
+lds
+~~~
+
+Specifies where to store the result: VGPRs or LDS (VGPRs by default).
+
+    ======================================== ===========================
+    Syntax                                   Description
+    ======================================== ===========================
+    lds                                      Store result in LDS.
+    ======================================== ===========================
+
+tfe
+~~~
+
+See a description :ref:`here<amdgpu_synid_tfe>`.
+
+.. _amdgpu_synid_dfmt:
+
+dfmt
+~~~~
+
+TBD
+
+.. _amdgpu_synid_nfmt:
+
+nfmt
+~~~~
+
+TBD
+
+SMRD/SMEM Modifiers
+-------------------
+
+glc
+~~~
+
+See a description :ref:`here<amdgpu_synid_glc>`.
+
+nv
+~~
+
+See a description :ref:`here<amdgpu_synid_nv>`.
+
+VINTRP Modifiers
+----------------
+
+.. _amdgpu_synid_high:
+
+high
+~~~~
+
+Specifies which half of the LDS word to use. The low half of the LDS word is used by default.
+GFX9 only.
+
+    ======================================== ================================
+    Syntax                                   Description
+    ======================================== ================================
+    high                                     Use high half of LDS word.
+    ======================================== ================================
+
+VOP1/VOP2 DPP Modifiers
+-----------------------
+
+GFX8 and GFX9 only.
+
+.. _amdgpu_synid_dpp_ctrl:
+
+dpp_ctrl
+~~~~~~~~
+
+Specifies how data are shared between threads. This is a mandatory modifier.
+There is no default value.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
+    row_mirror                               Mirror threads within row.
+    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
+    row_bcast:15                             Broadcast 15th thread of each row to next row.
+    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
+    wave_shl:1                               Wavefront left shift by 1 thread.
+    wave_rol:1                               Wavefront left rotate by 1 thread.
+    wave_shr:1                               Wavefront right shift by 1 thread.
+    wave_ror:1                               Wavefront right rotate by 1 thread.
+    row_shl:{1..15}                          Row shift left by 1-15 threads.
+    row_shr:{1..15}                          Row shift right by 1-15 threads.
+    row_ror:{1..15}                          Row rotate right by 1-15 threads.
+    ======================================== ================================================
+
+Note: Numeric parameters may be specified as either
+:ref:`integer numbers<amdgpu_synid_integer_number>` or
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+
+Examples:
+
+.. code-block:: nasm
+
+  quad_perm:[0, 1, 2, 3]
+  row_shl:3
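+
+A few of the *dpp_ctrl* patterns are easy to model as lane arithmetic. The sketch below
+is illustrative only and makes assumptions beyond the table: rows are taken to be 16
+consecutive lanes (consistent with the 8-thread half row mentioned above), each group
+of 4 consecutive lanes is taken to be one quad, and every function is assumed to return
+the lane that the given lane reads from. The function names are hypothetical.
+

```python
def quad_perm_source(lane, sel):
    """Source lane for quad_perm:[sel[0], sel[1], sel[2], sel[3]]."""
    if len(sel) != 4 or any(not 0 <= s <= 3 for s in sel):
        raise ValueError("quad_perm needs four values in the range 0..3")
    quad_base = lane & ~3          # first lane of the quad
    return quad_base | sel[lane & 3]


def row_mirror_source(lane):
    """Source lane for row_mirror (assumed 16-lane rows)."""
    return lane ^ 15


def row_half_mirror_source(lane):
    """Source lane for row_half_mirror (8-lane half rows)."""
    return lane ^ 7
```

+
+Under these assumptions, ``quad_perm:[0, 1, 2, 3]`` is the identity permutation, because
+every lane selects its own position within the quad.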
+
+.. _amdgpu_synid_row_mask:
+
+row_mask
+~~~~~~~~
+
+Controls which rows are enabled for data sharing. By default, all rows are enabled.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+    ======================================== =====================================================
+    Syntax                                   Description
+    ======================================== =====================================================
+    row_mask:{0..15}                         Specifies a *row mask* as a positive
+                                             :ref:`integer number <amdgpu_synid_integer_number>`.
+
+                                             Each of 4 bits in the mask controls one
+                                             row (0 - disabled, 1 - enabled).
+    ======================================== =====================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  row_mask:0xf
+  row_mask:0b1010
+  row_mask:0b1111
+
+.. _amdgpu_synid_bank_mask:
+
+bank_mask
+~~~~~~~~~
+
+Controls which banks are enabled for data sharing. By default, all banks are enabled.
+
+Note. The lanes of a wavefront are organized in four banks and four rows.
+
+    ======================================== =======================================================
+    Syntax                                   Description
+    ======================================== =======================================================
+    bank_mask:{0..15}                        Specifies a *bank mask* as a positive
+                                             :ref:`integer number <amdgpu_synid_integer_number>`.
+
+                                             Each of 4 bits in the mask controls one
+                                             bank (0 - disabled, 1 - enabled).
+    ======================================== =======================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  bank_mask:0x3
+  bank_mask:0b0011
+  bank_mask:0b1111
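+
+How *row_mask* and *bank_mask* combine for a single lane can be sketched as follows.
+The layout used here is an assumption that the text above does not spell out: a row is
+taken to be 16 consecutive lanes, a bank is taken to be 4 consecutive lanes within a
+row, and bit N of each mask is taken to control row N or bank N. The function name is
+hypothetical.
+

```python
def lane_enabled(lane, row_mask=0xF, bank_mask=0xF):
    """Check whether a lane is enabled by both masks (defaults enable all)."""
    if not 0 <= lane <= 63:
        raise ValueError("lane must be in the range 0..63")
    if not (0 <= row_mask <= 15 and 0 <= bank_mask <= 15):
        raise ValueError("masks must be in the range 0..15")
    row = lane // 16           # assumption: 16 consecutive lanes per row
    bank = (lane % 16) // 4    # assumption: 4 consecutive lanes per bank
    row_on = (row_mask >> row) & 1
    bank_on = (bank_mask >> bank) & 1
    return bool(row_on and bank_on)
```

+
+With ``row_mask:0b1010`` this model enables rows 1 and 3 only, and with
+``bank_mask:0b0011`` it enables banks 0 and 1 only.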
+
+.. _amdgpu_synid_bound_ctrl:
+
+bound_ctrl
+~~~~~~~~~~
+
+Controls data sharing when accessing an invalid lane. By default, data sharing with
+invalid lanes is disabled.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    bound_ctrl:0                             Enables data sharing with invalid lanes.
+
+                                             Accessing data from an invalid lane will
+                                             return zero.
+    ======================================== ================================================
+
+VOP1/VOP2/VOPC SDWA Modifiers
+-----------------------------
+
+GFX8 and GFX9 only.
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.
+
+omod
+~~~~
+
+See a description :ref:`here<amdgpu_synid_omod>`.
+
+GFX9 only.
+
+.. _amdgpu_synid_dst_sel:
+
+dst_sel
+~~~~~~~
+
+Selects which bits in the destination are affected. By default, all bits are affected.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    dst_sel:DWORD                            Use bits 31:0.
+    dst_sel:BYTE_0                           Use bits 7:0.
+    dst_sel:BYTE_1                           Use bits 15:8.
+    dst_sel:BYTE_2                           Use bits 23:16.
+    dst_sel:BYTE_3                           Use bits 31:24.
+    dst_sel:WORD_0                           Use bits 15:0.
+    dst_sel:WORD_1                           Use bits 31:16.
+    ======================================== ================================================
+
+
+.. _amdgpu_synid_dst_unused:
+
+dst_unused
+~~~~~~~~~~
+
+Controls what to do with the bits in the destination which are not selected
+by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
+By default, unused bits are preserved.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    dst_unused:UNUSED_PAD                    Pad with zeros.
+    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
+    dst_unused:UNUSED_PRESERVE               Preserve bits.
+    ======================================== ================================================
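+
+The interaction between *dst_sel* and *dst_unused* can be modelled on 32-bit integers.
+The following is a sketch under stated assumptions, not a description of the hardware:
+the function name is hypothetical, and the low bits of the computed result are assumed
+to be the ones written into the selected field. The bit ranges come from the *dst_sel*
+table and the three policies from the table above.
+

```python
SEL_BITS = {  # selector -> (lowest bit, width), from the dst_sel table
    "DWORD": (0, 32),
    "BYTE_0": (0, 8), "BYTE_1": (8, 8), "BYTE_2": (16, 8), "BYTE_3": (24, 8),
    "WORD_0": (0, 16), "WORD_1": (16, 16),
}


def sdwa_write(old_dst, result, dst_sel="DWORD", dst_unused="UNUSED_PRESERVE"):
    """Return the new 32-bit destination value."""
    low, width = SEL_BITS[dst_sel]
    field_mask = ((1 << width) - 1) << low
    field = (result << low) & field_mask
    if dst_unused == "UNUSED_PRESERVE":
        rest = old_dst & ~field_mask          # keep unselected bits
    elif dst_unused == "UNUSED_PAD":
        rest = 0                              # pad with zeros
    elif dst_unused == "UNUSED_SEXT":
        sign = (field >> (low + width - 1)) & 1
        upper = 0xFFFFFFFF & ~((1 << (low + width)) - 1)
        rest = upper if sign else 0           # sign-extend up, zero below
    else:
        raise ValueError(f"unknown dst_unused value: {dst_unused}")
    return (field | rest) & 0xFFFFFFFF
```

+
+The default arguments mirror the defaults described above: all bits are affected and
+unused bits are preserved.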
+
+.. _amdgpu_synid_src0_sel:
+
+src0_sel
+~~~~~~~~
+
+Controls which bits in the src0 are used. By default, all bits are used.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    src0_sel:DWORD                           Use bits 31:0.
+    src0_sel:BYTE_0                          Use bits 7:0.
+    src0_sel:BYTE_1                          Use bits 15:8.
+    src0_sel:BYTE_2                          Use bits 23:16.
+    src0_sel:BYTE_3                          Use bits 31:24.
+    src0_sel:WORD_0                          Use bits 15:0.
+    src0_sel:WORD_1                          Use bits 31:16.
+    ======================================== ================================================
+
+.. _amdgpu_synid_src1_sel:
+
+src1_sel
+~~~~~~~~
+
+Controls which bits in the src1 are used. By default, all bits are used.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    src1_sel:DWORD                           Use bits 31:0.
+    src1_sel:BYTE_0                          Use bits 7:0.
+    src1_sel:BYTE_1                          Use bits 15:8.
+    src1_sel:BYTE_2                          Use bits 23:16.
+    src1_sel:BYTE_3                          Use bits 31:24.
+    src1_sel:WORD_0                          Use bits 15:0.
+    src1_sel:WORD_1                          Use bits 31:16.
+    ======================================== ================================================
+
+.. _amdgpu_synid_sdwa_operand_modifiers:
+
+VOP1/VOP2/VOPC SDWA Operand Modifiers
+-------------------------------------
+
+Operand modifiers are not used separately. They are applied to source operands.
+
+GFX8 and GFX9 only.
+
+abs
+~~~
+
+See a description :ref:`here<amdgpu_synid_abs>`.
+
+neg
+~~~
+
+See a description :ref:`here<amdgpu_synid_neg>`.
+
+.. _amdgpu_synid_sext:
+
+sext
+~~~~
+
+Sign-extends the value of a (sub-dword) operand to fill all 32 bits.
+Has no effect for 32-bit operands.
+
+Valid for integer operands only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    sext(<operand>)                          Sign-extend operand value.
+    ======================================== ================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  sext(v4)
+  sext(v255)
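+
+The interaction between a sub-dword selector and *sext* can be sketched in Python.
+This is an illustrative model only, not the assembler's implementation: the helper
+names ``sdwa_select`` and ``sext`` are hypothetical, and the sketch covers only the
+bit ranges listed in the tables above.
+

```python
# Hypothetical helpers modelling the selector tables and the sext modifier.
# Bit ranges follow the src0_sel/src1_sel tables: name -> (shift, width).
FIELDS = {
    "DWORD": (0, 32),
    "BYTE_0": (0, 8),
    "BYTE_1": (8, 8),
    "BYTE_2": (16, 8),
    "BYTE_3": (24, 8),
    "WORD_0": (0, 16),
    "WORD_1": (16, 16),
}


def sdwa_select(value, sel):
    """Return (field, width) for the bits of a 32-bit value named by sel."""
    shift, width = FIELDS[sel]
    return (value >> shift) & ((1 << width) - 1), width


def sext(field, width):
    """Sign-extend a width-bit field to a 32-bit pattern."""
    sign = 1 << (width - 1)
    return ((field ^ sign) - sign) & 0xFFFFFFFF


field, width = sdwa_select(0x12345680, "BYTE_0")  # field is 0x80, width is 8
extended = sext(field, width)                     # 0xFFFFFF80
unchanged = sext(0x12345680, 32)                  # 32-bit operand: no effect
```

+
+The last line illustrates the statement that *sext* has no effect on 32-bit operands.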
+
+VOP3 Modifiers
+--------------
+
+.. _amdgpu_synid_vop3_op_sel:
+
+vop3_op_sel
+~~~~~~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
+By default, low bits are used for all operands.
+
+The number of values specified with the op_sel modifier must match the number of instruction
+operands (both source and destination). The first value controls src0, the second value controls
+src1 and so on, except that the last value controls the destination.
+The value 0 selects the low bits, while 1 selects the high bits.
+
+Note. The op_sel modifier affects 16-bit operands only. For 32-bit operands, the value specified
+by op_sel must be 0.
+
+GFX9 only.
+
+    ======================================== ============================================================
+    Syntax                                   Description
+    ======================================== ============================================================
+    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
+    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
+    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
+    ======================================== ============================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  op_sel:[0,0]
+  op_sel:[0,1]
+
+.. _amdgpu_synid_clamp:
+
+clamp
+~~~~~
+
+The meaning of clamp depends on the instruction.
+
+For *v_cmp* instructions, the clamp modifier indicates that the comparison signals
+when a floating-point exception occurs. By default, signaling is disabled.
+Not supported by GFX7.
+
+For integer operations, the clamp modifier indicates that the result must be clamped
+to the range between the smallest and largest representable values. By default, there is no clamping.
+Integer clamping is not supported by GFX7.
+
+For floating-point operations, the clamp modifier indicates that the result must be clamped
+to the range [0.0, 1.0]. By default, there is no clamping.
+
+Note. The clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    clamp                                    Enables clamping (or signaling).
+    ======================================== ================================================
+
+.. _amdgpu_synid_omod:
+
+omod
+~~~~
+
+Specifies if an output modifier must be applied to the result.
+By default, no output modifiers are applied.
+
+Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
+
+Output modifiers are valid for f32 and f64 floating point results only.
+They must not be used with f16.
+
+Note. *v_cvt_f16_f32* is an exception. This instruction produces an f16 result
+but accepts output modifiers.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    mul:2                                    Multiply the result by 2.
+    mul:4                                    Multiply the result by 4.
+    div:2                                    Multiply the result by 0.5.
+    ======================================== ================================================
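+
+The ordering of output modifiers and floating-point clamping can be sketched in Python.
+This is a simplified model under stated assumptions: the function name
+``apply_output_modifiers`` is hypothetical, Python floats stand in for f32/f64 results,
+and special values such as NaN are not modelled.
+

```python
# Hypothetical model: the output modifier is applied first, then clamp.
OMOD_FACTORS = {None: 1.0, "mul:2": 2.0, "mul:4": 4.0, "div:2": 0.5}


def apply_output_modifiers(result, omod=None, clamp=False):
    """Scale a floating-point result by omod, then clamp it to [0.0, 1.0]."""
    result = result * OMOD_FACTORS[omod]
    if clamp:
        result = min(max(result, 0.0), 1.0)
    return result


scaled_then_clamped = apply_output_modifiers(0.75, omod="mul:2", clamp=True)
halved = apply_output_modifiers(3.0, omod="div:2")
```

+
+With this ordering, 0.75 with ``mul:2`` and ``clamp`` gives 1.0: the value is doubled
+to 1.5 and only then clamped.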
+
+.. _amdgpu_synid_vop3_operand_modifiers:
+
+VOP3 Operand Modifiers
+----------------------
+
+Operand modifiers are not used separately. They are applied to source operands.
+
+.. _amdgpu_synid_abs:
+
+abs
+~~~
+
+Computes the absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
+Valid for floating-point operands only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    abs(<operand>)                           Get absolute value of operand.
+    \|<operand>|                             The same as above.
+    ======================================== ================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  abs(v36)
+  |v36|
+
+.. _amdgpu_synid_neg:
+
+neg
+~~~
+
+Negates the value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
+Valid for floating-point operands only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    neg(<operand>)                           Get negative value of operand.
+    -<operand>                               The same as above.
+    ======================================== ================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  neg(v[0])
+  -v4
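+
+Because *abs* is applied before *neg*, an operand that carries both modifiers always
+yields a non-positive value. A minimal Python sketch of this ordering follows; the
+function name ``apply_source_modifiers`` is hypothetical, and Python floats stand in
+for floating-point register values.
+

```python
# Hypothetical model of the source operand modifiers: abs first, then neg.
def apply_source_modifiers(value, use_abs=False, use_neg=False):
    """Apply abs (if requested) and then neg (if requested) to a float."""
    if use_abs:
        value = abs(value)
    if use_neg:
        value = -value
    return value


both_on_positive = apply_source_modifiers(3.0, use_abs=True, use_neg=True)
both_on_negative = apply_source_modifiers(-3.0, use_abs=True, use_neg=True)
```

+
+Both calls return -3.0, regardless of the sign of the input.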
+
+VOP3P Modifiers
+---------------
+
+This section describes modifiers of *regular* VOP3P instructions.
+
+*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
+instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
+
+GFX9 only.
+
+.. _amdgpu_synid_op_sel:
+
+op_sel
+~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits as input to the operation
+which results in the lower-half of the destination.
+By default, low bits are used for all operands.
+
+The number of values specified by the *op_sel* modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 selects the low bits, while 1 selects the high bits.
+
+    ================================= =============================================================
+    Syntax                            Description
+    ================================= =============================================================
+    op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
+    op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
+    op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
+    ================================= =============================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  op_sel:[0,0]
+  op_sel:[0,1,0]
+
+.. _amdgpu_synid_op_sel_hi:
+
+op_sel_hi
+~~~~~~~~~
+
+Selects the low [15:0] or high [31:16] operand bits as input to the operation
+which results in the upper-half of the destination.
+By default, high bits are used for all operands.
+
+The number of values specified by the *op_sel_hi* modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 selects the low bits, while 1 selects the high bits.
+
+    =================================== =============================================================
+    Syntax                              Description
+    =================================== =============================================================
+    op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
+    op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
+    op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
+    =================================== =============================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  op_sel_hi:[0,0]
+  op_sel_hi:[0,0,1]
+
+.. _amdgpu_synid_neg_lo:
+
+neg_lo
+~~~~~~
+
+Specifies whether to change the sign of operand values selected by
+:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
+as input to the operation which results in the lower-half of the destination.
+
+The number of values specified by this modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 indicates that the corresponding operand value is used unmodified;
+the value 1 indicates that the negated value of the operand must be used.
+
+By default, operand values are used unmodified.
+
+This modifier is valid for floating point operands only.
+
+    ================================ ==================================================================
+    Syntax                           Description
+    ================================ ==================================================================
+    neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
+    neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
+    neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
+    ================================ ==================================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  neg_lo:[0]
+  neg_lo:[0,1]
+
+.. _amdgpu_synid_neg_hi:
+
+neg_hi
+~~~~~~
+
+Specifies whether to change the sign of operand values selected by
+:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
+as input to the operation which results in the upper-half of the destination.
+
+The number of values specified by this modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 indicates that the corresponding operand value is used unmodified;
+the value 1 indicates that the negated value of the operand must be used.
+
+By default, operand values are used unmodified.
+
+This modifier is valid for floating point operands only.
+
+    =============================== ==================================================================
+    Syntax                          Description
+    =============================== ==================================================================
+    neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
+    neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
+    neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
+    =============================== ==================================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  neg_hi:[1,0]
+  neg_hi:[0,1,1]
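+
+How *op_sel*, *op_sel_hi*, *neg_lo* and *neg_hi* combine can be sketched in Python.
+This is an illustrative model, not a description of the hardware: the function
+``packed_op`` is hypothetical, each source is represented as a ``(low, high)`` tuple of
+Python floats instead of packed 16-bit values, and *neg_lo* is taken to act on the
+inputs of the lower-half result while *neg_hi* acts on the inputs of the upper-half result.
+

```python
# Hypothetical model of a regular VOP3P operation on packed 16-bit halves.
def packed_op(op, srcs, op_sel=None, op_sel_hi=None, neg_lo=None, neg_hi=None):
    """Return (lower_half_result, upper_half_result) of a packed operation."""
    n = len(srcs)
    op_sel = op_sel or [0] * n        # default: low bits feed the lower half
    op_sel_hi = op_sel_hi or [1] * n  # default: high bits feed the upper half
    neg_lo = neg_lo or [0] * n        # default: operands used unmodified
    neg_hi = neg_hi or [0] * n

    def inputs(sel, neg):
        # Pick the low (0) or high (1) half of each source, then negate if asked.
        return [-src[s] if ng else src[s] for src, s, ng in zip(srcs, sel, neg)]

    return op(*inputs(op_sel, neg_lo)), op(*inputs(op_sel_hi, neg_hi))


def add(a, b):
    return a + b


srcs = [(1.0, 2.0), (10.0, 20.0)]               # (low, high) for src0 and src1
default = packed_op(add, srcs)                   # (11.0, 22.0)
swapped = packed_op(add, srcs, op_sel=[0, 1])    # (21.0, 22.0)
negated = packed_op(add, srcs, neg_lo=[0, 1])    # (-9.0, 22.0)
```

+
+With ``op_sel:[0,1]`` the lower-half result reads the high bits of src1, while the
+upper-half result is unaffected.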
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.
+
+.. _amdgpu_synid_mad_mix:
+
+VOP3P V_MAD_MIX Modifiers
+-------------------------
+
+*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
+use *op_sel* and *op_sel_hi* modifiers
+in a manner different from *regular* VOP3P instructions.
+
+See a description below.
+
+GFX9 only.
+
+.. _amdgpu_synid_mad_mix_op_sel:
+
+mad_mix_op_sel
+~~~~~~~~~~~~~~
+
+This modifier has meaning only for 16-bit source operands as indicated by
+:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
+It selects either the low [15:0] or high [31:16] operand bits
+as input to the operation.
+
+The number of values specified by the *op_sel* modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 selects the low 16 bits, while 1 selects the high 16 bits.
+
+By default, low bits are used for all operands.
+
+    =============================== ================================================
+    Syntax                          Description
+    =============================== ================================================
+    op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
+    =============================== ================================================
+
+Examples:
+
+.. code-block:: nasm
+
+  op_sel:[0,0,1]
+
+.. _amdgpu_synid_mad_mix_op_sel_hi:
+
+mad_mix_op_sel_hi
+~~~~~~~~~~~~~~~~~
+
+Selects the size of source operands: either 32 bits or 16 bits.
+By default, 32 bits are used for all source operands.
+
+The number of values specified by the *op_sel_hi* modifier must match the number of source
+operands. First value controls src0, second value controls src1 and so on.
+
+The value 0 indicates 32 bits, the value 1 indicates 16 bits.
+
+The location of 16 bits in the operand may be specified by
+:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`.
+
+    ======================================== ====================================
+    Syntax                                   Description
+    ======================================== ====================================
+    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
+    ======================================== ====================================
+
+Examples:
+
+.. code-block:: nasm
+
+  op_sel_hi:[1,1,1]
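+
+The way *op_sel_hi* and *op_sel* jointly pick the bits of one source operand can be
+sketched in Python. This is a minimal model under stated assumptions: the function
+``mad_mix_source`` is hypothetical and returns only the selected raw bits and their
+width; any conversion of those bits is not modelled.
+

```python
# Hypothetical model of source selection for the v_mad_mix* instructions.
def mad_mix_source(raw, op_sel, op_sel_hi):
    """Return (bits, width) selected from a 32-bit register value."""
    if op_sel_hi == 0:
        # 32-bit operand: op_sel has no meaning here.
        return raw & 0xFFFFFFFF, 32
    # 16-bit operand: op_sel picks the low (0) or high (1) 16 bits.
    shift = 16 if op_sel else 0
    return (raw >> shift) & 0xFFFF, 16


whole = mad_mix_source(0x3C004000, op_sel=0, op_sel_hi=0)  # (0x3C004000, 32)
low = mad_mix_source(0x3C004000, op_sel=0, op_sel_hi=1)    # (0x4000, 16)
high = mad_mix_source(0x3C004000, op_sel=1, op_sel_hi=1)   # (0x3C00, 16)
```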
+
+abs
+~~~
+
+See a description :ref:`here<amdgpu_synid_abs>`.
+
+neg
+~~~
+
+See a description :ref:`here<amdgpu_synid_neg>`.
+
+clamp
+~~~~~
+
+See a description :ref:`here<amdgpu_synid_clamp>`.