========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
  This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human readable serialization format that is used to represent
LLVM's :ref:`machine specific intermediate representation
<machine code representation>`.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132
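
The document layout described above can be illustrated with a short Python
sketch. It is not part of LLVM: ``split_yaml_documents`` is a hypothetical
helper, and it assumes the simplified rule that a line starting with ``---``
opens a document and a line consisting of ``...`` closes it.

.. code-block:: python

    def split_yaml_documents(text):
        """Split a MIR file into the bodies of its YAML documents (simplified)."""
        documents = []
        current = None  # Lines of the open document, or None between documents.
        for line in text.splitlines():
            if line.startswith("---"):
                # A new document starts; flush the previous one if still open.
                if current is not None:
                    documents.append("\n".join(current))
                current = []
            elif line.rstrip() == "...":
                # Explicit end-of-document marker.
                if current is not None:
                    documents.append("\n".join(current))
                current = None
            elif current is not None:
                current.append(line)
        if current is not None:
            documents.append("\n".join(current))
        return documents


    example = """--- |
      define i32 @inc(i32* %x) {
      entry:
        ret i32 0
      }
    ...
    ---
    name: inc
    body: |
      bb.0.entry:
        RETQ %eax
    ...
    """

    # The first document holds the embedded IR module, the second a function.
    module, function = split_yaml_documents(example)

A real tool would use a full YAML parser instead; the sketch only shows how the
first document differs from the machine function documents that follow it.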

MIR Testing Guide
=================

You can use the MIR format for testing in two different ways:

- You can write MIR tests that invoke a single code generation pass using the
  ``-run-pass`` option in llc.

- You can use llc's ``-stop-after`` option with existing or new LLVM assembly
  tests and check the MIR output of a specific code generation pass.

Testing Individual Code Generation Passes
-----------------------------------------

The ``-run-pass`` option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the ``-stop-after`` or
``-stop-before`` option in llc. For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the ``-stop-after`` option, as it
runs just before the pass that we are trying to test:

   ``llc -stop-after=machine-cp bug-trigger.ll > test.mir``

After generating the input MIR file, you'll have to add a run line that uses
the ``-run-pass`` option to it. In order to test the post register allocation
pseudo instruction expansion pass on X86-64, a run line like the one shown
below can be used:

   ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``

The MIR files are target dependent, so they have to be placed in the target
specific test directories (``test/CodeGen/TARGETNAME``). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.

Limitations
-----------

Currently the MIR format has several limitations in terms of which state it
can serialize:

- The target-specific state in the target-specific ``MachineFunctionInfo``
  subclasses isn't serialized at the moment.

- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
  SystemZ backends) aren't serialized at the moment.

- The ``MCSymbol`` machine operands are only printed; they can't be parsed.

- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
  instructions and the variable debug information from MMI are serialized right
  now.

These limitations restrict what you can test with the MIR format. For now,
tests whose behaviour depends on the state of certain ``MCSymbol`` operands or
on the exception handling state in MMI can't use the MIR format. Likewise,
tests whose behaviour depends on the state of the target-specific
``MachineFunctionInfo`` or ``MachineConstantPoolValue`` subclasses can't use
the MIR format at the moment.

High Level Structure
====================

.. _embedded-module:

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

       define i32 @inc(i32* %x) {
       entry:
         %0 = load i32, i32* %x
         %1 = add i32 %0, 1
         store i32 %1, i32* %x
         ret i32 %1
       }

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: text

     ---
     name: inc
     tracksRegLiveness: true
     liveins:
       - { reg: '%rdi' }
     body: |
       bb.0.entry:
         liveins: %rdi

         %eax = MOV32rm %rdi, 1, _, 0, _
         %eax = INC32r killed %eax, implicit-def dead %eflags
         MOV32mr killed %rdi, 1, _, 0, _, %eax
         RETQ %eax
     ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of the function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks with the IDs zero and one:

.. code-block:: text

    bb.0:
      <instructions>
    bb.1:
      <instructions>

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: text

    bb.0.entry:       ; This block's name is "entry"
      <instructions>

The block's name should be identical to the name of the IR block that this
machine block is based on.

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax:

.. code-block:: text

    %bb.<id>[.<name>]

Examples:

.. code-block:: llvm

    %bb.0
    %bb.1.then
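
As an illustration of the reference syntax, the Python sketch below splits a
block reference into its ID and optional name. The helper ``parse_block_ref``
is hypothetical, and the set of characters it accepts in a block name is an
assumption; this manual does not define which characters a name may contain.

.. code-block:: python

    import re

    # Assumption: a block name contains no whitespace, commas, or parentheses.
    BLOCK_REF = re.compile(r"%bb\.(\d+)(?:\.([^\s,()]+))?")


    def parse_block_ref(text):
        """Return (id, name) for a reference like '%bb.1.then'.

        The name is None when the reference carries only an ID.
        """
        match = BLOCK_REF.fullmatch(text)
        if match is None:
            raise ValueError(f"not a block reference: {text!r}")
        return int(match.group(1)), match.group(2)

Because blocks are identified by their ID, a consumer of the result would
typically key on the first element and treat the name as informational.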

Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: text

    bb.0.entry:
      successors: %bb.1.then, %bb.2.else
      <instructions>
    bb.1.then:
      <instructions>
    bb.2.else:
      <instructions>

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: text

    bb.0.entry:
      successors: %bb.1.then(32), %bb.2.else(16)
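
The Python sketch below reads such a ``successors`` line and, as a worked
example, turns the weights 32 and 16 into relative shares. Both
``parse_successors`` and the accepted block name characters are assumptions
made for illustration, as is the reading of the weights as relative values
whose share is ``weight / total``.

.. code-block:: python

    import re
    from fractions import Fraction

    # Assumption: a block name contains no whitespace, commas, or parentheses.
    SUCCESSOR = re.compile(r"(%bb\.\d+(?:\.[^\s,()]+)?)(?:\((\d+)\))?")


    def parse_successors(line):
        """Map each successor in a 'successors:' line to its weight or None."""
        _, _, operands = line.partition("successors:")
        result = {}
        for block, weight in SUCCESSOR.findall(operands):
            result[block] = int(weight) if weight else None
        return result


    weights = parse_successors("successors: %bb.1.then(32), %bb.2.else(16)")
    total = sum(weights.values())
    # Assumption: weights are relative, so each share is weight / total.
    shares = {block: Fraction(weight, total) for block, weight in weights.items()}

Under that assumption the first successor receives two thirds of the total
weight and the second one third.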

.. _bb-liveins:

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: text

    bb.0.entry:
      liveins: %edi, %esi

The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists - they are combined into
one list by the parser.
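
The combining behaviour can be pictured with the following Python sketch.
``collect_list`` is a hypothetical helper, not the MIR parser; it assumes that
combining means concatenating the lists in source order, and it does not try to
remove duplicates.

.. code-block:: python

    def collect_list(block_lines, attribute):
        """Concatenate every '<attribute>: a, b' list found in a block's lines."""
        prefix = attribute + ":"
        items = []
        for line in block_lines:
            stripped = line.strip()
            if stripped.startswith(prefix):
                entries = stripped[len(prefix):].split(",")
                # Skip empty entries so that an empty list contributes nothing.
                items.extend(entry.strip() for entry in entries if entry.strip())
        return items


    block = [
        "bb.0.entry:",
        "  liveins: %edi",
        "  liveins: %esi, %edx",
        "  liveins:",
    ]

    live_ins = collect_list(block, "liveins")

The same helper applies to ``successors`` lines, since the text treats both
kinds of list in the same way.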

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
specified in brackets after the block's definition:

.. code-block:: text

    bb.0.entry (address-taken):
      <instructions>
    bb.2.else (align 4):
      <instructions>
    bb.3(landing-pad, align 4):
      <instructions>

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

Machine Instructions
--------------------

A machine instruction is composed of a name,
:ref:`machine operands <machine-operands>`,
:ref:`instruction flags <instruction-flags>`, and machine memory operands.

The instruction's name is usually specified before the operands. The example
below shows an instance of the X86 ``RETQ`` instruction with a single machine
operand:

.. code-block:: text

    RETQ %eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
defined register operands:

.. code-block:: text

    %sp, %fp, %lr = LDPXpost %sp, 2

The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
machine instructions.

.. _instruction-flags:

Instruction Flags
^^^^^^^^^^^^^^^^^

The flag ``frame-setup`` can be specified before the instruction's name:

.. code-block:: text

    %fp = frame-setup ADDXri %sp, 0, 0

.. _registers:

Registers
---------

Registers are one of the key primitives in the machine instructions
serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.

The physical registers are identified by their name. They use the following
syntax:

.. code-block:: text

    %<name>

The example below shows three X86 physical registers:

.. code-block:: text

    %eax
    %r15
    %eflags

The virtual registers are identified by their ID number. They use the following
syntax:

.. code-block:: text

    %<id>

Example:

.. code-block:: text

    %0

The null registers are represented using an underscore ('``_``'). They can also be
represented using a '``%noreg``' named register, although the former syntax
is preferred.
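
The three register spellings can be told apart mechanically, as the Python
sketch below suggests. ``classify_register`` is a hypothetical helper; it
assumes that any name made up only of ASCII digits is a virtual register ID and
that every other ``%`` name is a physical register.

.. code-block:: python

    def classify_register(token):
        """Classify a register token as 'null', 'virtual', or 'physical'."""
        # Both null spellings are checked first, since '%noreg' would
        # otherwise look like a physical register name.
        if token in ("_", "%noreg"):
            return "null"
        if not token.startswith("%") or len(token) == 1:
            raise ValueError(f"not a register: {token!r}")
        name = token[1:]
        # A purely numeric name is a virtual register ID.
        if name.isascii() and name.isdigit():
            return "virtual"
        return "physical"

The sketch does not check that a physical register name exists for the target;
that would require the target's register definitions.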

.. _machine-operands:

Machine Operands
----------------

There are seventeen different kinds of machine operands, and all of them, except
the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
just printed out - they can't be parsed back yet.

Immediate Operands
^^^^^^^^^^^^^^^^^^

The immediate machine operands are untyped, 64-bit signed integers. The
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
immediate machine operand ``-42``:

.. code-block:: text

    %eax = MOV32ri -42

.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands.

.. _register-operands:

Register Operands
^^^^^^^^^^^^^^^^^

The :ref:`register <registers>` primitive is used to represent the register
machine operands. The register operands can also have optional
:ref:`register flags <register-flags>`,
:ref:`a subregister index <subregister-indices>`,
and a reference to the tied register operand.
The full syntax of a register operand is shown below:

.. code-block:: text

    [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]

This example shows an instance of the X86 ``XOR32rr`` instruction that has
5 register operands with different register flags:

.. code-block:: text

    dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al

.. _register-flags:

Register Flags
~~~~~~~~~~~~~~

The table below shows all of the possible register flags along with the
corresponding internal ``llvm::RegState`` representation:

.. list-table::
   :header-rows: 1

   * - Flag
     - Internal Value

   * - ``implicit``
     - ``RegState::Implicit``

   * - ``implicit-def``
     - ``RegState::ImplicitDefine``

   * - ``def``
     - ``RegState::Define``

   * - ``dead``
     - ``RegState::Dead``

   * - ``killed``
     - ``RegState::Kill``

   * - ``undef``
     - ``RegState::Undef``

   * - ``internal``
     - ``RegState::InternalRead``

   * - ``early-clobber``
     - ``RegState::EarlyClobber``

   * - ``debug-use``
     - ``RegState::Debug``
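
To show how the flags fit into the register operand syntax, the Python sketch
below splits one operand into its parts. ``parse_register_operand`` is a
hypothetical helper rather than the MIR parser. It assumes that the subregister
index follows the register without spaces and that ``<tied-op>`` is written as
a number.

.. code-block:: python

    import re

    # Flag spellings taken from the table above.
    REGISTER_FLAGS = {
        "implicit", "implicit-def", "def", "dead", "killed",
        "undef", "internal", "early-clobber", "debug-use",
    }

    # Assumption: the tied operand is written as a decimal number.
    TIED_DEF = re.compile(r"\(tied-def (\d+)\)")


    def parse_register_operand(text):
        """Split '[<flags>] <register>[:<subreg>] [(tied-def <n>)]' into parts."""
        tied = None
        match = TIED_DEF.search(text)
        if match:
            tied = int(match.group(1))
            text = text[:match.start()]
        tokens = text.split()
        flags = []
        # Leading tokens that are known flag names are collected in order.
        while tokens and tokens[0] in REGISTER_FLAGS:
            flags.append(tokens.pop(0))
        if len(tokens) != 1:
            raise ValueError("expected exactly one register after the flags")
        register, _, subreg = tokens[0].partition(":")
        return {"flags": flags, "register": register,
                "subreg": subreg or None, "tied_def": tied}

For instance, ``implicit-def dead %eflags`` yields the two flags
``implicit-def`` and ``dead`` followed by the register ``%eflags``.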

.. _subregister-indices:

Subregister Indices
~~~~~~~~~~~~~~~~~~~

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the ``COPY``
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy the
lower 8 bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:

.. code-block:: text

    %1 = COPY %0:sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target's ``*RegisterInfo.td`` file.
454
Alex Lorenzd4990eb2015-09-08 11:38:16 +0000455Global Value Operands
456^^^^^^^^^^^^^^^^^^^^^
457
458The global value machine operands reference the global values from the
459:ref:`embedded LLVM IR module <embedded-module>`.
460The example below shows an instance of the X86 ``MOV64rm`` instruction that has
461a global value operand named ``G``:
462
Renato Golin124f2592016-07-20 12:16:38 +0000463.. code-block:: text
Alex Lorenzd4990eb2015-09-08 11:38:16 +0000464
465 %rax = MOV64rm %rip, 1, _, @G, _
466
467The named global values are represented using an identifier with the '@' prefix.
468If the identifier doesn't match the regular expression
469`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted.
470
471The unnamed global values are represented using an unsigned numeric value with
472the '@' prefix, like in the following examples: ``@0``, ``@989``.
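
The quoting rule can be checked with the regular expression given above, as in
the Python sketch below. ``needs_quoting`` and ``format_unnamed_global`` are
hypothetical helpers; the sketch only decides whether quoting is required and
does not attempt to produce the quoted form, which this manual does not
describe.

.. code-block:: python

    import re

    # The pattern quoted in the text above for names that need no quoting.
    UNQUOTED_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


    def needs_quoting(name):
        """Return True if a named global value can't be written as a bare @name."""
        return UNQUOTED_NAME.fullmatch(name) is None


    def format_unnamed_global(number):
        """Format an unnamed global value from its unsigned number."""
        if number < 0:
            raise ValueError("unnamed global values use an unsigned number")
        return f"@{number}"

A name such as ``G`` passes the check, while a name that starts with a digit or
contains a space does not and would therefore have to be quoted.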

.. TODO: Describe the parser's default behaviour when optional YAML attributes
   are missing.
.. TODO: Describe the syntax for the bundled instructions.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the external symbol and register
   mask machine operands.
.. TODO: Describe the frame information YAML mapping.
.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the constant pool machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the jump table machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the block address machine operands.
.. TODO: Describe the syntax of the CFI index machine operands.
.. TODO: Describe the syntax of the metadata machine operands, and the
   instruction's debug location attribute.
.. TODO: Describe the syntax of the target index machine operands.
.. TODO: Describe the syntax of the register live out machine operands.
.. TODO: Describe the syntax of the machine memory operands.