========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
   This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human readable serialization format that is used to represent
LLVM's :ref:`machine specific intermediate representation
<machine code representation>`.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132

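The document boundaries described above can be located without a full YAML
parser. The Python sketch below is only an illustration: the helper name
``split_mir_documents`` is hypothetical and not part of LLVM, and it assumes
that every document starts with a line beginning with ``---`` and may end with
an unindented ``...`` line, as in the examples in this manual.

.. code-block:: python

   from typing import List


   def split_mir_documents(text: str) -> List[str]:
       """Split the text of a MIR file into its YAML documents (rough sketch)."""
       documents: List[str] = []
       current: List[str] = []
       started = False
       for line in text.splitlines():
           if line.startswith("---"):
               # A new document starts; flush the previous one if it is open.
               if started:
                   documents.append("\n".join(current))
               current = [line]
               started = True
           elif line.rstrip() == "...":
               # An explicit end marker closes the current document.
               if started:
                   current.append(line)
                   documents.append("\n".join(current))
               current, started = [], False
           elif started:
               current.append(line)
       if started:
           documents.append("\n".join(current))
       return documents

With this sketch, the first returned string would hold the optional embedded
LLVM IR module and the remaining strings would hold the machine functions.
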
MIR Testing Guide
=================

You can use the MIR format for testing in two different ways:

- You can write MIR tests that invoke a single code generation pass using the
  ``-run-pass`` option in llc.

- You can use llc's ``-stop-after`` option with existing or new LLVM assembly
  tests and check the MIR output of a specific code generation pass.

Testing Individual Code Generation Passes
-----------------------------------------

The ``-run-pass`` option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the ``-stop-after`` or
``-stop-before`` option in llc. For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the ``-stop-after`` option, as it
runs just before the pass that we are trying to test:

   ``llc -stop-after=machine-cp bug-trigger.ll > test.mir``

After generating the input MIR file, you'll have to add a run line that uses
the ``-run-pass`` option to it. In order to test the post register allocation
pseudo instruction expansion pass on X86-64, a run line like the one shown
below can be used:

   ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``

The MIR files are target dependent, so they have to be placed in the target
specific test directories (``test/CodeGen/TARGETNAME``). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.

Simplifying MIR files
^^^^^^^^^^^^^^^^^^^^^

The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose.
Tests are more accessible and future proof when simplified:

- Machine function attributes often have default values or the test works just
  as well with default values. Typical candidates for this are: ``alignment:``,
  ``exposesReturnsTwice``, ``legalized``, ``regBankSelected``, ``selected``.
  The whole ``frameInfo`` section is often unnecessary if there is no special
  frame usage in the function. ``tracksRegLiveness`` on the other hand is often
  necessary for passes that care about block livein lists.

- The (global) ``liveins:`` list is typically only interesting for early
  instruction selection passes and can be removed when testing later passes.
  The per-block ``liveins:`` on the other hand are necessary if
  ``tracksRegLiveness`` is true.

- Branch probability data in block ``successors:`` lists can be dropped if the
  test doesn't depend on it. Example:
  ``successors: %bb.1(0x40000000), %bb.2(0x40000000)`` can be replaced with
  ``successors: %bb.1, %bb.2``.

- MIR code contains a whole IR module. This is necessary because there are
  no equivalents in MIR for global variables, references to external functions,
  function attributes, metadata, or debug info. Instead, some MIR data
  references the IR constructs. You can often remove them if the test doesn't
  depend on them.

- Alias Analysis is performed on IR values. These are referenced by memory
  operands in MIR. Example: ``:: (load 8 from %ir.foobar, !alias.scope !9)``.
  If the test doesn't depend on (good) alias analysis the references can be
  dropped: ``:: (load 8)``

- MIR blocks can reference IR blocks for debug printing, profile information
  or debug locations. Example: ``bb.42.myblock`` in MIR references the IR block
  ``myblock``. It is usually possible to drop the ``.myblock`` reference and
  simply use ``bb.42``.

- If there are no memory operands or blocks referencing the IR then the
  IR function can be replaced by a parameterless dummy function like
  ``define void @func() { ret void }``.

- It is possible to drop the whole IR section of the MIR file if it only
  contains dummy functions (see above). The .mir loader will create the
  IR functions automatically in this case.

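The successor-list simplification from the list above is mechanical enough to
script. The following Python sketch is an illustration only: the helper name
``strip_successor_probabilities`` is hypothetical and not part of any LLVM
tool, and the regular expression assumes that probabilities are written as
hexadecimal or decimal literals in parentheses.

.. code-block:: python

   import re

   # Matches a parenthesised probability or weight such as "(0x40000000)" or "(32)".
   _PROBABILITY = re.compile(r"\((?:0x[0-9a-fA-F]+|\d+)\)")


   def strip_successor_probabilities(line: str) -> str:
       """Drop branch probability annotations from a MIR ``successors:`` line."""
       if "successors:" not in line:
           return line  # Leave every other line untouched.
       return _PROBABILITY.sub("", line)

Applying the helper only to ``successors:`` lines keeps parenthesised text
elsewhere in the file, such as memory operands, unchanged.
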
Limitations
-----------

Currently the MIR format has several limitations in terms of which state it
can serialize:

- The target-specific state in the target-specific ``MachineFunctionInfo``
  subclasses isn't serialized at the moment.

- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
  SystemZ backends) aren't serialized at the moment.

- The ``MCSymbol`` machine operands are only printed; they can't be parsed.

- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
  instructions and the variable debug information from MMI is serialized right
  now.

These limitations impose restrictions on what you can test with the MIR format.
For now, tests that depend on the state of certain ``MCSymbol`` operands or on
the exception handling state in MMI can't use the MIR format. Likewise, tests
that depend on the state of the target specific ``MachineFunctionInfo`` or
``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment.

High Level Structure
====================

.. _embedded-module:

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

   define i32 @inc(i32* %x) {
   entry:
     %0 = load i32, i32* %x
     %1 = add i32 %0, 1
     store i32 %1, i32* %x
     ret i32 %1
   }

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: text

   ---
   name: inc
   tracksRegLiveness: true
   liveins:
     - { reg: '%rdi' }
   body: |
     bb.0.entry:
       liveins: %rdi

       %eax = MOV32rm %rdi, 1, _, 0, _
       %eax = INC32r killed %eax, implicit-def dead %eflags
       MOV32mr killed %rdi, 1, _, 0, _, %eax
       RETQ %eax
   ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of the function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks that have an ID of zero and one:

.. code-block:: text

   bb.0:
     <instructions>
   bb.1:
     <instructions>

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: text

   bb.0.entry:       ; This block's name is "entry"
     <instructions>

The block's name should be identical to the name of the IR block that this
machine block is based on.

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax:

.. code-block:: text

   %bb.<id>[.<name>]

Examples:

.. code-block:: llvm

   %bb.0
   %bb.1.then

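The reference syntax above maps directly onto a regular expression. The Python
sketch below shows one way a test helper might split a reference into its ID
and optional name; ``BlockRef`` and ``parse_block_ref`` are hypothetical names
used for illustration and do not mirror the actual MIR parser.

.. code-block:: python

   import re
   from typing import NamedTuple, Optional

   # %bb.<id>[.<name>]: a numeric ID followed by an optional name.
   _BLOCK_REF = re.compile(r"^%bb\.(\d+)(?:\.(.+))?$")


   class BlockRef(NamedTuple):
       block_id: int
       name: Optional[str]


   def parse_block_ref(text: str) -> BlockRef:
       """Split a block reference into its ID and optional name."""
       match = _BLOCK_REF.match(text.strip())
       if match is None:
           raise ValueError(f"not a block reference: {text!r}")
       return BlockRef(int(match.group(1)), match.group(2))

Only the ID identifies the block, so a caller comparing two references would
typically compare ``block_id`` and ignore ``name``.
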
Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then, %bb.2.else
     <instructions>
   bb.1.then:
     <instructions>
   bb.2.else:
     <instructions>

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then(32), %bb.2.else(16)

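To make the shape of a successor list concrete, the Python sketch below reads
one ``successors:`` line into pairs of block reference and weight. The helper
name ``parse_successors`` is hypothetical, and the sketch assumes decimal
weights like the ones in the example above; a successor with any other or no
annotation gets a weight of ``None``.

.. code-block:: python

   import re
   from typing import List, Optional, Tuple

   # A block reference followed by an optional decimal weight in parentheses.
   _SUCCESSOR = re.compile(r"(%bb\.\d+(?:\.[^\s,(]+)?)(?:\((\d+)\))?")


   def parse_successors(line: str) -> List[Tuple[str, Optional[int]]]:
       """Return (block reference, weight) pairs from a ``successors:`` line."""
       _, _, items = line.partition("successors:")
       return [
           (ref, int(weight) if weight else None)
           for ref, weight in _SUCCESSOR.findall(items)
       ]
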
.. _bb-liveins:

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: text

   bb.0.entry:
     liveins: %edi, %esi

The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists - they are combined into
one list by the parser.

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
specified in brackets after the block's definition:

.. code-block:: text

   bb.0.entry (address-taken):
     <instructions>
   bb.2.else (align 4):
     <instructions>
   bb.3(landing-pad, align 4):
     <instructions>

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

Machine Instructions
--------------------

A machine instruction is composed of a name,
:ref:`machine operands <machine-operands>`,
:ref:`instruction flags <instruction-flags>`, and machine memory operands.

The instruction's name is usually specified before the operands. The example
below shows an instance of the X86 ``RETQ`` instruction with a single machine
operand:

.. code-block:: text

   RETQ %eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
defined register operands:

.. code-block:: text

   %sp, %fp, %lr = LDPXpost %sp, 2

The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
machine instructions.

.. _instruction-flags:

Instruction Flags
^^^^^^^^^^^^^^^^^

The flag ``frame-setup`` can be specified before the instruction's name:

.. code-block:: text

   %fp = frame-setup ADDXri %sp, 0, 0

.. _registers:

Registers
---------

Registers are one of the key primitives in the machine instructions
serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.

The physical registers are identified by their name. They use the following
syntax:

.. code-block:: text

   %<name>

The example below shows three X86 physical registers:

.. code-block:: text

   %eax
   %r15
   %eflags

The virtual registers are identified by their ID number. They use the following
syntax:

.. code-block:: text

   %<id>

Example:

.. code-block:: text

   %0

The null registers are represented using an underscore ('``_``'). They can also be
represented using a '``%noreg``' named register, although the former syntax
is preferred.

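The three register forms above can be told apart by their spelling alone. The
Python sketch below illustrates this; ``classify_register`` is a hypothetical
helper, and the character set accepted for physical register names is an
assumption made for the sketch, not a rule taken from the MIR parser.

.. code-block:: python

   import re


   def classify_register(token: str) -> str:
       """Classify a register token as 'null', 'virtual' or 'physical'."""
       # Both spellings of the null register are checked first, because
       # "%noreg" would otherwise look like a physical register name.
       if token in ("_", "%noreg"):
           return "null"
       if re.fullmatch(r"%\d+", token):
           return "virtual"
       # Assumed name syntax: a letter or underscore, then letters, digits
       # or underscores.
       if re.fullmatch(r"%[A-Za-z_][A-Za-z0-9_]*", token):
           return "physical"
       raise ValueError(f"not a register: {token!r}")
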
.. _machine-operands:

Machine Operands
----------------

There are seventeen different kinds of machine operands, and all of them, except
the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
just printed out - they can't be parsed back yet.

Immediate Operands
^^^^^^^^^^^^^^^^^^

The immediate machine operands are untyped, 64-bit signed integers. The
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
immediate machine operand ``-42``:

.. code-block:: text

   %eax = MOV32ri -42

.. TODO: Describe the CIMM (Rare) and FPIMM immediate operands.

.. _register-operands:

Register Operands
^^^^^^^^^^^^^^^^^

The :ref:`register <registers>` primitive is used to represent the register
machine operands. The register operands can also have optional
:ref:`register flags <register-flags>`,
:ref:`a subregister index <subregister-indices>`,
and a reference to the tied register operand.
The full syntax of a register operand is shown below:

.. code-block:: text

   [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]

This example shows an instance of the X86 ``XOR32rr`` instruction that has
five register operands with different register flags:

.. code-block:: text

   dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al

.. _register-flags:

Register Flags
~~~~~~~~~~~~~~

The table below shows all of the possible register flags along with the
corresponding internal ``llvm::RegState`` representation:

.. list-table::
   :header-rows: 1

   * - Flag
     - Internal Value

   * - ``implicit``
     - ``RegState::Implicit``

   * - ``implicit-def``
     - ``RegState::ImplicitDefine``

   * - ``def``
     - ``RegState::Define``

   * - ``dead``
     - ``RegState::Dead``

   * - ``killed``
     - ``RegState::Kill``

   * - ``undef``
     - ``RegState::Undef``

   * - ``internal``
     - ``RegState::InternalRead``

   * - ``early-clobber``
     - ``RegState::EarlyClobber``

   * - ``debug-use``
     - ``RegState::Debug``

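Since the flags precede the register in an operand, the table above is enough
to separate the two. The Python sketch below copies the table into a
dictionary and peels the leading flags off an operand string;
``REGISTER_FLAGS`` and ``split_register_flags`` are hypothetical names, and
the sketch assumes that flags are separated by whitespace.

.. code-block:: python

   from typing import List, Tuple

   # Flag spelling in MIR -> internal llvm::RegState value, as listed above.
   REGISTER_FLAGS = {
       "implicit": "RegState::Implicit",
       "implicit-def": "RegState::ImplicitDefine",
       "def": "RegState::Define",
       "dead": "RegState::Dead",
       "killed": "RegState::Kill",
       "undef": "RegState::Undef",
       "internal": "RegState::InternalRead",
       "early-clobber": "RegState::EarlyClobber",
       "debug-use": "RegState::Debug",
   }


   def split_register_flags(operand: str) -> Tuple[List[str], str]:
       """Separate the leading register flags from the rest of an operand."""
       tokens = operand.split()
       flags: List[str] = []
       while tokens and tokens[0] in REGISTER_FLAGS:
           flags.append(tokens.pop(0))
       return flags, " ".join(tokens)

For the operand ``implicit-def dead %eflags`` the helper returns the flags
``implicit-def`` and ``dead`` together with the register ``%eflags``.
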
.. _subregister-indices:

Subregister Indices
~~~~~~~~~~~~~~~~~~~

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the ``COPY``
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy the
lower 8 bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:

.. code-block:: text

   %1 = COPY %0:sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target's ``*RegisterInfo.td`` file.

Global Value Operands
^^^^^^^^^^^^^^^^^^^^^

The global value machine operands reference the global values from the
:ref:`embedded LLVM IR module <embedded-module>`.
The example below shows an instance of the X86 ``MOV64rm`` instruction that has
a global value operand named ``G``:

.. code-block:: text

   %rax = MOV64rm %rip, 1, _, @G, _

The named global values are represented using an identifier with the '@' prefix.
If the identifier doesn't match the regular expression
``[-a-zA-Z$._][-a-zA-Z$._0-9]*``, then this identifier must be quoted.

The unnamed global values are represented using an unsigned numeric value with
the '@' prefix, like in the following examples: ``@0``, ``@989``.

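The quoting rule above can be checked with the regular expression it quotes.
The Python sketch below is illustrative only: both function names are
hypothetical, and the quoted form ``@"..."`` is assumed to follow the LLVM
assembly convention for quoted names. Escaping of special characters inside
the quotes is not handled.

.. code-block:: python

   import re

   # Identifiers matching this expression can be written without quotes.
   _UNQUOTED_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


   def format_global_value(name: str) -> str:
       """Spell a named global value operand, quoting the name if required."""
       if _UNQUOTED_NAME.fullmatch(name):
           return "@" + name
       # Assumed quoting style; characters that need escaping are not handled.
       return '@"' + name + '"'


   def format_unnamed_global_value(index: int) -> str:
       """Spell an unnamed global value operand from its unsigned number."""
       if index < 0:
           raise ValueError("unnamed global values use an unsigned number")
       return f"@{index}"
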
.. TODO: Describe the parser's default behaviour when optional YAML attributes
   are missing.
.. TODO: Describe the syntax for the bundled instructions.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the external symbol and register
   mask machine operands.
.. TODO: Describe the frame information YAML mapping.
.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the constant pool machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the jump table machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the block address machine operands.
.. TODO: Describe the syntax of the CFI index machine operands.
.. TODO: Describe the syntax of the metadata machine operands, and the
   instruction's debug location attribute.
.. TODO: Describe the syntax of the target index machine operands.
.. TODO: Describe the syntax of the register live out machine operands.
.. TODO: Describe the syntax of the machine memory operands.