========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
  This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human-readable serialization format that is used to represent
LLVM's :ref:`machine-specific intermediate representation
<machine code representation>`.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132
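
As a rough sketch of this layout, a small MIR file for an X86 target could look
like the one below. The function name ``foo`` is illustrative only. The first
document is a block literal that holds the IR module, and the second document
describes the machine function:

.. code-block:: text

   --- |
     define void @foo() {
       ret void
     }
   ...
   ---
   name: foo
   body: |
     bb.0:
       RETQ
   ...
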

MIR Testing Guide
=================

You can use the MIR format for testing in two different ways:

- You can write MIR tests that invoke a single code generation pass using the
  ``-run-pass`` option in llc.

- You can use llc's ``-stop-after`` option with existing or new LLVM assembly
  tests and check the MIR output of a specific code generation pass.

Testing Individual Code Generation Passes
-----------------------------------------

The ``-run-pass`` option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the ``-stop-after`` or
``-stop-before`` option in llc. For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the ``-stop-after`` option, as it
runs just before the pass that we are trying to test:

   ``llc -stop-after=machine-cp bug-trigger.ll > test.mir``

After generating the input MIR file, you'll have to add a run line that uses
the ``-run-pass`` option to it. In order to test the post register allocation
pseudo instruction expansion pass on X86-64, a run line like the one shown
below can be used:

   ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``

The MIR files are target dependent, so they have to be placed in the target
specific test directories (``test/CodeGen/TARGETNAME``). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.
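
Putting these pieces together, a complete test file might look like the sketch
below. The function name ``copy``, the register choice, and the ``CHECK`` lines
are illustrative assumptions rather than output copied from llc, so regenerate
the expected lines from your own llc output before relying on them:

.. code-block:: text

   # RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s
   ---
   name: copy
   tracksRegLiveness: true
   body: |
     bb.0:
       liveins: %edi

       ; CHECK-LABEL: name: copy
       ; CHECK: MOV32rr
       %eax = COPY killed %edi
       RETQ %eax
   ...
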

Simplifying MIR files
^^^^^^^^^^^^^^^^^^^^^

The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose;
tests are more accessible and future-proof when simplified:

- Use the ``-simplify-mir`` option with llc.

- Machine function attributes often have default values or the test works just
  as well with default values. Typical candidates for this are: ``alignment:``,
  ``exposesReturnsTwice``, ``legalized``, ``regBankSelected``, ``selected``.
  The whole ``frameInfo`` section is often unnecessary if there is no special
  frame usage in the function. ``tracksRegLiveness``, on the other hand, is
  often necessary for some passes that care about block livein lists.

- The (global) ``liveins:`` list is typically only interesting for early
  instruction selection passes and can be removed when testing later passes.
  The per-block ``liveins:``, on the other hand, are necessary if
  ``tracksRegLiveness`` is true.

- Branch probability data in block ``successors:`` lists can be dropped if the
  test doesn't depend on it. Example:
  ``successors: %bb.1(0x40000000), %bb.2(0x40000000)`` can be replaced with
  ``successors: %bb.1, %bb.2``.

- MIR code contains a whole IR module. This is necessary because there are
  no equivalents in MIR for global variables, references to external functions,
  function attributes, metadata, or debug info. Instead, some MIR data
  references the IR constructs. You can often remove them if the test doesn't
  depend on them.

- Alias Analysis is performed on IR values. These are referenced by memory
  operands in MIR. Example: ``:: (load 8 from %ir.foobar, !alias.scope !9)``.
  If the test doesn't depend on (good) alias analysis, the references can be
  dropped: ``:: (load 8)``.

- MIR blocks can reference IR blocks for debug printing, profile information,
  or debug locations. Example: ``bb.42.myblock`` in MIR references the IR block
  ``myblock``. It is usually possible to drop the ``.myblock`` reference and
  simply use ``bb.42``.

- If there are no memory operands or blocks referencing the IR, then the
  IR function can be replaced by a parameterless dummy function like
  ``define void @func() { ret void }``.

- It is possible to drop the whole IR section of the MIR file if it only
  contains dummy functions (see above). The .mir loader will create the
  IR functions automatically in this case.
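
As an illustration of several of these steps, the block header and memory
operand from the examples above could be reduced as sketched below.
``<instruction>`` stands in for an arbitrary load instruction and is not real
MIR syntax:

.. code-block:: text

   ; Before simplification
   bb.42.myblock:
     successors: %bb.1(0x40000000), %bb.2(0x40000000)
     <instruction> :: (load 8 from %ir.foobar, !alias.scope !9)

   ; After simplification
   bb.42:
     successors: %bb.1, %bb.2
     <instruction> :: (load 8)
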

Limitations
-----------

Currently the MIR format has several limitations in terms of which state it
can serialize:

- The target-specific state in the target-specific ``MachineFunctionInfo``
  subclasses isn't serialized at the moment.

- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
  SystemZ backends) aren't serialized at the moment.

- The ``MCSymbol`` machine operands are only printed; they can't be parsed.

- A lot of the state in ``MachineModuleInfo`` isn't serialized - only the CFI
  instructions and the variable debug information from MMI are serialized right
  now.

These limitations impose restrictions on what you can test with the MIR format.
For now, tests that depend on the state of certain ``MCSymbol`` operands or on
the exception handling state in MMI can't use the MIR format. Likewise, tests
that depend on the state of the target-specific ``MachineFunctionInfo`` or
``MachineConstantPoolValue`` subclasses can't use the MIR format at the moment.

High Level Structure
====================

.. _embedded-module:

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

   define i32 @inc(i32* %x) {
   entry:
     %0 = load i32, i32* %x
     %1 = add i32 %0, 1
     store i32 %1, i32* %x
     ret i32 %1
   }

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: text

   ---
   name: inc
   tracksRegLiveness: true
   liveins:
     - { reg: '%rdi' }
   body: |
     bb.0.entry:
       liveins: %rdi

       %eax = MOV32rm %rdi, 1, _, 0, _
       %eax = INC32r killed %eax, implicit-def dead %eflags
       MOV32mr killed %rdi, 1, _, 0, _, %eax
       RETQ %eax
   ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of a function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human-readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks with the IDs zero and one:

.. code-block:: text

   bb.0:
     <instructions>
   bb.1:
     <instructions>

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: text

   bb.0.entry:       ; This block's name is "entry"
     <instructions>

The block's name should be identical to the name of the IR block that this
machine block is based on.

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax:

.. code-block:: text

   %bb.<id>

Example:

.. code-block:: llvm

   %bb.0

The following syntax is also supported, but the former syntax is preferred for
block references:

.. code-block:: text

   %bb.<id>[.<name>]

Example:

.. code-block:: llvm

   %bb.1.then

Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then, %bb.2.else
     <instructions>
   bb.1.then:
     <instructions>
   bb.2.else:
     <instructions>

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then(32), %bb.2.else(16)

.. _bb-liveins:

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: text

   bb.0.entry:
     liveins: %edi, %esi

The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists - they are combined into
one list by the parser.
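
For instance, reusing the block names from the examples above, the parser would
be expected to treat the following block, which splits its successors and live
in registers across several lists, the same as a block with a single
``successors: %bb.1.then, %bb.2.else`` list and a single
``liveins: %edi, %esi`` list:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then
     successors: %bb.2.else
     liveins: %edi
     liveins: %esi
     <instructions>
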

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
specified in brackets after the block's definition:

.. code-block:: text

   bb.0.entry (address-taken):
     <instructions>
   bb.2.else (align 4):
     <instructions>
   bb.3(landing-pad, align 4):
     <instructions>

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

Machine Instructions
--------------------

A machine instruction is composed of a name,
:ref:`machine operands <machine-operands>`,
:ref:`instruction flags <instruction-flags>`, and machine memory operands.

The instruction's name is usually specified before the operands. The example
below shows an instance of the X86 ``RETQ`` instruction with a single machine
operand:

.. code-block:: text

   RETQ %eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
defined register operands:

.. code-block:: text

   %sp, %fp, %lr = LDPXpost %sp, 2

The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
machine instructions.

.. _instruction-flags:

Instruction Flags
^^^^^^^^^^^^^^^^^

The flag ``frame-setup`` can be specified before the instruction's name:

.. code-block:: text

   %fp = frame-setup ADDXri %sp, 0, 0

.. _registers:

Registers
---------

Registers are one of the key primitives in the machine instructions
serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.

The physical registers are identified by their name. They use the following
syntax:

.. code-block:: text

   %<name>

The example below shows three X86 physical registers:

.. code-block:: text

   %eax
   %r15
   %eflags

The virtual registers are identified by their ID number. They use the following
syntax:

.. code-block:: text

   %<id>

Example:

.. code-block:: text

   %0

The null registers are represented using an underscore ('``_``'). They can also be
represented using a '``%noreg``' named register, although the former syntax
is preferred.
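
As a small illustration, the two lines below are meant to describe the same X86
load, reusing the ``MOV32rm`` instruction from the machine function example
earlier in this document. The first line uses the preferred underscore form and
the second spells out ``%noreg``:

.. code-block:: text

   %eax = MOV32rm %rdi, 1, _, 0, _
   %eax = MOV32rm %rdi, 1, %noreg, 0, %noreg
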

.. _machine-operands:

Machine Operands
----------------

There are seventeen different kinds of machine operands, and all of them, except
the ``MCSymbol`` operand, can be serialized. The ``MCSymbol`` operands are
just printed out - they can't be parsed back yet.

Immediate Operands
^^^^^^^^^^^^^^^^^^

The immediate machine operands are untyped, 64-bit signed integers. The
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
immediate machine operand ``-42``:

.. code-block:: text

   %eax = MOV32ri -42

For integers wider than 64 bits, we use a special machine operand,
``MO_CImmediate``, which stores the immediate in a ``ConstantInt`` using an
``APInt`` (LLVM's arbitrary precision integers).

.. TODO: Describe the FPIMM immediate operands.

.. _register-operands:

Register Operands
^^^^^^^^^^^^^^^^^

The :ref:`register <registers>` primitive is used to represent the register
machine operands. The register operands can also have optional
:ref:`register flags <register-flags>`,
:ref:`a subregister index <subregister-indices>`,
and a reference to the tied register operand.
The full syntax of a register operand is shown below:

.. code-block:: text

   [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]

This example shows an instance of the X86 ``XOR32rr`` instruction that has
5 register operands with different register flags:

.. code-block:: text

   dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al

.. _register-flags:

Register Flags
~~~~~~~~~~~~~~

The table below shows all of the possible register flags along with the
corresponding internal ``llvm::RegState`` representation:

.. list-table::
   :header-rows: 1

   * - Flag
     - Internal Value

   * - ``implicit``
     - ``RegState::Implicit``

   * - ``implicit-def``
     - ``RegState::ImplicitDefine``

   * - ``def``
     - ``RegState::Define``

   * - ``dead``
     - ``RegState::Dead``

   * - ``killed``
     - ``RegState::Kill``

   * - ``undef``
     - ``RegState::Undef``

   * - ``internal``
     - ``RegState::InternalRead``

   * - ``early-clobber``
     - ``RegState::EarlyClobber``

   * - ``debug-use``
     - ``RegState::Debug``

.. _subregister-indices:

Subregister Indices
~~~~~~~~~~~~~~~~~~~

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the ``COPY``
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy the
lower 8 bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:

.. code-block:: text

   %1 = COPY %0:sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target's ``*RegisterInfo.td`` file.

Global Value Operands
^^^^^^^^^^^^^^^^^^^^^

The global value machine operands reference the global values from the
:ref:`embedded LLVM IR module <embedded-module>`.
The example below shows an instance of the X86 ``MOV64rm`` instruction that has
a global value operand named ``G``:

.. code-block:: text

   %rax = MOV64rm %rip, 1, _, @G, _

The named global values are represented using an identifier with the '@' prefix.
If the identifier doesn't match the regular expression
``[-a-zA-Z$._][-a-zA-Z$._0-9]*``, then this identifier must be quoted.

The unnamed global values are represented using an unsigned numeric value with
the '@' prefix, like in the following examples: ``@0``, ``@989``.
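
To illustrate both forms, the hypothetical lines below reuse the ``MOV64rm``
example with a quoted name that contains a space and with an unnamed global
value. The name ``my global`` is made up for this sketch, and the quoting is
assumed to follow the LLVM IR convention of wrapping the name in double quotes
after the '@' prefix:

.. code-block:: text

   %rax = MOV64rm %rip, 1, _, @"my global", _
   %rax = MOV64rm %rip, 1, _, @0, _
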

.. TODO: Describe the parser's default behaviour when optional YAML attributes
   are missing.
.. TODO: Describe the syntax for the bundled instructions.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the external symbol and register
   mask machine operands.
.. TODO: Describe the frame information YAML mapping.
.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the constant pool machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the jump table machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the block address machine operands.
.. TODO: Describe the syntax of the CFI index machine operands.
.. TODO: Describe the syntax of the metadata machine operands, and the
   instruction's debug location attribute.
.. TODO: Describe the syntax of the target index machine operands.
.. TODO: Describe the syntax of the register live out machine operands.
.. TODO: Describe the syntax of the machine memory operands.