.. _marked_up_disassembly:

=======================================
LLVM's Optional Rich Disassembly Output
=======================================

.. contents::
   :local:

Introduction
============

LLVM's default disassembly output is raw text. To give consumers more ability
to introspect the instructions' textual representation, or to reformat it for a
more user-friendly display, there is an optional rich disassembly output.

This optional output carries enough information to refer to individual portions
of the instruction text. It is intended for clients such as disassemblers, list
file generators, and pretty-printers, which need more than the raw instructions
and the ability to print them.

To provide this functionality the assembly text is marked up with annotations.
The markup is simple enough in syntax to be robust even in the case of version
mismatches between consumers and producers. That is, the syntax generally does
not carry semantics beyond "this text has an annotation," so consumers can
simply ignore annotations they do not understand or do not care about.

After calling ``LLVMCreateDisasm()`` to create a disassembler context, the
optional output is enabled with this call:

.. code-block:: c

   LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);

Subsequent calls to ``LLVMDisasmInstruction()`` will then return output strings
with the marked-up annotations.

Instruction Annotations
=======================

.. _contextual markups:

Contextual markups
------------------

Annotated assembly display supplies contextual markup to help clients implement
things like pretty-printers more efficiently. Most markup is target
independent, so clients can provide good display without any target-specific
knowledge.

Annotated assembly goes through the normal instruction printer, but optionally
includes contextual tags on portions of the instruction string. An annotation
is any '<' '>' delimited section of text(1).

.. code-block:: bat

   annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
   tag-name: identifier
   tag-modifier-list: comma delimited identifier list

The tag-name is an identifier which gives the type of the annotation. For the
first pass, this will be very simple, with memory references, registers, and
immediates having the tag names "mem", "reg", and "imm", respectively.

The tag-modifier-list is typically additional target-specific context, such as
register class.

Clients should accept and ignore any tag-names or tag-modifiers they do not
understand, allowing the annotations to grow in richness without breaking older
clients.

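As a minimal sketch of this forward-compatible behavior, the following C
function classifies the annotation that starts at a ``<`` by its tag-name and
maps anything it does not recognize to an "unknown" kind rather than failing.
The names ``tag_kind`` and ``classify_annotation`` are hypothetical and are not
part of the LLVM C API. The sketch also assumes that tag-names consist of ASCII
letters, digits, and underscores.

.. code-block:: c

   #include <assert.h>
   #include <ctype.h>
   #include <stddef.h>
   #include <string.h>

   /* Hypothetical client-side classification of annotation tag-names. */
   enum tag_kind { TAG_UNKNOWN, TAG_REG, TAG_IMM, TAG_MEM };

   /* Classify the annotation that starts at `s`, which must point at '<'.
      Unrecognized tag-names (and any modifiers) are tolerated, not rejected. */
   enum tag_kind classify_annotation(const char *s)
   {
       assert(s != NULL && s[0] == '<');

       /* The tag-name is the identifier immediately after '<'. */
       const char *name = s + 1;
       size_t len = 0;
       while (isalnum((unsigned char)name[len]) || name[len] == '_')
           len++;

       if (len == 3 && memcmp(name, "reg", 3) == 0)
           return TAG_REG;
       if (len == 3 && memcmp(name, "imm", 3) == 0)
           return TAG_IMM;
       if (len == 3 && memcmp(name, "mem", 3) == 0)
           return TAG_MEM;

       /* Assumption: newer producers may emit tags this client predates. */
       return TAG_UNKNOWN;
   }

A client could call ``classify_annotation()`` at each unescaped ``<`` and fall
back to plain display for ``TAG_UNKNOWN``. Modifiers are never inspected here,
so new ones cannot break the client.
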
For example, an ARM load from a stack-relative location might be annotated as:

.. code-block:: nasm

   ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>

1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal
token is escaped by immediately repeating the character. For example, a literal
'<' character is output as '<<' in an annotated assembly string.

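The markup and escaping rules can be exercised with a small helper that
recovers the plain instruction text from an annotated string. This is only a
sketch under stated assumptions: ``strip_markup`` is a hypothetical name, not
an LLVM API, and the function follows the escaping rule above literally, so it
treats ``>>`` as one escaped ``>`` and assumes that two annotation closes never
appear back to back.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /* Copy `in` to `out` with all annotation markup removed and escapes
      undone. Output is truncated to fit `out_size` bytes, including the
      terminator. Returns the number of characters written. */
   size_t strip_markup(const char *in, char *out, size_t out_size)
   {
       assert(in != NULL && out != NULL && out_size > 0);

       size_t n = 0;
       while (*in != '\0') {
           char c = *in;
           if ((c == '<' || c == '>') && in[1] == c) {
               in += 2;              /* escaped literal: "<<" or ">>" */
           } else if (c == '<') {
               /* Annotation start: skip "<tag-name tag-modifier-list:". */
               const char *colon = strchr(in, ':');
               if (colon == NULL)
                   break;            /* malformed input: stop copying */
               in = colon + 1;
               continue;
           } else if (c == '>') {
               in += 1;              /* annotation end: drop the delimiter */
               continue;
           } else {
               in += 1;              /* ordinary character */
           }
           if (n + 1 < out_size)
               out[n++] = c;
       }
       out[n] = '\0';
       return n;
   }

Applied to the ARM example above, this yields ``ldr r0, [sp, #4]``. Because the
tag header is skipped without being interpreted, unknown tag-names and
modifiers are ignored automatically.
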
C API Details
-------------

The intended consumers of this information use the C API, so the disassembler's
C API provides an option to produce disassembled instructions with annotations:
``LLVMSetDisasmOptions()`` called with the
``LLVMDisassembler_Option_UseMarkup`` option (see above).