=======================================
LLVM's Optional Rich Disassembly Output
=======================================

.. contents::
   :local:

Introduction
============

LLVM's default disassembly output is raw text. To give consumers more ability
to introspect the textual representation of instructions, or to reformat it for
a more user-friendly display, there is an optional rich disassembly output.

This optional output is sufficient to reference individual portions of the
instruction text. It is intended for clients like disassemblers, list file
generators, and pretty-printers, which need more than the raw instructions and
the ability to print them.

To provide this functionality, the assembly text is marked up with annotations.
The markup is simple enough in syntax to be robust even in the case of version
mismatches between consumers and producers. That is, the syntax generally does
not carry semantics beyond "this text has an annotation," so consumers can
simply ignore annotations they do not understand or do not care about.

After calling ``LLVMCreateDisasm()`` to create a disassembler context, the
optional output is enabled with this call:

.. code-block:: c

   LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);

Subsequent calls to ``LLVMDisasmInstruction()`` will then return output strings
with the marked-up annotations.

Instruction Annotations
=======================

.. _contextual markups:

Contextual markups
------------------

Annotated assembly display will supply contextual markup to help clients more
efficiently implement things like pretty printers. Most markup will be
target-independent, so clients can provide good display without any
target-specific knowledge.

Annotated assembly goes through the normal instruction printer, but optionally
includes contextual tags on portions of the instruction string. An annotation
is any '<' '>' delimited section of text(1).

.. code-block:: bat

   annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
   tag-name: identifier
   tag-modifier-list: comma delimited identifier list

The tag-name is an identifier which gives the type of the annotation. For the
first pass, this will be very simple, with memory references, registers, and
immediates having the tag names "mem", "reg", and "imm", respectively.

The tag-modifier-list is typically additional target-specific context, such as
register class.

Clients should accept and ignore any tag-names or tag-modifiers they do not
understand, allowing the annotations to grow in richness without breaking older
clients.
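
As an illustration of the grammar and of the "accept and ignore" rule, the
following is a minimal client-side sketch, not part of the LLVM C API. The
names ``tag_kind``, ``tag_header``, and ``parse_tag_header`` are hypothetical,
and the sketch assumes that the tag-name and the tag-modifier-list are
separated by spaces, as in ``<reg gpr:r0>``. Unrecognized tag names are
accepted and reported as ``TAG_UNKNOWN`` rather than treated as errors.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical client-side types; not part of the LLVM C API. */
   enum tag_kind { TAG_UNKNOWN, TAG_REG, TAG_MEM, TAG_IMM };

   struct tag_header {
       enum tag_kind kind;     /* TAG_UNKNOWN for tag names we do not know */
       const char *name;       /* start of tag-name (not NUL-terminated) */
       size_t name_len;
       const char *modifiers;  /* start of tag-modifier-list, may be empty */
       size_t modifiers_len;
       const char *text;       /* first character of annotated-text */
   };

   static enum tag_kind classify(const char *name, size_t len)
   {
       if (len == 3 && strncmp(name, "reg", 3) == 0) return TAG_REG;
       if (len == 3 && strncmp(name, "mem", 3) == 0) return TAG_MEM;
       if (len == 3 && strncmp(name, "imm", 3) == 0) return TAG_IMM;
       return TAG_UNKNOWN;     /* accepted, but carries no meaning for us */
   }

   /* Parse the "'<' tag-name tag-modifier-list ':'" prefix starting at s.
      Returns 1 and fills *h on success; returns 0 if s does not start an
      annotation (including the escaped literal "<<"). */
   int parse_tag_header(const char *s, struct tag_header *h)
   {
       if (s[0] != '<' || s[1] == '<')
           return 0;

       const char *colon = strchr(s, ':');
       if (colon == NULL)
           return 0;

       const char *name = s + 1;
       size_t name_len = strcspn(name, " :");
       if (name_len == 0)
           return 0;

       /* Assumption: spaces separate the tag-name from its modifiers. */
       const char *mods = name + name_len;
       while (*mods == ' ')
           ++mods;

       h->kind = classify(name, name_len);
       h->name = name;
       h->name_len = name_len;
       h->modifiers = mods;
       h->modifiers_len = (size_t)(colon - mods);
       h->text = colon + 1;
       return 1;
   }

With this sketch, ``parse_tag_header("<reg gpr:r0>", &h)`` yields ``TAG_REG``
with the modifier list ``gpr``, while a tag the client has never seen still
parses successfully and can simply be displayed without special treatment.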

For example, an ARM load of a stack-relative location might be annotated as:

.. code-block:: text

   ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>

1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal
token is escaped by following it immediately with a repeat of the character.
For example, a literal '<' character is output as '<<' in an annotated assembly
string.
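
A client that only wants the plain instruction text back can remove the markup
with a single pass over the string. The following is a minimal sketch;
``strip_markup`` is a hypothetical helper, not an LLVM function. It assumes
that any doubled '<' or '>' is an escaped literal, so it would misread two
annotations that close back to back with no text between them.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Copy `in` to `out` with all annotations removed and escapes undone.
      Returns the number of characters written (excluding the NUL), or
      (size_t)-1 if `out` is too small or the markup is unbalanced. */
   size_t strip_markup(const char *in, char *out, size_t out_size)
   {
       size_t n = 0;
       int depth = 0;  /* number of currently open annotations */

       if (out_size == 0)
           return (size_t)-1;

       for (const char *p = in; *p != '\0'; ++p) {
           char c = *p;

           if ((c == '<' || c == '>') && p[1] == c) {
               /* Assumption: a doubled delimiter is always an escape.
                  Skip the repeat and emit one literal character below. */
               ++p;
           } else if (c == '<') {
               /* Start of an annotation: skip "tag-name modifiers:". */
               const char *colon = strchr(p, ':');
               if (colon == NULL)
                   return (size_t)-1;
               p = colon;
               ++depth;
               continue;
           } else if (c == '>') {
               /* End of an annotation: drop the delimiter. */
               if (depth == 0)
                   return (size_t)-1;
               --depth;
               continue;
           }

           if (n + 1 >= out_size)
               return (size_t)-1;
           out[n++] = c;
       }

       if (depth != 0)
           return (size_t)-1;
       out[n] = '\0';
       return n;
   }

Applied to the ARM example above, this sketch produces ``ldr r0, [sp, #4]``.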

C API Details
-------------

The intended consumers of this information use the C API, so the C API for the
disassembler provides an option to produce disassembled instructions with
annotations: ``LLVMSetDisasmOptions()`` with the
``LLVMDisassembler_Option_UseMarkup`` option (see above).