|  | ======================================= | 
|  | LLVM's Optional Rich Disassembly Output | 
|  | ======================================= | 
|  |  | 
|  | .. contents:: | 
|  | :local: | 
|  |  | 
|  | Introduction | 
|  | ============ | 
|  |  | 
|  | LLVM's default disassembly output is raw text. An optional rich disassembly | 
|  | output lets consumers inspect the textual representation of each instruction | 
|  | or reformat it for a more user-friendly display. | 
|  |  | 
|  | This optional output carries enough information to identify individual | 
|  | portions of the instruction text. It is intended for clients such as | 
|  | disassemblers, list file generators, and pretty-printers, which need more than | 
|  | the raw instructions and the ability to print them. | 
|  |  | 
|  | To provide this functionality the assembly text is marked up with annotations. | 
|  | The markup is simple enough in syntax to be robust even in the case of version | 
|  | mismatches between consumers and producers. That is, the syntax generally does | 
|  | not carry semantics beyond "this text has an annotation," so consumers can | 
|  | simply ignore annotations they do not understand or do not care about. | 
|  |  | 
|  | After calling ``LLVMCreateDisasm()`` to create a disassembler context, the | 
|  | optional output is enabled with this call: | 
|  |  | 
|  | .. code-block:: c | 
|  |  | 
|  | LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup); | 
|  |  | 
|  | Then subsequent calls to ``LLVMDisasmInstruction()`` will return output strings | 
|  | with the marked up annotations. | 
|  |  | 
|  | Instruction Annotations | 
|  | ======================= | 
|  |  | 
|  | .. _contextual markups: | 
|  |  | 
|  | Contextual markups | 
|  | ------------------ | 
|  |  | 
|  | Annotated assembly display will supply contextual markup to help clients | 
|  | implement things like pretty-printers more efficiently. Most markup will be | 
|  | target-independent, so clients can provide good display without any | 
|  | target-specific knowledge. | 
|  |  | 
|  | Annotated assembly goes through the normal instruction printer, but optionally | 
|  | includes contextual tags on portions of the instruction string. An annotation | 
|  | is any '<' '>' delimited section of text(1). | 
|  |  | 
|  | .. code-block:: text | 
|  |  | 
|  | annotation: '<' tag-name tag-modifier-list ':' annotated-text '>' | 
|  | tag-name: identifier | 
|  | tag-modifier-list: comma delimited identifier list | 
|  |  | 
|  | The tag-name is an identifier which gives the type of the annotation. For the | 
|  | first pass, this will be very simple, with memory references, registers, and | 
|  | immediates having the tag names "mem", "reg", and "imm", respectively. | 
|  |  | 
|  | The tag-modifier-list is typically additional target-specific context, such as | 
|  | register class. | 
|  |  | 
|  | Clients should accept and ignore any tag-names or tag-modifiers they do not | 
|  | understand, allowing the annotations to grow in richness without breaking older | 
|  | clients. | 
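
As a rough illustration of such a tolerant consumer, the following C sketch counts annotations by tag-name and puts any name it does not recognize into an ``unknown`` bucket instead of failing. The names ``tag_counts`` and ``count_tags`` are hypothetical helpers invented for this example and are not part of the LLVM C API. The sketch assumes that the tag-name ends at the first space or ':' and that a doubled '<' is an escaped literal rather than the start of an annotation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tally of the annotations found in one marked-up string. */
struct tag_counts {
  size_t reg, imm, mem, unknown;
};

/* Counts annotations by tag-name. Unknown tag-names are tallied, not
 * rejected. Returns 0 on success, or -1 if an annotation header has no ':'. */
int count_tags(const char *s, struct tag_counts *counts) {
  assert(s != NULL && counts != NULL);
  memset(counts, 0, sizeof *counts);

  for (size_t i = 0; s[i] != '\0'; ++i) {
    if (s[i] != '<')
      continue;
    if (s[i + 1] == '<') { /* assumed escape: "<<" is a literal '<' */
      ++i;
      continue;
    }

    const char *name = s + i + 1;
    size_t header = strcspn(name, ":"); /* header runs up to the ':' */
    if (name[header] == '\0')
      return -1;
    size_t len = strcspn(name, " :"); /* tag-name ends at space or ':' */

    if (len == 3 && memcmp(name, "reg", 3) == 0)
      ++counts->reg;
    else if (len == 3 && memcmp(name, "imm", 3) == 0)
      ++counts->imm;
    else if (len == 3 && memcmp(name, "mem", 3) == 0)
      ++counts->mem;
    else
      ++counts->unknown; /* accepted and otherwise ignored */

    i += len; /* resume scanning after the tag-name */
  }
  return 0;
}
```

Because the scan continues into the annotated text, nested annotations are counted as well. Tag-modifiers are skipped entirely here; a client that cares about them could split the remainder of the header on commas.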
|  |  | 
|  | For example, an ARM load of a stack-relative location might be annotated | 
|  | as: | 
|  |  | 
|  | .. code-block:: text | 
|  |  | 
|  | ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]> | 
|  |  | 
|  |  | 
|  | 1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal | 
|  | token is escaped by doubling the character. For example, a literal '<' | 
|  | character is output as '<<' in an annotated assembly string. | 
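
A client that does not care about any annotation can recover the plain instruction text by dropping the markup and collapsing the escapes. The C function below is a minimal sketch of that idea, not code taken from LLVM: ``strip_markup`` and ``STRIP_MARKUP_ERROR`` are hypothetical names. It applies the escaping rule literally, so it assumes a producer never emits two annotation closers back to back (which would be indistinguishable from an escaped '>').

```c
#include <assert.h>
#include <stddef.h>

#define STRIP_MARKUP_ERROR ((size_t)-1)

/* Copies `in` to `out` without annotations and with escapes collapsed.
 * Returns the number of characters written (excluding the terminator), or
 * STRIP_MARKUP_ERROR if the markup is unbalanced or `out` is too small. */
size_t strip_markup(const char *in, char *out, size_t out_size) {
  assert(in != NULL && out != NULL);
  size_t n = 0;     /* characters written so far */
  size_t depth = 0; /* annotations currently open */

  for (size_t i = 0; in[i] != '\0';) {
    char c = in[i];
    if ((c == '<' || c == '>') && in[i + 1] == c) {
      i += 2; /* escaped literal: one copy is emitted below */
    } else if (c == '<') {
      /* Skip the header (tag-name and modifiers) through the ':'. */
      while (in[i] != '\0' && in[i] != ':')
        ++i;
      if (in[i] == '\0')
        return STRIP_MARKUP_ERROR;
      ++i;
      ++depth;
      continue;
    } else if (c == '>') {
      if (depth == 0)
        return STRIP_MARKUP_ERROR; /* closer without an opener */
      --depth;
      ++i;
      continue;
    } else {
      ++i; /* ordinary character */
    }
    if (n + 1 >= out_size)
      return STRIP_MARKUP_ERROR; /* no room for character plus terminator */
    out[n++] = c;
  }

  if (depth != 0 || out_size == 0)
    return STRIP_MARKUP_ERROR;
  out[n] = '\0';
  return n;
}
```

Applied to the annotated ARM load shown earlier, this sketch yields ``ldr r0, [sp, #4]``. Since only the delimiters and headers are interpreted, it keeps working when new tag-names or tag-modifiers appear.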
|  |  | 
|  | C API Details | 
|  | ------------- | 
|  |  | 
|  | The intended consumers of this information use the C API, so the feature is | 
|  | exposed there. Calling ``LLVMSetDisasmOptions()`` with the | 
|  | ``LLVMDisassembler_Option_UseMarkup`` option (see above) makes the | 
|  | disassembler produce instructions with annotations. |