| |
| Note, 11 May 2009. The XML format evolved over several versions, |
| as expected. This file describes 3 different versions of the |
| format (called Protocols 1, 2 and 3 respectively). As of 11 May 09 |
| a fourth version, Protocol 4, was defined, and that is described |
| in xml-output-protocol4.txt. |
| |
| The original May 2005 introduction follows. These comments are |
| correct up to and including Protocol 3, which was used in the Valgrind |
| 3.4.x series. However, there were some more significant changes in |
| the format and the required flags for Valgrind, in Protocol 4. |
| |
| ---------------------- |
| |
| As of May 2005, Valgrind can produce its output in XML form. The |
| intention is to provide an easily parsed, stable format which is |
| suitable for GUIs to read. |
| |
| |
| Design goals |
| ~~~~~~~~~~~~ |
| |
| * Produce XML output which is easily parsed |
| |
| * Have a stable output format which does not change much over time, so |
| that investments in parser-writing by GUI developers is not lost as |
| new versions of Valgrind appear. |
| |
| * Have an extensible output format, so that future changes to the |
| format do not break backwards compatibility with existing parsers of |
| it. |
| |
| * Produce output in a form which suitable for both offline GUIs (run |
| all the way to the end, then examine output) and interactive GUIs |
| (parse XML incrementally, update display as we go). |
| |
| * Put as much information as possible into the XML and let the GUIs |
| decide what to show the user (a.k.a provide mechanism, not policy). |
| |
| * Make XML which is actually parseable by standard XML tools. |
| |
| |
| How to use |
| ~~~~~~~~~~ |
| |
| Run with flag --xml=yes. That's all. Note however several |
| caveats. |
| |
| * At the present time only Memcheck is supported. The scheme extends |
| easily enough to cover Helgrind if needed. |
| |
| * When XML output is selected, various other settings are made. |
| This is in order that the output format is more controlled. |
| The settings which are changed are: |
| |
| - Suppression generation is disabled, as that would require user |
| input. |
| |
| - Attaching to GDB is disabled for the same reason. |
| |
| - The verbosity level is set to 1 (-v). |
| |
| - Error limits are disabled. Usually if the program generates a lot |
| of errors, Valgrind slows down and eventually stops collecting |
| them. When outputting XML this is not the case. |
| |
| - VEX emulation warnings are not shown. |
| |
| - File descriptor leak checking is disabled. This could be |
| re-enabled at some future point. |
| |
| - Maximum-detail leak checking is selected (--leak-check=full). |
| |
| |
| The output format |
| ~~~~~~~~~~~~~~~~~ |
| For the most part this should be self descriptive. It is printed in a |
| sort-of human-readable way for easy understanding. You may want to |
| read the rest of this together with the results of "valgrind --xml=yes |
| memcheck/tests/xml1" as an example. |
| |
| All tags are balanced: a <foo> tag is always closed by </foo>. Hence |
| in the description that follows, mention of a tag <foo> implicitly |
| means there is a matching closing tag </foo>. |
| |
| Symbols in CAPITALS are nonterminals in the grammar and are defined |
| somewhere below. The root nonterminal is TOPLEVEL. |
| |
| The following nonterminals are not described further: |
| INT is a 64-bit signed decimal integer. |
| TEXT is arbitrary text. |
| HEX64 is a 64-bit hexadecimal number, with leading "0x". |
| |
| Text strings are escaped so as to remove the <, > and & characters |
| which would otherwise mess up parsing. They are replaced respectively |
| with the standard encodings "<", ">" and "&" respectively. |
| Note this is not (yet) done throughout, only for function names in |
| <frame>..</frame> tags-pairs. |
| |
| |
| TOPLEVEL |
| -------- |
| |
| The first line output is always this: |
| |
| <?xml version="1.0"?> |
| |
| All remaining output is contained within the tag-pair |
| <valgrindoutput>. |
| |
| Inside that, the first entity is an indication of the protocol |
| version. This is provided so that existing parsers can identify XML |
| created by future versions of Valgrind merely by observing that the |
| protocol version is one they don't understand. Hence TOPLEVEL is: |
| |
| <?xml version="1.0"?> |
| <valgrindoutput> |
| <protocolversion>INT<protocolversion> |
| PROTOCOL |
| </valgrindoutput> |
| |
| Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions |
| 3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version |
| 3. |
| |
| |
| PROTOCOL for version 3 |
| ---------------------- |
| Changes in 3.4.X (tentative): (jrs, 1 March 2008) |
| |
| * There may be more than one <logfilequalifier> clause. |
| |
| * Some errors may have two <auxwhat> blocks, rather than just one |
| (resulting from merge of the DATASYMS branch) |
| |
| * Some errors may have an ORIGIN component, indicating the origins of |
| uninitialised values. This results from the merge of the |
| OTRACK_BY_INSTRUMENTATION branch. |
| |
| |
| PROTOCOL for version 2 |
| ---------------------- |
| Version 2 is identical in every way to version 1, except that the time |
| string in |
| |
| <time>human-readable-time-string</time> |
| |
| has changed format, and is also elapsed wallclock time since process |
| start, and not local time or any such. In fact version 1 does not |
| define the format of the string so in some ways this revision is |
| irrelevant. |
| |
| |
| PROTOCOL for version 1 |
| ---------------------- |
| This is the main top-level construction. Roughly speaking, it |
| contains a load of preamble, the errors from the run of the |
| program, and the result of the final leak check. Hence the |
| following in sequence: |
| |
| * Various preamble lines which give version info for the various |
| components. The text in them can be anything; it is not intended |
| for interpretation by the GUI: |
| |
| <preamble> |
| <line>Misc version/copyright text</line> (zero or more of) |
| </preamble> |
| |
| * The PID of this process and of its parent: |
| |
| <pid>INT</pid> |
| <ppid>INT</ppid> |
| |
| * The name of the tool being used: |
| |
| <tool>TEXT</tool> |
| |
| * OPTIONALLY, if --log-file-qualifier=VAR flag was given: |
| |
| <logfilequalifier> <var>VAR</var> <value>$VAR</value> |
| </logfilequalifier> |
| |
| That is, both the name of the environment variable and its value |
| are given. |
| [update: as of v3.3.0, this is not present, as the --log-file-qualifier |
| option has been removed, replaced by the %q format specifier in --log-file.] |
| |
| * OPTIONALLY, if --xml-user-comment=STRING was given: |
| |
| <usercomment>STRING</usercomment> |
| |
| STRING is not escaped in any way, so that it itself may be a piece |
| of XML with arbitrary tags etc. |
| |
| * The program and args: first those pertaining to Valgrind itself, and |
| then those pertaining to the program to be run under Valgrind (the |
| client): |
| |
| <args> |
| <vargv> |
| <exe>TEXT</exe> |
| <arg>TEXT</arg> (zero or more of) |
| </vargv> |
| <argv> |
| <exe>TEXT</exe> |
| <arg>TEXT</arg> (zero or more of) |
| </argv> |
| </args> |
| |
| * The following, indicating that the program has now started: |
| |
| <status> <state>RUNNING</state> |
| <time>human-readable-time-string</time> |
| </status> |
| |
| * Zero or more of (either ERROR or ERRORCOUNTS). |
| |
| * The following, indicating that the program has now finished, and |
| that the wrapup (leak checking) is happening. |
| |
| <status> <state>FINISHED</state> |
| <time>human-readable-time-string</time> |
| </status> |
| |
| * SUPPCOUNTS, indicating how many times each suppression was used. |
| |
| * Zero or more ERRORs, each of which is a complaint from the |
| leak checker. |
| |
| That's it. |
| |
| |
| ERROR |
| ----- |
| This shows an error, and is the most complex nonterminal. The format |
| is as follows: |
| |
| <error> |
| <unique>HEX64</unique> |
| <tid>INT</tid> |
| <kind>KIND</kind> |
| <what>TEXT</what> |
| |
| optionally: <leakedbytes>INT</leakedbytes> |
| optionally: <leakedblocks>INT</leakedblocks> |
| |
| STACK |
| |
| optionally: <auxwhat>TEXT</auxwhat> |
| optionally: STACK |
| optionally: ORIGIN |
| |
| </error> |
| |
| * Each error contains a unique, arbitrary 64-bit hex number. This is |
| used to refer to the error in ERRORCOUNTS nonterminals (see below). |
| |
| * The <tid> tag indicates the Valgrind thread number. This value |
| is arbitrary but may be used to determine which threads produced |
| which errors (at least, the first instance of each error). |
| |
| * The <kind> tag specifies one of a small number of fixed error |
| types (enumerated below), so that GUIs may roughly categorise |
| errors by type if they want. |
| |
| * The <what> tag gives a human-understandable description of the |
| error. |
| |
| * For <kind> tags specifying a KIND of the form "Leak_*", the |
| optional <leakedbytes> and <leakedblocks> indicate the number of |
| bytes and blocks leaked by this error. |
| |
| * The primary STACK for this error, indicating where it occurred. |
| |
| * Some error types may have auxiliary information attached: |
| |
| <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable |
| description (usually of invalid addresses) |
| |
| STACK gives an auxiliary stack (usually the allocation/free |
| point of a block). If this STACK is present then |
| <auxwhat>TEXT</auxwhat> will precede it. |
| |
| |
| KIND |
| ---- |
| This is a small enumeration indicating roughly the nature of an error. |
| The possible values are: |
| |
| InvalidFree |
| |
| free/delete/delete[] on an invalid pointer |
| |
| MismatchedFree |
| |
| free/delete/delete[] does not match allocation function |
| (eg doing new[] then free on the result) |
| |
| InvalidRead |
| |
| read of an invalid address |
| |
| InvalidWrite |
| |
| write of an invalid address |
| |
| InvalidJump |
| |
| jump to an invalid address |
| |
| Overlap |
| |
| args overlap other otherwise bogus in eg memcpy |
| |
| InvalidMemPool |
| |
| invalid mem pool specified in client request |
| |
| UninitCondition |
| |
| conditional jump/move depends on undefined value |
| |
| UninitValue |
| |
| other use of undefined value (primarily memory addresses) |
| |
| SyscallParam |
| |
| system call params are undefined or point to |
| undefined/unaddressible memory |
| |
| ClientCheck |
| |
| "error" resulting from a client check request |
| |
| Leak_DefinitelyLost |
| |
| memory leak; the referenced blocks are definitely lost |
| |
| Leak_IndirectlyLost |
| |
| memory leak; the referenced blocks are lost because all pointers |
| to them are also in leaked blocks |
| |
| Leak_PossiblyLost |
| |
| memory leak; only interior pointers to referenced blocks were |
| found |
| |
| Leak_StillReachable |
| |
| memory leak; pointers to un-freed blocks are still available |
| |
| |
| STACK |
| ----- |
| STACK indicates locations in the program being debugged. A STACK |
| is one or more FRAMEs. The first is the innermost frame, the |
| next its caller, etc. |
| |
| <stack> |
| one or more FRAME |
| </stack> |
| |
| |
| FRAME |
| ----- |
| FRAME records a single program location: |
| |
| <frame> |
| <ip>HEX64</ip> |
| optionally <obj>TEXT</obj> |
| optionally <fn>TEXT</fn> |
| optionally <dir>TEXT</dir> |
| optionally <file>TEXT</file> |
| optionally <line>INT</line> |
| </frame> |
| |
| Only the <ip> field is guaranteed to be present. It indicates a |
| code ("instruction pointer") address. |
| |
| The optional fields, if present, appear in the order stated: |
| |
| * obj: gives the name of the ELF object containing the code address |
| |
| * fn: gives the name of the function containing the code address |
| |
| * dir: gives the source directory associated with the name specified |
| by <file>. Note the current implementation often does not |
| put anything useful in this field. |
| |
| * file: gives the name of the source file containing the code address |
| |
| * line: gives the line number in the source file |
| |
| |
| ORIGIN |
| ------ |
| ORIGIN shows the origin of uninitialised data in errors that involve |
| uninitialised data. STACK shows the origin of the uninitialised |
| value. TEXT gives a human-understandable hint as to the meaning of |
| the information in STACK. |
| |
| <origin> |
| <what>TEXT<what> |
| STACK |
| </origin> |
| |
| |
| ERRORCOUNTS |
| ----------- |
| This specifies, for each error that has been so far presented, |
| the number of occurrences of that error. |
| |
| <errorcounts> |
| zero or more of |
| <pair> <count>INT</count> <unique>HEX64</unique> </pair> |
| </errorcounts> |
| |
| Each <pair> gives the current error count <count> for the error with |
| unique tag </unique>. The counts do not have to give a count for each |
| error so far presented - partial information is allowable. |
| |
| As at Valgrind rev 3793, error counts are only emitted at program |
| termination. However, it is perfectly acceptable to periodically emit |
| error counts as the program is running. Doing so would facilitate a |
| GUI to dynamically update its error-count display as the program runs. |
| |
| |
| SUPPCOUNTS |
| ---------- |
| A SUPPCOUNTS block appears exactly once, after the program terminates. |
| It specifies the number of times each error-suppression was used. |
| Suppressions not mentioned were used zero times. |
| |
| <suppcounts> |
| zero or more of |
| <pair> <count>INT</count> <name>TEXT</name> </pair> |
| </suppcounts> |
| |
| The <name> is as specified in the suppression name fields in .supp |
| files. |
| |