As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed.

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with the flag --xml=yes. That's all. Note, however, several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Addrcheck and Helgrind if needed.

* When XML output is selected, various other settings are changed
  so that the output format is more tightly controlled. The settings
  which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self-descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced respectively
with the standard encodings "&lt;", "&gt;" and "&amp;". Note this is
not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.

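The escaping rule can be illustrated with a short Python sketch. It
uses the standard library's xml.sax.saxutils helpers, which replace
exactly the three characters named above. The function name below is
a made-up C++ symbol, and the sketch only mimics what a writer and a
reader would do; it is not Valgrind's own code.

```python
from xml.sax.saxutils import escape, unescape

# A made-up C++ symbol of the kind that can appear inside <fn>...</fn>.
raw_name = "operator<<(std::ostream&, Foo<int> const&)"

# What a writer would have to place between the <fn> tags.
escaped = escape(raw_name)

# What a GUI would show after undoing the escaping by hand.
# (A real XML parser performs this step automatically.)
restored = unescape(escaped)

print(escaped)
print(restored)
```

A GUI which uses a standard XML parser never sees the escaped form;
the sketch matters mostly for tools that scan the output as raw text.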

TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    VERSION1STUFF
  </valgrindoutput>

The only currently defined protocol version number is 1. This
document only defines protocol version 1.

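A parser might apply the version check along the following lines.
This is a minimal Python sketch using xml.etree.ElementTree; the
function name, the SAMPLE text and the choice to raise ValueError are
all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# This document defines only protocol version 1.
SUPPORTED_PROTOCOL_VERSIONS = {1}

def read_protocol_version(xml_text):
    """Return the protocol version, or raise ValueError if unusable."""
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError("unexpected root tag: " + root.tag)
    node = root.find("protocolversion")
    if node is None or node.text is None:
        raise ValueError("missing <protocolversion>")
    version = int(node.text.strip())
    if version not in SUPPORTED_PROTOCOL_VERSIONS:
        raise ValueError("unsupported protocol version: %d" % version)
    return version

# Hand-written sample with an empty VERSION1STUFF, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
<protocolversion>1</protocolversion>
</valgrindoutput>
"""

print(read_protocol_version(SAMPLE))
```

Rejecting unknown versions up front lets an older GUI fail with a
clear message instead of misreading a newer format.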

VERSION1STUFF
-------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components. The text in them can be anything; it is not intended
  for interpretation by the GUI:

     <preamble>
        <line>Misc version/copyright text</line>  (zero or more of)
     </preamble>

* The PID of this process and of its parent:

     <pid>INT</pid>
     <ppid>INT</ppid>

* The name of the tool being used:

     <tool>TEXT</tool>

* The program and args being run:

     <argv>
        <exe>TEXT</exe>
        <arg>TEXT</arg>  (zero or more of)
     </argv>

* The following, indicating that the program has now started:

     <status>RUNNING</status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening:

     <status>FINISHED</status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.

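The sequence above can be read back with a few lines of Python. The
sketch below is an offline reader (it parses a complete document) and
extracts only the fixed header fields and the status markers. The
SAMPLE text is hand-written, every value in it is invented, and the
read_header helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; all values are invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
<protocolversion>1</protocolversion>
<preamble>
  <line>Hypothetical version text</line>
</preamble>
<pid>1234</pid>
<ppid>1200</ppid>
<tool>memcheck</tool>
<argv>
  <exe>./a.out</exe>
  <arg>--flag</arg>
</argv>
<status>RUNNING</status>
<status>FINISHED</status>
<suppcounts>
</suppcounts>
</valgrindoutput>
"""

def read_header(root):
    """Collect the fixed fields of VERSION1STUFF plus the status markers."""
    argv = root.find("argv")
    return {
        # Preamble text is kept verbatim; it is not meant to be interpreted.
        "preamble": [ln.text or "" for ln in root.findall("preamble/line")],
        "pid": int(root.findtext("pid")),
        "ppid": int(root.findtext("ppid")),
        "tool": root.findtext("tool"),
        "exe": argv.findtext("exe"),
        "args": [a.text or "" for a in argv.findall("arg")],
        "statuses": [s.text for s in root.findall("status")],
    }

header = read_header(ET.fromstring(SAMPLE))
print(header["tool"], header["exe"], header["args"])
```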

ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is as follows:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>
     <what>TEXT</what>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     STACK

     optionally: <auxwhat>TEXT</auxwhat>
     optionally: STACK

  </error>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses)

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block). If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.

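As an illustration, one ERROR can be turned into a plain dictionary
as follows. This is a Python sketch under stated assumptions: the
sample error is hand-written with invented values, the helper names
are hypothetical, and each stack is reduced to its list of <ip>
values to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; every value in it is invented for illustration.
SAMPLE_ERROR = """<error>
  <unique>0x2</unique>
  <tid>1</tid>
  <kind>InvalidWrite</kind>
  <what>Invalid write of size 4</what>
  <stack>
    <frame><ip>0x8048400</ip><fn>main</fn></frame>
  </stack>
  <auxwhat>Address is 0 bytes after a block of size 40</auxwhat>
  <stack>
    <frame><ip>0x8048300</ip><fn>malloc</fn></frame>
  </stack>
</error>"""

def frame_ips(stack):
    """Return the <ip> text of each frame, in document order."""
    return [frame.findtext("ip") for frame in stack.findall("frame")]

def optional_int(node, tag):
    """Return the integer in <tag>, or None when the tag is absent."""
    text = node.findtext(tag)
    return int(text) if text is not None else None

def parse_error(node):
    stacks = node.findall("stack")  # primary stack first, then auxiliary
    return {
        "unique": int(node.findtext("unique"), 16),  # HEX64 with "0x"
        "tid": int(node.findtext("tid")),
        "kind": node.findtext("kind"),
        "what": node.findtext("what"),
        "leakedbytes": optional_int(node, "leakedbytes"),
        "leakedblocks": optional_int(node, "leakedblocks"),
        "stack": frame_ips(stacks[0]),
        "auxwhat": node.findtext("auxwhat"),
        "auxstack": frame_ips(stacks[1]) if len(stacks) > 1 else None,
    }

error = parse_error(ET.fromstring(SAMPLE_ERROR))
print(error["kind"], error["what"], error["stack"])
```

Optional fields come back as None when absent, so a GUI can tell
"not reported" apart from a reported value of zero.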

KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (e.g. doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap each other or are otherwise bogus in e.g. memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available

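A GUI could use the enumeration for rough grouping, for example as
in the Python sketch below. The three group names ("leak", "error",
"unknown") are an assumption of this example, not part of the format;
the "unknown" bucket is there so that a KIND added by a later version
does not break the reader.

```python
# The KIND values listed in this document.
KNOWN_KINDS = frozenset([
    "InvalidFree", "MismatchedFree", "InvalidRead", "InvalidWrite",
    "InvalidJump", "Overlap", "InvalidMemPool", "UninitCondition",
    "UninitValue", "SyscallParam", "ClientCheck",
    "Leak_DefinitelyLost", "Leak_IndirectlyLost",
    "Leak_PossiblyLost", "Leak_StillReachable",
])

def classify_kind(kind):
    """Return 'leak', 'error' or 'unknown' for a KIND string."""
    if kind not in KNOWN_KINDS:
        return "unknown"
    if kind.startswith("Leak_"):
        # Only these kinds may carry <leakedbytes> and <leakedblocks>.
        return "leak"
    return "error"

for kind in ("InvalidRead", "Leak_PossiblyLost", "SomeFutureKind"):
    print(kind, "->", classify_kind(kind))
```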

STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
  by <file>. Note the current implementation often does not
  put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file

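Because only <ip> is guaranteed, a reader has to cope with every
other field being absent. The Python sketch below shows one way a GUI
might render a frame as a single line of text. The output layout, the
"???" placeholder and the sample stack are all assumptions of this
example.

```python
import xml.etree.ElementTree as ET

def describe_frame(frame):
    """Render one <frame> as text, falling back as fields go missing."""
    ip = frame.findtext("ip")            # always present
    fn = frame.findtext("fn") or "???"   # placeholder when no name is known
    file_name = frame.findtext("file")
    line = frame.findtext("line")
    obj = frame.findtext("obj")
    if file_name and line:
        where = "%s:%s" % (file_name, line)
    elif file_name:
        where = file_name
    elif obj:
        where = "in " + obj
    else:
        where = "unknown location"
    return "%s: %s (%s)" % (ip, fn, where)

# Hand-written sample; addresses and paths are invented.
SAMPLE_STACK = """<stack>
<frame><ip>0x80483F2</ip><obj>/tmp/a.out</obj><fn>f</fn><dir>/tmp</dir><file>a.c</file><line>12</line></frame>
<frame><ip>0x8048420</ip><obj>/tmp/a.out</obj></frame>
<frame><ip>0x40001000</ip></frame>
</stack>"""

# Frames are listed innermost first, so this prints the call chain
# from the point of the error outwards.
lines = [describe_frame(f) for f in ET.fromstring(SAMPLE_STACK).findall("frame")]
for text in lines:
    print(text)
```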

ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

   <errorcounts>
      zero or more of
         <pair> <count>INT</count> <unique>HEX64</unique> </pair>
   </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented; partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would allow a GUI to
dynamically update its error-count display as the program runs.

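An interactive GUI could consume ERRORCOUNTS blocks incrementally, as
in the Python sketch below. It uses the standard library's
XMLPullParser, which accepts input in arbitrary pieces. The stream
text is hand-written and deliberately unfinished (the program is
"still running"), and the decision to let a later count overwrite an
earlier one follows from each <pair> giving the current count rather
than an increment.

```python
from xml.etree.ElementTree import XMLPullParser

def feed_counts(chunks):
    """Feed text chunks incrementally; return {unique: latest count}."""
    parser = XMLPullParser(events=("end",))
    counts = {}
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag != "errorcounts":
                continue
            for pair in elem.findall("pair"):
                unique = int(pair.findtext("unique"), 16)
                # Overwrite, do not add: the count is the current total.
                counts[unique] = int(pair.findtext("count"))
    return counts

# Hand-written, unfinished stream with two ERRORCOUNTS blocks.
STREAM = (
    '<?xml version="1.0"?>\n<valgrindoutput>\n'
    '<errorcounts>\n'
    '<pair> <count>2</count> <unique>0x1</unique> </pair>\n'
    '</errorcounts>\n'
    '<errorcounts>\n'
    '<pair> <count>5</count> <unique>0x1</unique> </pair>\n'
    '<pair> <count>1</count> <unique>0x2</unique> </pair>\n'
    '</errorcounts>\n'
)

# Split at arbitrary points to imitate data arriving from a pipe.
chunks = [STREAM[i:i + 16] for i in range(0, len(STREAM), 16)]
counts = feed_counts(chunks)
print(counts)
```

Errors missing from a block simply keep their previous count, which
matches the rule that partial information is allowable.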

SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

   <suppcounts>
      zero or more of
         <pair> <count>INT</count> <name>TEXT</name> </pair>
   </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.

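Reading the block is straightforward; the only subtlety is the
"not mentioned means zero" rule. The Python sketch below assumes the
GUI already has a list of suppression names from elsewhere (for
example from the .supp files it knows about); that list, the helper
name and the sample names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the suppression names are invented.
SAMPLE = """<suppcounts>
  <pair> <count>3</count> <name>hypothetical-libc-supp</name> </pair>
  <pair> <count>1</count> <name>hypothetical-x11-supp</name> </pair>
</suppcounts>"""

def suppression_usage(node, all_names=()):
    """Map suppression name to use count; unmentioned names get 0."""
    usage = {name: 0 for name in all_names}
    for pair in node.findall("pair"):
        usage[pair.findtext("name")] = int(pair.findtext("count"))
    return usage

usage = suppression_usage(ET.fromstring(SAMPLE), ["hypothetical-unused-supp"])
print(usage)
```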