As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed.

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with flag --xml=yes. That's all. Note however several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are made.
  This is in order that the output format is more controlled.
  The settings which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self-descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.
Note this is not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.
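
The escaping rule can be illustrated with Python's standard library.
This is only a sketch of the encoding itself, not of Valgrind's own
code; the function name used as input is a made-up example.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical C++ function name as it might appear inside <fn>..</fn>.
raw_fn = "operator<<(std::ostream&, Foo<int> const&)"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_fn = escape(raw_fn)

# A conforming XML parser undoes this when it reads the text;
# unescape() shows the same round trip explicitly.
restored_fn = unescape(escaped_fn)

print(escaped_fn)
```

A GUI that uses a real XML parser never needs to call unescape()
itself; the text it gets back from the parser is already decoded.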


TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions
3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version
3.
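
A parser might check the protocol version before doing anything else.
The following is a minimal sketch using Python's xml.etree.ElementTree;
the helper names and the SAMPLE document are illustrative assumptions,
and the set of supported versions is simply the three listed above.

```python
import xml.etree.ElementTree as ET

# Protocol versions described in this document.
SUPPORTED_PROTOCOLS = {1, 2, 3}

def protocol_version(xml_text):
    """Return the protocol version as an int, or raise ValueError."""
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError("root element is <%s>, not <valgrindoutput>" % root.tag)
    version = root.findtext("protocolversion")
    if version is None:
        raise ValueError("missing <protocolversion>")
    return int(version)

def is_understood(xml_text):
    """True if the document uses a protocol version this parser knows."""
    return protocol_version(xml_text) in SUPPORTED_PROTOCOLS

# Hand-written sample with an empty PROTOCOL part.
SAMPLE = """<?xml version="1.0"?>
<valgrindoutput>
<protocolversion>3</protocolversion>
</valgrindoutput>"""

print(protocol_version(SAMPLE), is_understood(SAMPLE))
```

Refusing unknown versions up front is the simplest policy; a more
tolerant GUI could instead warn and continue.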


PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause, depending on
  how this pans out. (AshleyP perhaps to investigate)

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from merge of the DATASYMS branch)

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values. This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.


PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

   <time>human-readable-time-string</time>

has changed format, and is also elapsed wallclock time since process
start, and not local time or any such. In fact version 1 does not
define the format of the string so in some ways this revision is
irrelevant.


PROTOCOL for version 1
----------------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components. The text in them can be anything; it is not intended
  for interpretation by the GUI:

     <preamble>
        <line>Misc version/copyright text</line>  (zero or more of)
     </preamble>

* The PID of this process and of its parent:

     <pid>INT</pid>
     <ppid>INT</ppid>

* The name of the tool being used:

     <tool>TEXT</tool>

* OPTIONALLY, if the --log-file-qualifier=VAR flag was given:

     <logfilequalifier> <var>VAR</var> <value>$VAR</value>
     </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

     <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

     <args>
       <vargv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </vargv>
       <argv>
         <exe>TEXT</exe>
         <arg>TEXT</arg> (zero or more of)
       </argv>
     </args>

* The following, indicating that the program has now started:

     <status> <state>RUNNING</state>
              <time>human-readable-time-string</time>
     </status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening.

     <status> <state>FINISHED</state>
              <time>human-readable-time-string</time>
     </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.
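
Because the elements arrive in the fixed sequence above, an
interactive GUI can consume them as they are written. The sketch below
uses Python's xml.etree.ElementTree.XMLPullParser for that. The
function name, the sample text and the 16-character chunk size are
assumptions for illustration; the sample is deliberately cut off
before </valgrindoutput> to mimic a run that is still being written.

```python
import xml.etree.ElementTree as ET

def feed_chunks(chunks):
    """Feed text chunks to a pull parser; return (tag, summary) pairs."""
    parser = ET.XMLPullParser(events=("end",))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # Only elements whose closing tag has arrived are reported.
        for _event, elem in parser.read_events():
            if elem.tag == "tool":
                seen.append(("tool", elem.text))
            elif elem.tag == "status":
                seen.append(("status", elem.findtext("state")))
            elif elem.tag == "error":
                seen.append(("error", elem.findtext("kind")))
    return seen

# Hand-written sample following the version 1 sequence.
SAMPLE = (
    '<?xml version="1.0"?>\n<valgrindoutput>\n'
    '<protocolversion>1</protocolversion>\n'
    '<preamble><line>hypothetical banner text</line></preamble>\n'
    '<pid>1234</pid><ppid>1</ppid><tool>memcheck</tool>\n'
    '<args><vargv><exe>valgrind</exe><arg>--xml=yes</arg></vargv>'
    '<argv><exe>./a.out</exe></argv></args>\n'
    '<status> <state>RUNNING</state> <time>t0</time> </status>\n'
    '<error><unique>0x1</unique><tid>1</tid><kind>InvalidRead</kind>'
    '<what>hypothetical</what>'
    '<stack><frame><ip>0x400500</ip></frame></stack></error>\n'
    '<status> <state>FINISHED</state> <time>t1</time> </status>\n'
)

# Split into small pieces to mimic data arriving over a pipe.
CHUNKS = [SAMPLE[i:i + 16] for i in range(0, len(SAMPLE), 16)]
events = feed_chunks(CHUNKS)
print(events)
```

An offline GUI could instead wait for the whole document and parse it
in one call; the pull parser is only needed for the incremental case.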


ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is as follows:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>
     <what>TEXT</what>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     STACK

     optionally: <auxwhat>TEXT</auxwhat>
     optionally: STACK
     optionally: ORIGIN

  </error>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses)

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block). If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.
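
One way a GUI might hold an ERROR in memory is sketched below. The
ErrorRecord class, its field names and the sample element are
assumptions made for illustration; only the tag names come from the
grammar above. Stacks are reduced to lists of <ip> strings to keep the
example short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ErrorRecord:
    unique: int
    tid: int
    kind: str
    what: str
    leaked_bytes: Optional[int] = None
    leaked_blocks: Optional[int] = None
    stacks: List[List[str]] = field(default_factory=list)  # primary first
    auxwhats: List[str] = field(default_factory=list)

def _opt_int(elem, tag):
    """Return the integer content of an optional child, or None."""
    text = elem.findtext(tag)
    return int(text) if text is not None else None

def parse_error(elem):
    """Build an ErrorRecord from an <error> element."""
    rec = ErrorRecord(
        unique=int(elem.findtext("unique"), 16),  # HEX64, "0x" prefix accepted
        tid=int(elem.findtext("tid")),
        kind=elem.findtext("kind"),
        what=elem.findtext("what"),
        leaked_bytes=_opt_int(elem, "leakedbytes"),
        leaked_blocks=_opt_int(elem, "leakedblocks"),
    )
    # Direct children only, so a STACK nested inside <origin> is not mixed in.
    for stack in elem.findall("stack"):
        rec.stacks.append([f.findtext("ip") for f in stack.findall("frame")])
    rec.auxwhats = [a.text for a in elem.findall("auxwhat")]
    return rec

# Hand-written sample with a primary and an auxiliary stack.
SAMPLE_ERROR = """<error>
  <unique>0x1A</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>hypothetical description</what>
  <stack><frame><ip>0x400500</ip></frame></stack>
  <auxwhat>hypothetical auxiliary description</auxwhat>
  <stack><frame><ip>0x400600</ip></frame><frame><ip>0x400700</ip></frame></stack>
</error>"""

record = parse_error(ET.fromstring(SAMPLE_ERROR))
print(record.kind, record.stacks)
```

Keeping auxwhats as a list rather than a single string leaves room for
errors that carry more than one <auxwhat> block.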


KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (eg doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap or are otherwise bogus in eg memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available
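
A GUI that wants to group errors by KIND could use a small lookup
such as the one sketched below. The category names ("leak",
"invalid-access" and so on) are invented for this example and are not
part of the protocol; unknown kinds are kept separate so that values
added by a later protocol version do not break the grouping.

```python
from collections import Counter

# The KIND values enumerated above.
KNOWN_KINDS = {
    "InvalidFree", "MismatchedFree", "InvalidRead", "InvalidWrite",
    "InvalidJump", "Overlap", "InvalidMemPool", "UninitCondition",
    "UninitValue", "SyscallParam", "ClientCheck",
    "Leak_DefinitelyLost", "Leak_IndirectlyLost",
    "Leak_PossiblyLost", "Leak_StillReachable",
}

def category(kind):
    """Map a KIND string to a coarse display category (hypothetical names)."""
    if kind not in KNOWN_KINDS:
        return "unknown"
    if kind.startswith("Leak_"):
        return "leak"
    if kind in ("InvalidRead", "InvalidWrite", "InvalidJump"):
        return "invalid-access"
    if kind.startswith("Uninit"):
        return "undefined-value"
    return "other"

def tally(kinds):
    """Count how many errors fall into each category."""
    return Counter(category(k) for k in kinds)

summary = tally(["InvalidRead", "InvalidRead", "Leak_PossiblyLost", "Overlap"])
print(summary)
```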


STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
       by <file>. Note the current implementation often does not
       put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file
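
Since only <ip> is guaranteed, code that displays a FRAME has to cope
with any of the other fields being absent. The sketch below shows one
possible rendering; the output layout and the sample frames are
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def describe_frame(frame):
    """Render a <frame> element as one line, using whatever fields exist."""
    ip = frame.findtext("ip")        # always present
    fn = frame.findtext("fn")        # the remaining fields may be None
    obj = frame.findtext("obj")
    src = frame.findtext("file")
    line = frame.findtext("line")
    text = ip
    if fn is not None:
        text += ": " + fn
    if src is not None and line is not None:
        text += " (%s:%s)" % (src, line)
    elif obj is not None:
        text += " (in %s)" % obj
    return text

# Hand-written samples: all fields, object only, and <ip> only.
FULL = ET.fromstring(
    "<frame><ip>0x400500</ip><obj>/tmp/demo</obj><fn>operator&lt;&lt;</fn>"
    "<dir>/tmp</dir><file>demo.cpp</file><line>12</line></frame>")
OBJ_ONLY = ET.fromstring("<frame><ip>0x400600</ip><obj>/tmp/demo</obj></frame>")
BARE = ET.fromstring("<frame><ip>0x400700</ip></frame>")

for sample in (FULL, OBJ_ONLY, BARE):
    print(describe_frame(sample))
```

Note that the escaped function name in the first sample comes back
from the parser already decoded.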


ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data. STACK shows the origin of the uninitialised
value. TEXT gives a human-understandable hint as to the meaning of
the information in STACK.

   <origin>
      <what>TEXT</what>
      STACK
   </origin>
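
Reading the optional ORIGIN of an error might look like the sketch
below. The function name and sample text are assumptions; the stack is
again reduced to its <ip> values for brevity.

```python
import xml.etree.ElementTree as ET

def parse_origin(error_elem):
    """Return (what, [ip, ...]) for the ORIGIN of an <error>, or None."""
    origin = error_elem.find("origin")
    if origin is None:
        return None
    ips = [f.findtext("ip") for f in origin.findall("stack/frame")]
    return origin.findtext("what"), ips

# Hand-written sample error carrying an ORIGIN component.
WITH_ORIGIN = ET.fromstring("""<error>
  <unique>0x2</unique><tid>1</tid><kind>UninitCondition</kind>
  <what>hypothetical description</what>
  <stack><frame><ip>0x400500</ip></frame></stack>
  <origin>
    <what>hypothetical origin hint</what>
    <stack><frame><ip>0x400900</ip></frame></stack>
  </origin>
</error>""")
WITHOUT_ORIGIN = ET.fromstring(
    "<error><unique>0x3</unique><tid>1</tid><kind>InvalidRead</kind>"
    "<what>hypothetical</what>"
    "<stack><frame><ip>0x400500</ip></frame></stack></error>")

print(parse_origin(WITH_ORIGIN))
```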


ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

   <errorcounts>
      zero or more of
         <pair> <count>INT</count> <unique>HEX64</unique> </pair>
   </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented - partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would allow a
GUI to dynamically update its error-count display as the program runs.
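
Because an ERRORCOUNTS block may be partial and may appear more than
once, a GUI would probably merge each block into a running table
rather than replace the table. A minimal sketch, with assumed names
and hand-written sample blocks:

```python
import xml.etree.ElementTree as ET

def update_counts(counts, errorcounts_elem):
    """Merge one <errorcounts> block into counts (dict: unique -> count)."""
    for pair in errorcounts_elem.findall("pair"):
        unique = int(pair.findtext("unique"), 16)
        # A later block carries the current count, so it overwrites.
        counts[unique] = int(pair.findtext("count"))
    return counts

FIRST = ET.fromstring(
    "<errorcounts>"
    "<pair> <count>2</count> <unique>0x1</unique> </pair>"
    "<pair> <count>1</count> <unique>0x2</unique> </pair>"
    "</errorcounts>")
# A later, partial block that mentions only one of the errors.
SECOND = ET.fromstring(
    "<errorcounts>"
    "<pair> <count>5</count> <unique>0x1</unique> </pair>"
    "</errorcounts>")

counts = {}
update_counts(counts, FIRST)
update_counts(counts, SECOND)
print(counts)
```

Errors not mentioned in the later block keep their previous count.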


SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

   <suppcounts>
      zero or more of
         <pair> <count>INT</count> <name>TEXT</name> </pair>
   </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.
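
The rule that unmentioned suppressions were used zero times maps
naturally onto a dictionary lookup with a default. The sketch below
assumes made-up suppression names and helper names.

```python
import xml.etree.ElementTree as ET

def parse_suppcounts(elem):
    """Return a dict mapping suppression name to use count."""
    return {pair.findtext("name"): int(pair.findtext("count"))
            for pair in elem.findall("pair")}

def times_used(counts, name):
    """Suppressions not mentioned in the block were used zero times."""
    return counts.get(name, 0)

# Hand-written sample; the suppression name is hypothetical.
SAMPLE = ET.fromstring(
    "<suppcounts>"
    "<pair> <count>3</count> <name>hypothetical-supp-name</name> </pair>"
    "</suppcounts>")

supp = parse_suppcounts(SAMPLE)
print(times_used(supp, "hypothetical-supp-name"), times_used(supp, "never-used"))
```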