Note, 11 May 2009. The XML format evolved over several versions,
as expected. This file describes 3 different versions of the
format (called Protocols 1, 2 and 3 respectively). As of 11 May 09
a fourth version, Protocol 4, was defined, and that is described
in xml-output-protocol4.txt.

The original May 2005 introduction follows. These comments are
correct up to and including Protocol 3, which was used in the Valgrind
3.4.x series. However, Protocol 4 made more significant changes to
the format and to the flags required by Valgrind.

----------------------

As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed.

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with flag --xml=yes. That's all. Note however several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are made.
  This is done so that the output format is more tightly controlled.
  The settings which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self-descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.
Note this is not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.

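To make the escaping rule concrete, here is a small Python sketch
using the standard library's xml.sax.saxutils. It only illustrates the
three replacements; it is not Valgrind's own code. The helper name
escape_fn_name and the sample function name are invented for this
example.

```python
from xml.sax.saxutils import escape, unescape


def escape_fn_name(name: str) -> str:
    """Replace &, < and > with &amp;, &lt; and &gt; (hypothetical helper)."""
    return escape(name)


# Illustrative C++ function name, as might appear inside <fn>..</fn>.
sample = "std::vector<int>::operator[](unsigned long)"
escaped = escape_fn_name(sample)

print(escaped)            # the text as it would be written into the XML
print(unescape(escaped))  # the text a GUI gets back after decoding
```

A standard XML parser performs the decoding step by itself, so a GUI
normally never needs to call unescape() by hand.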

TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions
3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version
3.

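A parser might use the protocol version as sketched below. This is a
minimal Python example built on xml.etree.ElementTree; the names
KNOWN_PROTOCOLS, protocol_version and is_understood are assumptions
made for the illustration, and the sample document is hand-written
rather than real Valgrind output.

```python
import xml.etree.ElementTree as ET

# The protocol versions described in this file (an assumption of the sketch).
KNOWN_PROTOCOLS = {1, 2, 3}


def protocol_version(xml_text: str) -> int:
    """Return the INT found in <protocolversion>, or raise ValueError."""
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError("not Valgrind XML output")
    node = root.find("protocolversion")
    if node is None or node.text is None:
        raise ValueError("missing <protocolversion>")
    return int(node.text.strip())


def is_understood(xml_text: str) -> bool:
    """True if this parser knows the protocol the document claims to use."""
    return protocol_version(xml_text) in KNOWN_PROTOCOLS


sample = """<?xml version="1.0"?>
<valgrindoutput>
<protocolversion>3</protocolversion>
</valgrindoutput>"""

print(protocol_version(sample), is_understood(sample))
```

A GUI that sees an unknown version can then refuse the file cleanly
instead of misreading it.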

PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause.

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from merge of the DATASYMS branch)

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values. This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.


PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

   <time>human-readable-time-string</time>

has changed format, and now gives elapsed wallclock time since process
start rather than local time or any such. In fact version 1 does not
define the format of the string so in some ways this revision is
irrelevant.


PROTOCOL for version 1
----------------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components. The text in them can be anything; it is not intended
  for interpretation by the GUI:

  <preamble>
    <line>Misc version/copyright text</line>  (zero or more of)
  </preamble>

* The PID of this process and of its parent:

  <pid>INT</pid>
  <ppid>INT</ppid>

* The name of the tool being used:

  <tool>TEXT</tool>

* OPTIONALLY, if --log-file-qualifier=VAR flag was given:

  <logfilequalifier> <var>VAR</var> <value>$VAR</value>
  </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

  <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

  <args>
    <vargv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </vargv>
    <argv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </argv>
  </args>

* The following, indicating that the program has now started:

  <status> <state>RUNNING</state>
           <time>human-readable-time-string</time>
  </status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening.

  <status> <state>FINISHED</state>
           <time>human-readable-time-string</time>
  </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.

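Because the items above arrive in sequence, an interactive GUI can
consume them before the closing </valgrindoutput> has been written.
The Python sketch below shows one possible way to do that with
xml.etree.ElementTree.XMLPullParser. The function name feed_chunks,
the PID values and the time strings "t0" and "t1" are placeholders
invented for the example, and the chunk boundaries are deliberately
awkward to show that a tag may be split across reads.

```python
import xml.etree.ElementTree as ET


def feed_chunks(chunks):
    """Yield ("state", text) or ("error", kind) as each element completes."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "state":
                yield ("state", elem.text.strip())
            elif elem.tag == "error":
                yield ("error", elem.findtext("kind"))


# Hand-written fragments standing in for data read from a pipe or file.
chunks = [
    '<?xml version="1.0"?>\n<valgrindoutput>\n'
    '<protocolversion>1</protocolversion>\n',
    '<pid>1234</pid>\n<ppid>1</ppid>\n<tool>memcheck</tool>\n'
    '<status> <state>RUN',
    'NING</state> <time>t0</time> </status>\n',
    '<error><unique>0x1</unique><tid>1</tid><kind>InvalidRead</kind>',
    '<what>Invalid read of size 4</what>'
    '<stack><frame><ip>0x8048000</ip></frame></stack></error>\n',
    '<status> <state>FINISHED</state> <time>t1</time> </status>\n',
]

events = list(feed_chunks(chunks))
for event in events:
    print(event)
```

An offline GUI can skip all this and parse the finished file in one
go.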

ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is as follows:

  <error>
    <unique>HEX64</unique>
    <tid>INT</tid>
    <kind>KIND</kind>
    <what>TEXT</what>

    optionally: <leakedbytes>INT</leakedbytes>
    optionally: <leakedblocks>INT</leakedblocks>

    STACK

    optionally: <auxwhat>TEXT</auxwhat>
    optionally: STACK
    optionally: ORIGIN

  </error>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

  <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
  description (usually of invalid addresses)

  STACK gives an auxiliary stack (usually the allocation/free
  point of a block). If this STACK is present then
  <auxwhat>TEXT</auxwhat> will precede it.

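As a rough illustration of the ERROR layout, the Python sketch below
turns one <error> element into a plain record. The ErrorRecord class,
its field names and parse_error are assumptions of this example, and
the sample element is hand-written. <auxwhat> is collected into a list
because Protocol 3 allows two of them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorRecord:
    unique: int
    tid: int
    kind: str
    what: str
    leakedbytes: Optional[int] = None
    leakedblocks: Optional[int] = None
    auxwhat: List[str] = field(default_factory=list)
    n_stacks: int = 0  # primary STACK plus any auxiliary STACK


def parse_error(elem: ET.Element) -> ErrorRecord:
    """Build an ErrorRecord from one <error> element."""
    def opt_int(tag: str) -> Optional[int]:
        text = elem.findtext(tag)
        return int(text) if text is not None else None

    return ErrorRecord(
        unique=int(elem.findtext("unique"), 16),  # HEX64, with leading "0x"
        tid=int(elem.findtext("tid")),
        kind=elem.findtext("kind"),
        what=elem.findtext("what"),
        leakedbytes=opt_int("leakedbytes"),
        leakedblocks=opt_int("leakedblocks"),
        auxwhat=[a.text for a in elem.findall("auxwhat")],
        # Direct children only, so a STACK inside ORIGIN is not counted.
        n_stacks=len(elem.findall("stack")),
    )


sample = """<error>
  <unique>0x1a</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 4</what>
  <stack><frame><ip>0x8048abc</ip></frame></stack>
  <auxwhat>Address 0x1ba6f02c is 0 bytes after a block of size 4 alloc'd</auxwhat>
  <stack><frame><ip>0x8048def</ip></frame></stack>
</error>"""

rec = parse_error(ET.fromstring(sample))
print(rec)
```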

KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

  InvalidFree

    free/delete/delete[] on an invalid pointer

  MismatchedFree

    free/delete/delete[] does not match allocation function
    (e.g. doing new[] then free on the result)

  InvalidRead

    read of an invalid address

  InvalidWrite

    write of an invalid address

  InvalidJump

    jump to an invalid address

  Overlap

    args overlap or are otherwise bogus in e.g. memcpy

  InvalidMemPool

    invalid mem pool specified in client request

  UninitCondition

    conditional jump/move depends on undefined value

  UninitValue

    other use of undefined value (primarily memory addresses)

  SyscallParam

    system call params are undefined or point to
    undefined/unaddressable memory

  ClientCheck

    "error" resulting from a client check request

  Leak_DefinitelyLost

    memory leak; the referenced blocks are definitely lost

  Leak_IndirectlyLost

    memory leak; the referenced blocks are lost because all pointers
    to them are also in leaked blocks

  Leak_PossiblyLost

    memory leak; only interior pointers to referenced blocks were
    found

  Leak_StillReachable

    memory leak; pointers to un-freed blocks are still available

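A GUI that wants to group errors by KIND could start from something
like the Python sketch below. The set simply restates the enumeration
above; the category labels "leak", "error" and "unknown" and the
function names are invented for the example and are not part of the
protocol.

```python
# The KIND values listed above.
KINDS = frozenset({
    "InvalidFree", "MismatchedFree", "InvalidRead", "InvalidWrite",
    "InvalidJump", "Overlap", "InvalidMemPool", "UninitCondition",
    "UninitValue", "SyscallParam", "ClientCheck",
    "Leak_DefinitelyLost", "Leak_IndirectlyLost",
    "Leak_PossiblyLost", "Leak_StillReachable",
})


def is_leak(kind: str) -> bool:
    """Leak kinds are the ones of the form "Leak_*"."""
    return kind.startswith("Leak_")


def classify(kind: str) -> str:
    """Return a coarse, GUI-side category for a KIND string."""
    if kind not in KINDS:
        return "unknown"  # e.g. a kind added by a later protocol
    return "leak" if is_leak(kind) else "error"


for k in ("InvalidRead", "Leak_PossiblyLost", "SomethingNew"):
    print(k, "->", classify(k))
```

Treating unrecognised kinds as "unknown" rather than failing keeps the
parser usable if the enumeration grows.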

STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

  <stack>
    one or more FRAME
  </stack>


FRAME
-----
FRAME records a single program location:

  <frame>
    <ip>HEX64</ip>
    optionally <obj>TEXT</obj>
    optionally <fn>TEXT</fn>
    optionally <dir>TEXT</dir>
    optionally <file>TEXT</file>
    optionally <line>INT</line>
  </frame>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
  by <file>. Note the current implementation often does not
  put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file

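Since only <ip> is guaranteed, code that displays a FRAME has to cope
with every other field being absent. The Python sketch below shows one
way to do so; describe_frame and the output layout are choices made
for this example, and the sample frames are hand-written.

```python
import xml.etree.ElementTree as ET


def describe_frame(frame: ET.Element) -> str:
    """Render one <frame> as a single line, using whatever fields exist."""
    text = frame.findtext("ip")          # always present
    fn = frame.findtext("fn")
    obj = frame.findtext("obj")
    directory = frame.findtext("dir")
    file_ = frame.findtext("file")
    line = frame.findtext("line")

    if fn:
        text += f": {fn}"
    if file_:
        path = f"{directory}/{file_}" if directory else file_
        text += f" ({path}:{line})" if line else f" ({path})"
    elif obj:
        text += f" (in {obj})"           # fall back to the object name
    return text


full = ET.fromstring(
    "<frame><ip>0x8048abc</ip><obj>/tmp/a.out</obj><fn>main</fn>"
    "<dir>/tmp</dir><file>a.c</file><line>12</line></frame>"
)
bare = ET.fromstring("<frame><ip>0x4000b0</ip></frame>")

print(describe_frame(full))
print(describe_frame(bare))
```

Function names come back from the XML parser already decoded, so an
escaped name such as std::vector&lt;int&gt; is shown with its angle
brackets restored.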

ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data. STACK shows the origin of the uninitialised
value. TEXT gives a human-understandable hint as to the meaning of
the information in STACK.

  <origin>
    <what>TEXT</what>
    STACK
  </origin>

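Reading an ORIGIN out of an error might look like the Python sketch
below. The function name origin_summary and the return shape are
assumptions of the example, and the sample element is hand-written and
omits fields that are irrelevant here.

```python
import xml.etree.ElementTree as ET


def origin_summary(error_elem: ET.Element):
    """Return (hint text, list of frame addresses), or None if no ORIGIN."""
    origin = error_elem.find("origin")
    if origin is None:
        return None
    ips = [f.findtext("ip") for f in origin.findall("stack/frame")]
    return origin.findtext("what"), ips


sample = ET.fromstring("""<error>
  <kind>UninitCondition</kind>
  <stack><frame><ip>0x8048abc</ip></frame></stack>
  <origin>
    <what>Uninitialised value was created by a heap allocation</what>
    <stack>
      <frame><ip>0x4021f0</ip></frame>
      <frame><ip>0x8048a10</ip></frame>
    </stack>
  </origin>
</error>""")

print(origin_summary(sample))
```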

ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

  <errorcounts>
    zero or more of
       <pair> <count>INT</count> <unique>HEX64</unique> </pair>
  </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented - partial information is allowable.

As of Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would allow a GUI
to dynamically update its error-count display as the program runs.

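Because an ERRORCOUNTS block may be partial and may appear more than
once, a GUI has to merge each block into the counts it already holds
rather than replace them. A minimal Python sketch, with the function
name update_counts and the sample blocks invented for the example:

```python
import xml.etree.ElementTree as ET


def update_counts(counts: dict, errorcounts: ET.Element) -> dict:
    """Merge one <errorcounts> block into counts, keyed by unique id."""
    for pair in errorcounts.findall("pair"):
        unique = int(pair.findtext("unique"), 16)
        counts[unique] = int(pair.findtext("count"))
    return counts


first = ET.fromstring(
    "<errorcounts>"
    "<pair> <count>2</count> <unique>0x1</unique> </pair>"
    "<pair> <count>1</count> <unique>0x2</unique> </pair>"
    "</errorcounts>"
)
# A later, partial block: only error 0x1 is mentioned.
second = ET.fromstring(
    "<errorcounts>"
    "<pair> <count>5</count> <unique>0x1</unique> </pair>"
    "</errorcounts>"
)

counts = {}
update_counts(counts, first)
update_counts(counts, second)
print(counts)
```

Errors missing from a later block keep their previous count.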

SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

  <suppcounts>
    zero or more of
       <pair> <count>INT</count> <name>TEXT</name> </pair>
  </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.

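The "not mentioned means zero" rule is easy to get wrong, so here is a
short Python sketch of it. The function names and the suppression
names in the sample are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_suppcounts(elem: ET.Element) -> dict:
    """Map each suppression name in <suppcounts> to its use count."""
    return {
        pair.findtext("name"): int(pair.findtext("count"))
        for pair in elem.findall("pair")
    }


def times_used(counts: dict, name: str) -> int:
    """Suppressions that are not mentioned were used zero times."""
    return counts.get(name, 0)


sample = ET.fromstring(
    "<suppcounts>"
    "<pair> <count>3</count> <name>example-libc-cond</name> </pair>"
    "<pair> <count>1</count> <name>example-x11-write</name> </pair>"
    "</suppcounts>"
)

supp = parse_suppcounts(sample)
print(times_used(supp, "example-libc-cond"))
print(times_used(supp, "never-mentioned"))
```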