Note, 11 May 2009. The XML format evolved over several versions,
as expected. This file describes 3 different versions of the
format (called Protocols 1, 2 and 3 respectively). As of 11 May 09
a fourth version, Protocol 4, was defined, and that is described
in xml-output-protocol4.txt.

The original May 2005 introduction follows. These comments are
correct up to and including Protocol 3, which was used in the Valgrind
3.4.x series. However, there were some more significant changes in
the format and the required flags for Valgrind, in Protocol 4.

                ----------------------

As of May 2005, Valgrind can produce its output in XML form. The
intention is to provide an easily parsed, stable format which is
suitable for GUIs to read.


Design goals
~~~~~~~~~~~~

* Produce XML output which is easily parsed

* Have a stable output format which does not change much over time, so
  that investments in parser-writing by GUI developers are not lost as
  new versions of Valgrind appear.

* Have an extensible output format, so that future changes to the
  format do not break backwards compatibility with existing parsers of
  it.

* Produce output in a form which is suitable for both offline GUIs (run
  all the way to the end, then examine output) and interactive GUIs
  (parse XML incrementally, update display as we go).

* Put as much information as possible into the XML and let the GUIs
  decide what to show the user (a.k.a. provide mechanism, not policy).

* Make XML which is actually parseable by standard XML tools.


How to use
~~~~~~~~~~

Run with flag --xml=yes. That's all. Note however several
caveats.

* At the present time only Memcheck is supported. The scheme extends
  easily enough to cover Helgrind if needed.

* When XML output is selected, various other settings are made.
  This is in order that the output format is more controlled.
  The settings which are changed are:

  - Suppression generation is disabled, as that would require user
    input.

  - Attaching to GDB is disabled for the same reason.

  - The verbosity level is set to 1 (-v).

  - Error limits are disabled. Usually if the program generates a lot
    of errors, Valgrind slows down and eventually stops collecting
    them. When outputting XML this is not the case.

  - VEX emulation warnings are not shown.

  - File descriptor leak checking is disabled. This could be
    re-enabled at some future point.

  - Maximum-detail leak checking is selected (--leak-check=full).


The output format
~~~~~~~~~~~~~~~~~
For the most part this should be self descriptive. It is printed in a
sort-of human-readable way for easy understanding. You may want to
read the rest of this together with the results of "valgrind --xml=yes
memcheck/tests/xml1" as an example.

All tags are balanced: a <foo> tag is always closed by </foo>. Hence
in the description that follows, mention of a tag <foo> implicitly
means there is a matching closing tag </foo>.

Symbols in CAPITALS are nonterminals in the grammar and are defined
somewhere below. The root nonterminal is TOPLEVEL.

The following nonterminals are not described further:
   INT   is a 64-bit signed decimal integer.
   TEXT  is arbitrary text.
   HEX64 is a 64-bit hexadecimal number, with leading "0x".

Text strings are escaped so as to remove the <, > and & characters
which would otherwise mess up parsing. They are replaced with the
standard encodings "&lt;", "&gt;" and "&amp;" respectively.
Note this is not (yet) done throughout, only for function names in
<frame>..</frame> tag-pairs.

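The escaping rule for function names can be illustrated with a short
Python sketch. This is not Valgrind's own code (Valgrind is written in
C); it only shows the same three replacements using the Python standard
library. The function name used as input is a made-up example.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical C++ function name of the kind that appears in <fn> tags.
FN_NAME = "std::vector<int, std::allocator<int> >::operator[](unsigned long)"

# escape() replaces "&" first, then "<" and ">", which matches the three
# encodings listed above.
ESCAPED = escape(FN_NAME)

# A reader of the XML reverses the mapping to recover the original name.
# (A real XML parser does this automatically for element text.)
RECOVERED = unescape(ESCAPED)

print(ESCAPED)
```

Escaping "&" first matters: otherwise the "&" introduced by "&lt;"
would itself be rewritten.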

TOPLEVEL
--------

The first line output is always this:

   <?xml version="1.0"?>

All remaining output is contained within the tag-pair
<valgrindoutput>.

Inside that, the first entity is an indication of the protocol
version. This is provided so that existing parsers can identify XML
created by future versions of Valgrind merely by observing that the
protocol version is one they don't understand. Hence TOPLEVEL is:

  <?xml version="1.0"?>
  <valgrindoutput>
    <protocolversion>INT</protocolversion>
    PROTOCOL
  </valgrindoutput>

Valgrind versions 3.0.0 and 3.0.1 emit protocol version 1. Versions
3.1.X and 3.2.X emit protocol version 2. 3.4.X emits protocol version
3.

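A parser can use the protocol version as a guard before reading
anything else. The following Python sketch assumes a reader that
understands only the three protocols described in this file; the
names SUPPORTED_PROTOCOLS and read_protocol_version are illustrative,
and the sample document is hand-written.

```python
import xml.etree.ElementTree as ET

# Protocols this hypothetical reader understands (those described here).
SUPPORTED_PROTOCOLS = {1, 2, 3}

def read_protocol_version(xml_text: str) -> int:
    """Return the protocol version, or raise ValueError if unusable."""
    root = ET.fromstring(xml_text)
    if root.tag != "valgrindoutput":
        raise ValueError(f"unexpected root tag: {root.tag}")
    node = root.find("protocolversion")
    if node is None or node.text is None:
        raise ValueError("missing <protocolversion>")
    version = int(node.text.strip())
    if version not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol version: {version}")
    return version

# Hand-written minimal document; a real one has a PROTOCOL body as well.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    "<valgrindoutput>\n"
    "<protocolversion>3</protocolversion>\n"
    "</valgrindoutput>\n"
)

print(read_protocol_version(SAMPLE))
```

Rejecting unknown versions outright is one possible policy; a reader
could instead warn and continue on a best-effort basis.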

PROTOCOL for version 3
----------------------
Changes in 3.4.X (tentative): (jrs, 1 March 2008)

* There may be more than one <logfilequalifier> clause.

* Some errors may have two <auxwhat> blocks, rather than just one
  (resulting from merge of the DATASYMS branch)

* Some errors may have an ORIGIN component, indicating the origins of
  uninitialised values. This results from the merge of the
  OTRACK_BY_INSTRUMENTATION branch.


PROTOCOL for version 2
----------------------
Version 2 is identical in every way to version 1, except that the time
string in

  <time>human-readable-time-string</time>

has changed format, and is also elapsed wallclock time since process
start, and not local time or any such. In fact version 1 does not
define the format of the string so in some ways this revision is
irrelevant.


PROTOCOL for version 1
----------------------
This is the main top-level construction. Roughly speaking, it
contains a load of preamble, the errors from the run of the
program, and the result of the final leak check. Hence the
following in sequence:

* Various preamble lines which give version info for the various
  components. The text in them can be anything; it is not intended
  for interpretation by the GUI:

  <preamble>
     <line>Misc version/copyright text</line>  (zero or more of)
  </preamble>

* The PID of this process and of its parent:

  <pid>INT</pid>
  <ppid>INT</ppid>

* The name of the tool being used:

  <tool>TEXT</tool>

* OPTIONALLY, if --log-file-qualifier=VAR flag was given:

  <logfilequalifier> <var>VAR</var> <value>$VAR</value>
  </logfilequalifier>

  That is, both the name of the environment variable and its value
  are given.
  [update: as of v3.3.0, this is not present, as the --log-file-qualifier
  option has been removed, replaced by the %q format specifier in --log-file.]

* OPTIONALLY, if --xml-user-comment=STRING was given:

  <usercomment>STRING</usercomment>

  STRING is not escaped in any way, so that it itself may be a piece
  of XML with arbitrary tags etc.

* The program and args: first those pertaining to Valgrind itself, and
  then those pertaining to the program to be run under Valgrind (the
  client):

  <args>
    <vargv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </vargv>
    <argv>
      <exe>TEXT</exe>
      <arg>TEXT</arg> (zero or more of)
    </argv>
  </args>

* The following, indicating that the program has now started:

  <status> <state>RUNNING</state>
           <time>human-readable-time-string</time>
  </status>

* Zero or more of (either ERROR or ERRORCOUNTS).

* The following, indicating that the program has now finished, and
  that the wrapup (leak checking) is happening.

  <status> <state>FINISHED</state>
           <time>human-readable-time-string</time>
  </status>

* SUPPCOUNTS, indicating how many times each suppression was used.

* Zero or more ERRORs, each of which is a complaint from the
  leak checker.

That's it.

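Because the items above arrive in sequence, an interactive GUI can
react to each one as soon as its closing tag has been read. The
following Python sketch uses the standard library's pull parser to do
so. The function name collect_events and the sample chunks are
assumptions made for illustration; the sample is hand-written, omits
most of the preamble items, and is deliberately split in the middle of
tags and text to mimic data arriving from a pipe.

```python
import xml.etree.ElementTree as ET

def collect_events(chunks):
    """Feed XML text piece by piece; report elements as they complete."""
    parser = ET.XMLPullParser(events=("end",))
    seen = []
    for chunk in chunks:
        parser.feed(chunk)
        # read_events() yields only elements whose end tag has arrived.
        for _event, elem in parser.read_events():
            if elem.tag == "state":
                seen.append(("state", elem.text))
            elif elem.tag == "error":
                seen.append(("error", elem.findtext("kind")))
    return seen

# Hand-written sample output, cut at arbitrary points.
CHUNKS = [
    '<?xml version="1.0"?>\n<valgrindoutput>\n'
    "<protocolversion>1</protocolversion>\n<status><state>RUNN",
    "ING</state><time>t0</time></sta",
    "tus>\n<error><unique>0x1</unique><tid>1</tid><kind>Invalid",
    "Read</kind><what>w</what>"
    "<stack><frame><ip>0x10</ip></frame></stack></error>\n",
    "<status><state>FINISHED</state><time>t1</time></status>\n"
    "</valgrindoutput>\n",
]

EVENTS = collect_events(CHUNKS)
print(EVENTS)
```

A GUI would replace the list appends with display updates; the point
is only that no element is reported before it is complete.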

ERROR
-----
This shows an error, and is the most complex nonterminal. The format
is as follows:

  <error>
     <unique>HEX64</unique>
     <tid>INT</tid>
     <kind>KIND</kind>
     <what>TEXT</what>

     optionally: <leakedbytes>INT</leakedbytes>
     optionally: <leakedblocks>INT</leakedblocks>

     STACK

     optionally: <auxwhat>TEXT</auxwhat>
     optionally: STACK
     optionally: ORIGIN

  </error>

* Each error contains a unique, arbitrary 64-bit hex number. This is
  used to refer to the error in ERRORCOUNTS nonterminals (see below).

* The <tid> tag indicates the Valgrind thread number. This value
  is arbitrary but may be used to determine which threads produced
  which errors (at least, the first instance of each error).

* The <kind> tag specifies one of a small number of fixed error
  types (enumerated below), so that GUIs may roughly categorise
  errors by type if they want.

* The <what> tag gives a human-understandable description of the
  error.

* For <kind> tags specifying a KIND of the form "Leak_*", the
  optional <leakedbytes> and <leakedblocks> indicate the number of
  bytes and blocks leaked by this error.

* The primary STACK for this error, indicating where it occurred.

* Some error types may have auxiliary information attached:

     <auxwhat>TEXT</auxwhat> gives an auxiliary human-readable
     description (usually of invalid addresses)

     STACK gives an auxiliary stack (usually the allocation/free
     point of a block). If this STACK is present then
     <auxwhat>TEXT</auxwhat> will precede it.

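As an illustration of the ERROR layout, the following Python sketch
reads one <error> element into a small record. The ErrorRecord class,
its field names and the sample element are all assumptions made for
this example; the sample text is hand-written, not captured from a
real run. The <auxwhat> entries are collected as a list because
Protocol 3 allows two of them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ErrorRecord:
    unique: int                  # HEX64, used as the key in ERRORCOUNTS
    tid: int
    kind: str
    what: str
    leaked_bytes: Optional[int]  # only present for Leak_* kinds
    leaked_blocks: Optional[int]
    auxwhats: List[str]
    stack_count: int             # primary stack plus any auxiliary stack

def _optional_int(elem, tag):
    text = elem.findtext(tag)
    return int(text) if text is not None else None

def parse_error(elem) -> ErrorRecord:
    # Paths without "/" match direct children only, so a <what> or
    # <stack> nested inside an ORIGIN is not picked up here.
    return ErrorRecord(
        unique=int(elem.findtext("unique"), 16),  # base 16 accepts "0x"
        tid=int(elem.findtext("tid")),
        kind=elem.findtext("kind"),
        what=elem.findtext("what"),
        leaked_bytes=_optional_int(elem, "leakedbytes"),
        leaked_blocks=_optional_int(elem, "leakedblocks"),
        auxwhats=[a.text or "" for a in elem.findall("auxwhat")],
        stack_count=len(elem.findall("stack")),
    )

# Hand-written sample element.
SAMPLE_ERROR = """
<error>
  <unique>0x1a</unique>
  <tid>1</tid>
  <kind>InvalidRead</kind>
  <what>Invalid read of size 4</what>
  <stack>
    <frame><ip>0x8048416</ip><fn>main</fn></frame>
  </stack>
  <auxwhat>Address is 0 bytes after a block of size 40 alloc'd</auxwhat>
  <stack>
    <frame><ip>0x4015E0</ip><fn>malloc</fn></frame>
  </stack>
</error>
""".strip()

RECORD = parse_error(ET.fromstring(SAMPLE_ERROR))
print(RECORD)
```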

KIND
----
This is a small enumeration indicating roughly the nature of an error.
The possible values are:

   InvalidFree

      free/delete/delete[] on an invalid pointer

   MismatchedFree

      free/delete/delete[] does not match allocation function
      (eg doing new[] then free on the result)

   InvalidRead

      read of an invalid address

   InvalidWrite

      write of an invalid address

   InvalidJump

      jump to an invalid address

   Overlap

      args overlap or are otherwise bogus in eg memcpy

   InvalidMemPool

      invalid mem pool specified in client request

   UninitCondition

      conditional jump/move depends on undefined value

   UninitValue

      other use of undefined value (primarily memory addresses)

   SyscallParam

      system call params are undefined or point to
      undefined/unaddressable memory

   ClientCheck

      "error" resulting from a client check request

   Leak_DefinitelyLost

      memory leak; the referenced blocks are definitely lost

   Leak_IndirectlyLost

      memory leak; the referenced blocks are lost because all pointers
      to them are also in leaked blocks

   Leak_PossiblyLost

      memory leak; only interior pointers to referenced blocks were
      found

   Leak_StillReachable

      memory leak; pointers to un-freed blocks are still available


STACK
-----
STACK indicates locations in the program being debugged. A STACK
is one or more FRAMEs. The first is the innermost frame, the
next its caller, etc.

   <stack>
      one or more FRAME
   </stack>


FRAME
-----
FRAME records a single program location:

   <frame>
      <ip>HEX64</ip>
      optionally <obj>TEXT</obj>
      optionally <fn>TEXT</fn>
      optionally <dir>TEXT</dir>
      optionally <file>TEXT</file>
      optionally <line>INT</line>
   </frame>

Only the <ip> field is guaranteed to be present. It indicates a
code ("instruction pointer") address.

The optional fields, if present, appear in the order stated:

* obj: gives the name of the ELF object containing the code address

* fn: gives the name of the function containing the code address

* dir: gives the source directory associated with the name specified
       by <file>. Note the current implementation often does not
       put anything useful in this field.

* file: gives the name of the source file containing the code address

* line: gives the line number in the source file

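Since only <ip> is guaranteed, a reader has to choose fallbacks for
the missing fields. The following Python sketch shows one way to turn
a STACK into display lines. The function names, the "at"/"by" prefixes
and the "???" placeholder are presentation choices made for this
example, not part of the protocol; the sample stack is hand-written.

```python
import xml.etree.ElementTree as ET

def describe_frame(frame) -> str:
    """Format one <frame>, falling back when optional fields are absent."""
    ip = frame.findtext("ip")           # always present
    fn = frame.findtext("fn") or "???"  # placeholder for unknown function
    file = frame.findtext("file")
    line = frame.findtext("line")
    obj = frame.findtext("obj")
    if file and line:
        where = f"{file}:{line}"
    elif file:
        where = file
    elif obj:
        where = f"in {obj}"
    else:
        where = "unknown location"
    return f"{ip}: {fn} ({where})"

def describe_stack(stack):
    """Innermost frame first, then each caller in turn."""
    lines = []
    for depth, frame in enumerate(stack.findall("frame")):
        prefix = "at" if depth == 0 else "by"
        lines.append(f"{prefix} {describe_frame(frame)}")
    return lines

# Hand-written sample: one fully described frame, one with <ip> only.
SAMPLE_STACK = """
<stack>
  <frame>
    <ip>0x8048416</ip>
    <obj>/tmp/a.out</obj>
    <fn>f</fn>
    <dir>/tmp</dir>
    <file>a.c</file>
    <line>7</line>
  </frame>
  <frame>
    <ip>0x8048450</ip>
  </frame>
</stack>
""".strip()

LINES = describe_stack(ET.fromstring(SAMPLE_STACK))
print("\n".join(LINES))
```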

ORIGIN
------
ORIGIN shows the origin of uninitialised data in errors that involve
uninitialised data. STACK shows the origin of the uninitialised
value. TEXT gives a human-understandable hint as to the meaning of
the information in STACK.

   <origin>
      <what>TEXT</what>
      STACK
   </origin>


ERRORCOUNTS
-----------
This specifies, for each error that has been so far presented,
the number of occurrences of that error.

   <errorcounts>
      zero or more of
         <pair> <count>INT</count> <unique>HEX64</unique> </pair>
   </errorcounts>

Each <pair> gives the current error count <count> for the error with
unique tag <unique>. The counts do not have to give a count for each
error so far presented - partial information is allowable.

As at Valgrind rev 3793, error counts are only emitted at program
termination. However, it is perfectly acceptable to periodically emit
error counts as the program is running. Doing so would allow a GUI
to dynamically update its error-count display as the program runs.

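Because an ERRORCOUNTS block may carry only partial information, a
reader should update the counts it is given and leave the rest alone.
A minimal Python sketch, assuming the reader keeps a dictionary keyed
by the numeric value of <unique> (the function name and the sample
data are illustrative):

```python
import xml.etree.ElementTree as ET

def apply_errorcounts(counts: dict, errorcounts) -> dict:
    """Overwrite counts for the errors mentioned; keep all others."""
    for pair in errorcounts.findall("pair"):
        unique = int(pair.findtext("unique"), 16)
        counts[unique] = int(pair.findtext("count"))
    return counts

# Two errors seen so far, each assumed to have occurred once.
COUNTS = {0x1: 1, 0x2: 1}

# Hand-written block that mentions only one of them.
SAMPLE_COUNTS = """
<errorcounts>
  <pair> <count>5</count> <unique>0x1</unique> </pair>
</errorcounts>
""".strip()

apply_errorcounts(COUNTS, ET.fromstring(SAMPLE_COUNTS))
print(COUNTS)
```

Each count is treated as the current total rather than an increment,
following the wording "current error count" above.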

SUPPCOUNTS
----------
A SUPPCOUNTS block appears exactly once, after the program terminates.
It specifies the number of times each error-suppression was used.
Suppressions not mentioned were used zero times.

   <suppcounts>
      zero or more of
         <pair> <count>INT</count> <name>TEXT</name> </pair>
   </suppcounts>

The <name> is as specified in the suppression name fields in .supp
files.

sewardj57d99c52005-06-13 16:44:33 +0000443