<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css"></style>
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
  </style>
-->
  <title>Libxml2 XmlTextReader Interface tutorial</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">Libxml2 XmlTextReader Interface tutorial</h1>

<p></p>

<p>This document describes the use of the XmlTextReader streaming API added
to libxml2 in version 2.5.0. This API is closely modeled after the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">XmlTextReader</a>
and <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlReader.html">XmlReader</a>
classes of the C# language.</p>

<p>This tutorial presents the key points of this API, with working
examples using both C and the Python bindings:</p>

<p>Table of contents:</p>
<ul>
  <li><a href="#Introducti">Introduction: why a new API</a></li>
  <li><a href="#Walking">Walking a simple tree</a></li>
  <li><a href="#Extracting">Extracting information for the current
    node</a></li>
  <li><a href="#Extracting1">Extracting information for the
    attributes</a></li>
  <li><a href="#Validating">Validating a document</a></li>
  <li><a href="#Entities">Entities substitution</a></li>
  <li><a href="#L1142">Relax-NG Validation</a></li>
  <li><a href="#Mixing">Mixing the reader and tree or XPath
    operations</a></li>
</ul>

<p></p>

<h2><a name="Introducti">Introduction: why a new API</a></h2>

<p>Libxml2's <a href="http://xmlsoft.org/html/libxml-tree.html">main API is
tree based</a>: the parsing operation loads the document completely in
memory and exposes it as a tree of nodes, all available at the same time.
This is very simple and quite powerful, but has the major limitation that
the size of the document that can be handled is limited by the size of the
memory available. Libxml2 also provides a <a
href="http://www.saxproject.org/">SAX</a> based API, but that version was
designed upon one of the early <a
href="http://www.jclark.com/xml/expat.html">expat</a> versions of SAX, and
SAX is not formally defined for C. SAX basically works by registering
callbacks which are called directly by the parser as it progresses through
the document stream. The problem is that this programming model is
relatively complex and not well standardized, cannot provide validation
directly, and makes entity, namespace and base processing relatively
hard.</p>

<p>The <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">XmlTextReader
API from C#</a> provides a far simpler programming model. The API acts as a
cursor going forward on the document stream and stopping at each node on the
way. The user's code keeps control of the progress and simply calls a
Read() function repeatedly to progress to each node in sequence in document
order. There is direct support for namespaces, xml:base and entity handling,
and adding DTD validation on top of it was relatively simple. This API is
really close to the <a href="http://www.w3.org/TR/DOM-Level-2-Core/">DOM Core
specification</a>. This provides a far more standard, easy to use and powerful
API than the existing SAX. Moreover, integrating extension features based on
the tree seems relatively easy.</p>

<p>In a nutshell the XmlTextReader API provides a simpler, more standard and
more extensible interface to handle large documents than the existing SAX
version.</p>

<h2><a name="Walking">Walking a simple tree</a></h2>

<p>Basically the XmlTextReader API is a forward-only tree walking interface.
The basic steps are:</p>
<ol>
  <li>prepare a reader context operating on some input</li>
  <li>run a loop iterating over all nodes in the document</li>
  <li>free up the reader context</li>
</ol>

<p>Here is a basic C sample doing this:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/xmlreader.h&gt;

void processNode(xmlTextReaderPtr reader) {
    /* handling of a node in the tree */
}

int streamFile(const char *filename) {
    xmlTextReaderPtr reader;
    int ret;

    reader = xmlNewTextReaderFilename(filename);
    if (reader != NULL) {
        ret = xmlTextReaderRead(reader);
        while (ret == 1) {
            processNode(reader);
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
        if (ret != 0) {
            printf("%s : failed to parse\n", filename);
        }
    } else {
        printf("Unable to open %s\n", filename);
        ret = -1;
    }
    return ret; /* 0 on success, negative on failure */
}</pre>

<p>A few things to notice:</p>
<ul>
  <li>the include file needed: <code>libxml/xmlreader.h</code></li>
  <li>the creation of the reader using a filename</li>
  <li>the repeated call to xmlTextReaderRead() and how any return value
    different from 1 should stop the loop</li>
  <li>that a negative return means a parsing error</li>
  <li>how xmlFreeTextReader() should be used to free up the resources used by
    the reader.</li>
</ul>

<p>Here is similar code in Python for exactly the same processing:</p>
<pre>import libxml2

def processNode(reader):
    pass

def streamFile(filename):
    try:
        reader = libxml2.newTextReaderFilename(filename)
    except:
        print("unable to open %s" % (filename))
        return

    ret = reader.Read()
    while ret == 1:
        processNode(reader)
        ret = reader.Read()

    if ret != 0:
        print("%s : failed to parse" % (filename))</pre>

<p>The only things worth adding are that the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">xmlTextReader
is abstracted as a class like in C#</a> with the same method names (but the
properties are currently accessed with methods) and that one doesn't need to
free the reader at the end of the processing. It will get garbage collected
once all references have disappeared.</p>

<h2><a name="Extracting">Extracting information for the current node</a></h2>

<p>So far the example code did not indicate how information was extracted
from the reader. It was abstracted as a call to the processNode() routine,
with the reader as the argument. At each invocation, the parser is stopped on
a given node and the reader can be used to query that node's properties. Each
<em>Property</em> is available at the C level as a function taking a single
xmlTextReaderPtr argument whose name is
<code>xmlTextReader</code><em>Property</em>. If the return type is an
<code>xmlChar *</code> string then it must be deallocated with
<code>xmlFree()</code> to avoid leaks. For the Python interface, there is a
<em>Property</em> method of the reader class that can be called on the
instance. The list of the properties is based on the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">C#
XmlTextReader class</a> set of properties and methods:</p>
<ul>
  <li><em>NodeType</em>: the node type, 1 for start element, 15 for end of
    element, 2 for attributes, 3 for text nodes, 4 for CData sections, 5 for
    entity references, 6 for entity declarations, 7 for PIs, 8 for comments,
    9 for the document nodes, 10 for DTD/Doctype nodes, 11 for document
    fragment and 12 for notation nodes.</li>
  <li><em>Name</em>: the <a
    href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames">qualified
    name</a> of the node, equal to (<em>Prefix</em>:)<em>LocalName</em>.</li>
  <li><em>LocalName</em>: the <a
    href="http://www.w3.org/TR/REC-xml-names/#NT-LocalPart">local name</a> of
    the node.</li>
  <li><em>Prefix</em>: a shorthand reference to the <a
    href="http://www.w3.org/TR/REC-xml-names/">namespace</a> associated with
    the node.</li>
  <li><em>NamespaceUri</em>: the URI defining the <a
    href="http://www.w3.org/TR/REC-xml-names/">namespace</a> associated with
    the node.</li>
  <li><em>BaseUri</em>: the base URI of the node. See the <a
    href="http://www.w3.org/TR/xmlbase/">XML Base W3C specification</a>.</li>
  <li><em>Depth</em>: the depth of the node in the tree, starting at 0 for the
    root node.</li>
  <li><em>HasAttributes</em>: whether the node has attributes.</li>
  <li><em>HasValue</em>: whether the node can have a text value.</li>
  <li><em>Value</em>: provides the text value of the node if present.</li>
  <li><em>IsDefault</em>: whether an Attribute node was generated from the
    default value defined in the DTD or schema (<em>not supported
    yet</em>).</li>
  <li><em>XmlLang</em>: the <a
    href="http://www.w3.org/TR/REC-xml#sec-lang-tag">xml:lang</a> scope
    within which the node resides.</li>
  <li><em>IsEmptyElement</em>: checks whether the current node is empty. This
    is a bit bizarre in the sense that <code>&lt;a/&gt;</code> will be
    considered empty while <code>&lt;a&gt;&lt;/a&gt;</code> will not.</li>
  <li><em>AttributeCount</em>: provides the number of attributes of the
    current node.</li>
</ul>
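
<p>As a small illustration of the NodeType values listed above, the following
Python sketch maps the numeric codes to readable names and formats one line
per node. The <code>NODE_TYPE_NAMES</code> table and the
<code>describe_node()</code> helper are hypothetical names introduced here,
not part of libxml2; the helper only assumes that its argument offers the
<em>Property</em> methods described above.</p>

```python
# Hypothetical lookup table built from the NodeType values listed above.
NODE_TYPE_NAMES = {
    1: "element start",
    2: "attribute",
    3: "text",
    4: "CDATA section",
    5: "entity reference",
    6: "entity declaration",
    7: "processing instruction",
    8: "comment",
    9: "document",
    10: "DTD/doctype",
    11: "document fragment",
    12: "notation",
    15: "element end",
}


def describe_node(reader):
    """Return a one-line description of the node the reader is stopped on.

    `reader` is assumed to provide the Depth(), NodeType(), Name() and
    Value() methods of the Python reader class.
    """
    node_type = reader.NodeType()
    type_name = NODE_TYPE_NAMES.get(node_type, "unknown (%d)" % node_type)
    # Indent by depth so that nesting is visible in the output.
    line = "%s%s %s" % ("  " * reader.Depth(), type_name, reader.Name())
    value = reader.Value()
    if value is not None:
        line += " = %r" % value
    return line
```

<p>Printing the result of <code>describe_node(reader)</code> from within
processNode() would give an indented, human-readable trace of the
document.</p>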

<p>Let's first put this in practice with a small example, redefining
the processNode() function in the Python example:</p>
<pre>def processNode(reader):
    print("%d %d %s %d" % (reader.Depth(), reader.NodeType(),
                           reader.Name(), reader.IsEmptyElement()))</pre>

<p>and look at the result of calling streamFile("tst.xml") for various
contents of the XML test file.</p>

<p>For the minimal document "<code>&lt;doc/&gt;</code>" we get:</p>
<pre>0 1 doc 1</pre>

<p>Only one node is found: its depth is 0, type 1 indicates an element start,
its name is "doc" and it is empty. Trying now with
"<code>&lt;doc&gt;&lt;/doc&gt;</code>" instead leads to:</p>
<pre>0 1 doc 0
0 15 doc 0</pre>

<p>The document root node is not flagged as empty anymore and both a start
and an end of element are detected. The following document shows how
character data are reported:</p>
<pre>&lt;doc&gt;&lt;a/&gt;&lt;b&gt;some text&lt;/b&gt;
&lt;c/&gt;&lt;/doc&gt;</pre>

<p>We modify the processNode() function to also report the node Value:</p>
<pre>def processNode(reader):
    print("%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value()))</pre>

<p>The result of the test is:</p>
<pre>0 1 doc 0 None
1 1 a 1 None
1 1 b 0 None
2 3 #text 0 some text
1 15 b 0 None
1 3 #text 0 

1 1 c 1 None
0 15 doc 0 None</pre>

<p>There are a few things to note:</p>
<ul>
  <li>the increase of the depth value (first column) as children nodes are
    explored</li>
  <li>the text node child of the b element, of type 3, and its content</li>
  <li>the text node containing the line return between elements b and c</li>
  <li>that elements have the Value None (or NULL in C)</li>
</ul>

<p>The equivalent routine for <code>processNode()</code> as used by
<code>xmllint --stream --debug</code> is the following and can be found in
the xmllint.c module in the source distribution:</p>
<pre>static void processNode(xmlTextReaderPtr reader) {
    xmlChar *name, *value;

    name = xmlTextReaderName(reader);
    if (name == NULL)
        name = xmlStrdup(BAD_CAST "--");
    value = xmlTextReaderValue(reader);

    printf("%d %d %s %d",
            xmlTextReaderDepth(reader),
            xmlTextReaderNodeType(reader),
            name,
            xmlTextReaderIsEmptyElement(reader));
    xmlFree(name);
    if (value == NULL)
        printf("\n");
    else {
        printf(" %s\n", value);
        xmlFree(value);
    }
}</pre>

<h2><a name="Extracting1">Extracting information for the attributes</a></h2>

<p>The previous examples don't indicate how attributes are processed. The
simple test "<code>&lt;doc a="b"/&gt;</code>" provides the following
result:</p>
<pre>0 1 doc 1 None</pre>

<p>This shows that attribute nodes are not traversed by default. The
<em>HasAttributes</em> property allows detecting their presence. To check
their content the API has special instructions. Basically two kinds of
operations are possible:</p>
<ol>
  <li>to move the reader to the attribute nodes of the current element, in
    which case the cursor is positioned on the attribute node</li>
  <li>to directly query the element node for the attribute value</li>
</ol>

<p>In both cases the attribute can be designated either by its position in
the list of attributes (<em>MoveToAttributeNo</em> or
<em>GetAttributeNo</em>) or by its name (and namespace):</p>
<ul>
  <li><em>GetAttributeNo</em>(no): provides the value of the attribute with
    the specified index no relative to the containing element.</li>
  <li><em>GetAttribute</em>(name): provides the value of the attribute with
    the specified qualified name.</li>
  <li><em>GetAttributeNs</em>(localName, namespaceURI): provides the value of
    the attribute with the specified local name and namespace URI.</li>
  <li><em>MoveToAttributeNo</em>(no): moves the position of the current
    instance to the attribute with the specified index relative to the
    containing element.</li>
  <li><em>MoveToAttribute</em>(name): moves the position of the current
    instance to the attribute with the specified qualified name.</li>
  <li><em>MoveToAttributeNs</em>(localName, namespaceURI): moves the position
    of the current instance to the attribute with the specified local name
    and namespace URI.</li>
  <li><em>MoveToFirstAttribute</em>: moves the position of the current
    instance to the first attribute associated with the current node.</li>
  <li><em>MoveToNextAttribute</em>: moves the position of the current
    instance to the next attribute associated with the current node.</li>
  <li><em>MoveToElement</em>: moves the position of the current instance to
    the node that contains the current Attribute node.</li>
</ul>
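
<p>The cursor-moving operations above can be combined to gather all
attributes of the current element. The sketch below is one possible way to
do it in Python; <code>attributes_of()</code> is a hypothetical helper name,
and it assumes that the Move* methods return 1 on success, as Read()
does.</p>

```python
def attributes_of(reader):
    """Collect the attributes of the current element into a dictionary.

    The reader is moved over each attribute in turn and then put back on
    the element itself with MoveToElement(), so that the caller can go on
    reading as if nothing had happened.
    """
    attrs = {}
    if reader.MoveToFirstAttribute() == 1:
        while True:
            # The cursor is on an attribute node: Name() is the qualified
            # attribute name and Value() its text value.
            attrs[reader.Name()] = reader.Value()
            if reader.MoveToNextAttribute() != 1:
                break
        reader.MoveToElement()
    return attrs
```

<p>When only one attribute is of interest, a direct query such as
GetAttribute(name) avoids moving the cursor at all.</p>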

<p>After modifying the processNode() function to show attributes:</p>
<pre>def processNode(reader):
    print("%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value()))
    if reader.NodeType() == 1: # Element
        while reader.MoveToNextAttribute() == 1:
            print("-- %d %d (%s) [%s]" % (reader.Depth(), reader.NodeType(),
                                          reader.Name(), reader.Value()))</pre>

<p>The output for the same input document reflects the attribute:</p>
<pre>0 1 doc 1 None
-- 1 2 (a) [b]</pre>

<p>There are a couple of things to note on the attribute processing:</p>
<ul>
  <li>Their depth is the one of the carrying element plus one.</li>
  <li>Namespace declarations are seen as attributes, as in DOM.</li>
</ul>

<h2><a name="Validating">Validating a document</a></h2>

<p>The libxml2 implementation adds some extra features on top of the
XmlTextReader API. The main one is the ability to DTD validate the parsed
document progressively. This is simply the activation of the associated
feature of the parser used by the reader structure. There are a few options
available, defined as the enum xmlParserProperties in the libxml/xmlreader.h
header file:</p>
<ul>
  <li>XML_PARSER_LOADDTD: force loading the DTD (without validating)</li>
  <li>XML_PARSER_DEFAULTATTRS: force attribute defaulting (this also implies
    loading the DTD)</li>
  <li>XML_PARSER_VALIDATE: activate DTD validation (this also implies loading
    the DTD)</li>
  <li>XML_PARSER_SUBST_ENTITIES: substitute entities on the fly; entity
    reference nodes are not generated and are replaced by their expanded
    content.</li>
  <li>more settings might be added; those were the ones available at the
    2.5.0 release.</li>
</ul>

<p>The GetParserProp() and SetParserProp() methods can then be used to get
and set the values of those parser properties of the reader. For example:</p>
<pre>def parseAndValidate(file):
    reader = libxml2.newTextReaderFilename(file)
    reader.SetParserProp(libxml2.PARSER_VALIDATE, 1)
    ret = reader.Read()
    while ret == 1:
        ret = reader.Read()
    if ret != 0:
        print("Error parsing and validating %s" % (file))</pre>

<p>This routine will parse and validate the file. Error messages can be
captured by registering an error handler. See python/tests/reader2.py for
more complete Python examples. At the C level the equivalent call to activate
the validation feature is just:</p>
<pre>ret = xmlTextReaderSetParserProp(reader, XML_PARSER_VALIDATE, 1);</pre>

<p>and a return value of 0 indicates success.</p>
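
<p>The read loop used for validation is the same in every example, so it can
be factored out. The following Python sketch does so;
<code>drain()</code> and <code>validate_file()</code> are hypothetical helper
names, and the libxml2 import is done inside the function only so that
<code>drain()</code> can be tried without the bindings installed.</p>

```python
def drain(reader):
    """Call Read() until it stops returning 1 and return the last value.

    The result is 0 when the end of the document was reached and negative
    when a parsing error occurred.
    """
    ret = reader.Read()
    while ret == 1:
        ret = reader.Read()
    return ret


def validate_file(filename):
    """Return True if `filename` parses and is valid against its DTD."""
    import libxml2  # assumed to be the libxml2 Python bindings

    reader = libxml2.newTextReaderFilename(filename)
    reader.SetParserProp(libxml2.PARSER_VALIDATE, 1)
    # IsValid() reports the validity status once the document was read.
    return drain(reader) == 0 and reader.IsValid() == 1
```

<p>Checking IsValid() in addition to the return value of the loop is a design
choice of this sketch, made so that a well-formed but invalid document is
reported as a failure too.</p>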

<h2><a name="Entities">Entities substitution</a></h2>

<p>By default the xmlReader will report entities as such and not replace them
with their content. This default behaviour can however be overridden using:</p>

<p><code>reader.SetParserProp(libxml2.PARSER_SUBST_ENTITIES,1)</code></p>

<h2><a name="L1142">Relax-NG Validation</a></h2>

<p style="font-size: 10pt">Introduced in version 2.5.7</p>

<p>Libxml2 can now validate the document being read with the xmlReader
against Relax-NG schemas. While the Relax NG validator can't always work in a
streamable mode, only subsets which cannot be reduced to regular expressions
need to have their subtree expanded for validation. In practice it means
that, unless the schema for the top level element content cannot be expressed
as a regexp, only chunks of the document need to be expanded while
validating.</p>

<p>The steps to do so are:</p>
<ul>
  <li>create a reader working on a document as usual</li>
  <li>before any call to read, associate it with a Relax NG schema, either the
    preparsed schema or the URL of the schema to use</li>
  <li>errors will be reported the usual way, and the validity status can be
    obtained using the IsValid() interface of the reader like for DTDs.</li>
</ul>

<p>Example, assuming the reader has already been created and that the schema
string contains the Relax-NG schema:</p>
<pre>rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
rngs = rngp.relaxNGParse()
reader.RelaxNGSetSchema(rngs)
ret = reader.Read()
while ret == 1:
    ret = reader.Read()
if ret != 0:
    print("Error parsing the document")
if reader.IsValid() != 1:
    print("Document failed to validate")</pre>

<p>See <code>reader6.py</code> in the sources or documentation for a complete
example.</p>

<h2><a name="Mixing">Mixing the reader and tree or XPath operations</a></h2>

<p style="font-size: 10pt">Introduced in version 2.5.7</p>

<p>While the reader is a streaming interface, its underlying implementation
is based on the DOM builder of libxml2. As a result it is relatively simple
to mix operations based on both models under some constraints. To do so the
reader has an Expand() operation which grows the subtree under the
current node. It returns a pointer to a standard node which can be
manipulated in the usual ways. The node will get all its ancestors and the
full subtree available. Usual operations like XPath queries can be used on
that reduced view of the document. Here is an example extracted from
reader5.py in the sources which extracts and prints the bibliography entry
for the "Dragon" compiler book from the XML 1.0 recommendation:</p>
<pre>f = open('../../test/valid/REC-xml-19980210.xml')
input = libxml2.inputBuffer(f)
reader = input.newTextReader("REC")
res = ""
while reader.Read() == 1:
    while reader.Name() == 'bibl':
        node = reader.Expand()            # expand the subtree
        if node.xpathEval("@id = 'Aho'"): # use XPath on it
            res = res + node.serialize()
        if reader.Next() != 1:            # skip the subtree
            break</pre>

<p>Note, however, that the node instance returned by the Expand() call is
only valid until the next Read() operation. The Expand() operation does not
affect the Read() ones. Usually, however, once processed the full subtree is
not useful anymore, and the Next() operation allows skipping it completely
and proceeding to the successor; it returns 0 if the document end is
reached.</p>
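
<p>The Expand()/Next() pattern can be wrapped in a generator. The sketch
below is a hypothetical helper, <code>matching_subtrees()</code>, written
against the reader methods used in this tutorial; because of the lifetime
rule stated above, each yielded node has to be used before the loop asks for
the next one.</p>

```python
def matching_subtrees(reader, name):
    """Yield the expanded node of every element called `name`.

    After each match the subtree is skipped with Next(); all other nodes
    are walked with Read(). Each yielded node is only valid until the
    generator is resumed, since resuming advances the reader.
    """
    ret = reader.Read()
    while ret == 1:
        if reader.NodeType() == 1 and reader.Name() == name:
            yield reader.Expand()
            ret = reader.Next()   # skip the subtree just handed out
        else:
            ret = reader.Read()   # descend or move on as usual
```

<p>With such a helper the loop of the example above would reduce to iterating
over <code>matching_subtrees(reader, 'bibl')</code> and running the XPath
test on each node inside the loop body.</p>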

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>