<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html">
<style type="text/css">
<!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }-->


</style>
<title>Libxml2 XmlTextReader Interface tutorial</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">Libxml2 XmlTextReader Interface tutorial</h1>

<p></p>

<p>This document describes the use of the XmlTextReader streaming API added
to libxml2 in version 2.5.0. This API is closely modeled after the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">XmlTextReader</a>
and <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlReader.html">XmlReader</a>
classes of the C# language.</p>

<p>This tutorial presents the key points of this API, with working examples
in both C and Python (using the libxml2 Python bindings).</p>

<p>Table of contents:</p>
<ul>
  <li><a href="#Introducti">Introduction: why a new API</a></li>
  <li><a href="#Walking">Walking a simple tree</a></li>
  <li><a href="#Extracting">Extracting information for the current
  node</a></li>
  <li><a href="#Extracting1">Extracting information for the
  attributes</a></li>
  <li><a href="#Validating">Validating a document</a></li>
  <li><a href="#Entities">Entities substitution</a></li>
</ul>

<p></p>

<h2><a name="Introducti">Introduction: why a new API</a></h2>

<p>The libxml2 <a href="http://xmlsoft.org/html/libxml-tree.html">main API is
tree based</a>: the parsing operation loads the document completely into
memory and exposes it as a tree of nodes, all available at the same time.
This is very simple and quite powerful, but it has a major limitation: the
size of the documents that can be handled is bounded by the available memory.
Libxml2 also provides a <a href="http://www.saxproject.org/">SAX</a> based
API, but that version was designed upon one of the early <a
href="http://www.jclark.com/xml/expat.html">expat</a> versions of SAX, and SAX
is not formally defined for C anyway. SAX works by registering callbacks
which the parser calls directly as it progresses through the document stream.
The problem is that this programming model is relatively complex, not well
standardized, cannot provide validation directly, and makes entity, namespace
and base processing relatively hard.</p>

<p>The <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">XmlTextReader
API from C#</a> provides a far simpler programming model: the API acts as a
cursor moving forward through the document stream and stopping at each node
on the way. The user code keeps control of the progress and simply calls a
Read() function repeatedly to advance to each node in sequence, in document
order. There is direct support for namespaces, xml:base and entity handling,
and adding DTD validation on top of it was relatively simple. This API is
really close to the <a href="http://www.w3.org/TR/DOM-Level-2-Core/">DOM Core
specification</a>, which makes it far more standard, easier to use and more
powerful than the existing SAX API. Moreover, integrating extension features
based on the tree seems relatively easy.</p>

<p>In a nutshell, the XmlTextReader API provides a simpler, more standard and
more extensible interface than the existing SAX version for handling large
documents.</p>

<h2><a name="Walking">Walking a simple tree</a></h2>

<p>Basically, the XmlTextReader API is a forward-only tree walking interface.
The basic steps are:</p>
<ol>
  <li>prepare a reader context operating on some input</li>
  <li>run a loop iterating over all nodes in the document</li>
  <li>free up the reader context</li>
</ol>

<p>Here is a basic C sample doing this:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/xmlreader.h&gt;

void processNode(xmlTextReaderPtr reader) {
    /* handling of a node in the tree */
}

int streamFile(const char *filename) {
    xmlTextReaderPtr reader;
    int ret;

    reader = xmlNewTextReaderFilename(filename);
    if (reader != NULL) {
        ret = xmlTextReaderRead(reader);
        while (ret == 1) {
            processNode(reader);
            ret = xmlTextReaderRead(reader);
        }
        xmlFreeTextReader(reader);
        if (ret != 0) {
            printf("%s : failed to parse\n", filename);
            return -1;
        }
    } else {
        printf("Unable to open %s\n", filename);
        return -1;
    }
    return 0;
}</pre>

<p>A few things to notice:</p>
<ul>
  <li>the include file needed: <code>libxml/xmlreader.h</code></li>
  <li>the creation of the reader using a filename</li>
  <li>the repeated call to xmlTextReaderRead() and how any return value
    different from 1 should stop the loop</li>
  <li>that a return value of 0 means the end of the document was reached,
    while a negative return value means a parsing error</li>
  <li>how xmlFreeTextReader() should be used to free up the resources used by
    the reader.</li>
</ul>

<p>Here is similar code in Python for exactly the same processing:</p>
<pre>import libxml2

def processNode(reader):
    pass

def streamFile(filename):
    try:
        reader = libxml2.newTextReaderFilename(filename)
    except libxml2.libxmlError:
        print("unable to open %s" % (filename))
        return

    ret = reader.Read()
    while ret == 1:
        processNode(reader)
        ret = reader.Read()

    if ret != 0:
        print("%s : failed to parse" % (filename))
</pre>

<p>The only things worth adding are that the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">xmlTextReader
is abstracted as a class like in C#</a> with the same method names (but the
properties are currently accessed with methods) and that one doesn't need to
free the reader at the end of the processing: it will get garbage collected
once all references have disappeared.</p>

<h2><a name="Extracting">Extracting information for the current node</a></h2>

<p>So far the example code did not show how information is extracted from
the reader; this was abstracted as a call to the processNode() routine, with
the reader as the argument. At each invocation, the parser is stopped on a
given node and the reader can be used to query that node's properties. Each
<em>Property</em> is available at the C level as a function named
<code>xmlTextReader</code><em>Property</em> taking a single
<code>xmlTextReaderPtr</code> argument. If the return type is an
<code>xmlChar *</code> string, it must be deallocated with
<code>xmlFree()</code> to avoid leaks. In the Python interface, each
<em>Property</em> is a method of the reader class that can be called on the
instance. The list of properties is based on the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">C#
XmlTextReader class</a> set of properties and methods:</p>
<ul>
  <li><em>NodeType</em>: the node type, 1 for start of element, 15 for end of
    element, 2 for attributes, 3 for text nodes, 4 for CDATA sections, 5 for
    entity references, 6 for entity declarations, 7 for PIs, 8 for comments,
    9 for document nodes, 10 for DTD/Doctype nodes, 11 for document
    fragments and 12 for notation nodes.</li>
  <li><em>Name</em>: the <a
    href="http://www.w3.org/TR/REC-xml-names/#ns-qualnames">qualified
    name</a> of the node, equal to (<em>Prefix</em>:)<em>LocalName</em>.</li>
  <li><em>LocalName</em>: the <a
    href="http://www.w3.org/TR/REC-xml-names/#NT-LocalPart">local name</a> of
    the node.</li>
  <li><em>Prefix</em>: a shorthand reference to the <a
    href="http://www.w3.org/TR/REC-xml-names/">namespace</a> associated with
    the node.</li>
  <li><em>NamespaceUri</em>: the URI defining the <a
    href="http://www.w3.org/TR/REC-xml-names/">namespace</a> associated with
    the node.</li>
  <li><em>BaseUri</em>: the base URI of the node. See the <a
    href="http://www.w3.org/TR/xmlbase/">XML Base W3C specification</a>.</li>
  <li><em>Depth</em>: the depth of the node in the tree, starting at 0 for the
    root node.</li>
  <li><em>HasAttributes</em>: whether the node has attributes.</li>
  <li><em>HasValue</em>: whether the node can have a text value.</li>
  <li><em>Value</em>: the text value of the node, if present.</li>
  <li><em>IsDefault</em>: whether an Attribute node was generated from the
    default value defined in the DTD or schema (<em>not yet
    supported</em>).</li>
  <li><em>XmlLang</em>: the <a
    href="http://www.w3.org/TR/REC-xml#sec-lang-tag">xml:lang</a> scope
    within which the node resides.</li>
  <li><em>IsEmptyElement</em>: checks whether the current node is empty. This
    is a bit bizarre in the sense that <code>&lt;a/&gt;</code> will be
    considered empty while <code>&lt;a&gt;&lt;/a&gt;</code> will not.</li>
  <li><em>AttributeCount</em>: the number of attributes of the current
    node.</li>
</ul>
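<p>When printing these values, a small lookup table keeps the numeric
NodeType codes readable. This sketch only restates the codes listed above.
The labels are borrowed from the C# XmlNodeType names and are an illustrative
choice, not something the libxml2 bindings provide. Codes that are not listed
fall back to a generic label.</p>

```python
# NodeType codes as listed above; the labels follow the C# XmlNodeType
# names and are only an illustrative choice for this sketch.
NODE_TYPE_NAMES = {
    1: "Element",
    2: "Attribute",
    3: "Text",
    4: "CDATA",
    5: "EntityReference",
    6: "Entity",
    7: "ProcessingInstruction",
    8: "Comment",
    9: "Document",
    10: "DocumentType",
    11: "DocumentFragment",
    12: "Notation",
    15: "EndElement",
}


def node_type_name(node_type):
    """Return a readable label for a NodeType value, with a fallback for unlisted codes."""
    return NODE_TYPE_NAMES.get(node_type, "Unknown(%d)" % node_type)


print(node_type_name(1), node_type_name(15), node_type_name(3))
```

<p>A processNode() routine can then print
<code>node_type_name(reader.NodeType())</code> instead of the bare
number.</p>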

<p>Let's look first at a small example to see this in practice, by
redefining the processNode() function in the Python example:</p>
<pre>def processNode(reader):
    print("%d %d %s %d" % (reader.Depth(), reader.NodeType(),
                           reader.Name(), reader.IsEmptyElement()))</pre>

<p>and look at the result of calling streamFile("tst.xml") for various
contents of the XML test file.</p>

<p>For the minimal document "<code>&lt;doc/&gt;</code>" we get:</p>
<pre>0 1 doc 1</pre>

<p>Only one node is found: its depth is 0, type 1 indicates an element start,
its name is "doc" and it is empty. Trying now with
"<code>&lt;doc&gt;&lt;/doc&gt;</code>" instead leads to:</p>
<pre>0 1 doc 0
0 15 doc 0</pre>

<p>The document root node is not flagged as empty anymore, and both a start
and an end of element are detected. The following document shows how
character data are reported:</p>
<pre>&lt;doc&gt;&lt;a/&gt;&lt;b&gt;some text&lt;/b&gt;
&lt;c/&gt;&lt;/doc&gt;</pre>

<p>We modify the processNode() function to also report the node Value:</p>
<pre>def processNode(reader):
    print("%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value()))</pre>

<p>The result of the test is:</p>
<pre>0 1 doc 0 None
1 1 a 1 None
1 1 b 0 None
2 3 #text 0 some text
1 15 b 0 None
1 3 #text 0 

1 1 c 1 None
0 15 doc 0 None</pre>

<p>There are a few things to note:</p>
<ul>
  <li>the increase of the depth value (first column) as child nodes are
    explored</li>
  <li>the text node child of the b element, of type 3, and its content</li>
  <li>the text node containing the line return between elements b and c</li>
  <li>that elements have the Value None (or NULL in C)</li>
</ul>

<p>The equivalent routine for <code>processNode()</code> as used by
<code>xmllint --stream --debug</code> is the following and can be found in
the xmllint.c module in the source distribution:</p>
<pre>static void processNode(xmlTextReaderPtr reader) {
    xmlChar *name, *value;

    name = xmlTextReaderName(reader);
    if (name == NULL)
        name = xmlStrdup(BAD_CAST "--");
    value = xmlTextReaderValue(reader);

    printf("%d %d %s %d",
            xmlTextReaderDepth(reader),
            xmlTextReaderNodeType(reader),
            name,
            xmlTextReaderIsEmptyElement(reader));
    xmlFree(name);
    if (value == NULL)
        printf("\n");
    else {
        printf(" %s\n", value);
        xmlFree(value);
    }
}</pre>

<h2><a name="Extracting1">Extracting information for the attributes</a></h2>

<p>The previous examples don't show how attributes are processed. The
simple test "<code>&lt;doc a="b"/&gt;</code>" provides the following
result:</p>
<pre>0 1 doc 1 None</pre>

<p>This shows that attribute nodes are not traversed by default. The
<em>HasAttributes</em> property allows detecting their presence. To check
their content, the API has special instructions; basically two kinds of
operations are possible:</p>
<ol>
  <li>move the reader to the attribute nodes of the current element; in that
    case the cursor is positioned on the attribute node</li>
  <li>directly query the element node for the attribute value</li>
</ol>

<p>In both cases the attribute can be designated either by its position in
the list of attributes (<em>MoveToAttributeNo</em> or
<em>GetAttributeNo</em>) or by its name (and namespace):</p>
<ul>
  <li><em>GetAttributeNo</em>(no): provides the value of the attribute with
    the specified index no relative to the containing element.</li>
  <li><em>GetAttribute</em>(name): provides the value of the attribute with
    the specified qualified name.</li>
  <li><em>GetAttributeNs</em>(localName, namespaceURI): provides the value of
    the attribute with the specified local name and namespace URI.</li>
  <li><em>MoveToAttributeNo</em>(no): moves the position of the current
    instance to the attribute with the specified index relative to the
    containing element.</li>
  <li><em>MoveToAttribute</em>(name): moves the position of the current
    instance to the attribute with the specified qualified name.</li>
  <li><em>MoveToAttributeNs</em>(localName, namespaceURI): moves the position
    of the current instance to the attribute with the specified local name
    and namespace URI.</li>
  <li><em>MoveToFirstAttribute</em>: moves the position of the current
    instance to the first attribute associated with the current node.</li>
  <li><em>MoveToNextAttribute</em>: moves the position of the current
    instance to the next attribute associated with the current node.</li>
  <li><em>MoveToElement</em>: moves the position of the current instance to
    the node that contains the current Attribute node.</li>
</ul>

<p>After modifying the processNode() function to show attributes:</p>
<pre>def processNode(reader):
    print("%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value()))
    if reader.NodeType() == 1: # Element
        while reader.MoveToNextAttribute() == 1:
            print("-- %d %d (%s) [%s]" % (reader.Depth(), reader.NodeType(),
                                          reader.Name(), reader.Value()))</pre>

<p>the output for the same input document reflects the attribute:</p>
<pre>0 1 doc 1 None
-- 1 2 (a) [b]</pre>

<p>There are a couple of things to note on the attribute processing:</p>
<ul>
  <li>their depth is the depth of the carrying element plus one</li>
  <li>namespace declarations are seen as attributes, as in DOM</li>
</ul>

<h2><a name="Validating">Validating a document</a></h2>

<p>The libxml2 implementation adds some extra features on top of the
XmlTextReader API, the main one being the ability to DTD-validate the parsed
document progressively. This is simply the activation of the associated
feature of the parser used by the reader structure. A few options are
available, defined by the enum xmlParserProperties in the
libxml/xmlreader.h header file:</p>
<ul>
  <li>XML_PARSER_LOADDTD: force loading the DTD (without validating)</li>
  <li>XML_PARSER_DEFAULTATTRS: force attribute defaulting (this also implies
    loading the DTD)</li>
  <li>XML_PARSER_VALIDATE: activate DTD validation (this also implies loading
    the DTD)</li>
  <li>XML_PARSER_SUBST_ENTITIES: substitute entities on the fly; entity
    reference nodes are not generated and are replaced by their expanded
    content.</li>
  <li>more settings might be added; these were the ones available at the
    2.5.0 release...</li>
</ul>

<p>The GetParserProp() and SetParserProp() methods can then be used to get
and set the values of those parser properties of the reader. For example:</p>
<pre>def parseAndValidate(file):
    reader = libxml2.newTextReaderFilename(file)
    reader.SetParserProp(libxml2.PARSER_VALIDATE, 1)
    ret = reader.Read()
    while ret == 1:
        ret = reader.Read()
    if ret != 0:
        print("Error parsing and validating %s" % (file))</pre>

<p>This routine will parse and validate the file. Error messages can be
captured by registering an error handler. See python/tests/reader2.py for
more complete Python examples. At the C level, the equivalent call to
activate the validation feature is just:</p>
<pre>ret = xmlTextReaderSetParserProp(reader, XML_PARSER_VALIDATE, 1);</pre>

<p>and a return value of 0 indicates success.</p>

<h2><a name="Entities">Entities substitution</a></h2>

<p>@@TODO@@</p>

<p> </p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p></p>
</body>
</html>