<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>The parser interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>The parser interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>This section is meant to help programmers get started with the XML
library from the C language. It is not intended to be exhaustive. I hope the
automatically generated documents will provide the completeness required,
but as a separate set of documents. The interfaces of the XML library are
low level by design, with almost no abstraction. Those interested in a
higher-level API should <a href="#DOM">look at DOM</a>.</p>
<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Here is how the XML parser can be called:</p>
<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>
<p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory buffers or from files. The functions are
declared in &quot;parser.h&quot;:</p>
<dl>
<dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
<dd><p>Parse a document held in a memory buffer of <code>size</code>
  bytes.</p></dd>
</dl>
<dl>
<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
<dd><p>Parse an XML document contained in a (possibly compressed)
  file.</p></dd>
</dl>
<p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p>
<h3 id="Invoking1">Invoking the parser: the push method</h3>
<p>So that the application can keep control while the document is being
fetched (which is common for GUI-based programs), libxml also provides a push
interface, as of version 1.8.3. Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>
<p>and here is a simple example showing how to use the interface:</p>
<pre>    FILE *f;
    xmlDocPtr doc = NULL;

    f = fopen(filename, &quot;r&quot;);
    if (f != NULL) {
        int res, size = 1024;
        char chars[1024];
        xmlParserCtxtPtr ctxt;

        /* read the first 4 bytes so the parser can detect the encoding */
        res = fread(chars, 1, 4, f);
        if (res &gt; 0) {
            ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                           chars, res, filename);
            /* feed the rest of the file chunk by chunk */
            while ((res = fread(chars, 1, size, f)) &gt; 0) {
                xmlParseChunk(ctxt, chars, res, 0);
            }
            /* an empty final chunk with terminate set ends the parse */
            xmlParseChunk(ctxt, chars, 0, 1);
            doc = ctxt-&gt;myDoc;
            xmlFreeParserCtxt(ctxt);
        }
        fclose(f);
    }</pre>
<p>The HTML parser embedded into libxml also has a push interface; the
functions are just prefixed by &quot;html&quot; rather than &quot;xml&quot;.</p>
<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>
<p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p>
<h3><a name="Building">Building a tree from scratch</a></h3>
<p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in &lt;libxml/tree.h&gt;.) For example, here is a piece of
code that produces the XML document used in the previous examples:</p>
<pre>    #include &lt;libxml/tree.h&gt;
    xmlDocPtr doc;
    xmlNodePtr tree, subtree;

    doc = xmlNewDoc(&quot;1.0&quot;);
    doc-&gt;children = xmlNewDocNode(doc, NULL, &quot;EXAMPLE&quot;, NULL);
    xmlSetProp(doc-&gt;children, &quot;prop1&quot;, &quot;gnome is great&quot;);
    xmlSetProp(doc-&gt;children, &quot;prop2&quot;, &quot;&amp; linux too&quot;);
    tree = xmlNewChild(doc-&gt;children, NULL, &quot;head&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, &quot;title&quot;, &quot;Welcome to Gnome&quot;);
    tree = xmlNewChild(doc-&gt;children, NULL, &quot;chapter&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, &quot;title&quot;, &quot;The Linux adventure&quot;);
    subtree = xmlNewChild(tree, NULL, &quot;p&quot;, &quot;bla bla bla ...&quot;);
    subtree = xmlNewChild(tree, NULL, &quot;image&quot;, NULL);
    xmlSetProp(subtree, &quot;href&quot;, &quot;linus.gif&quot;);</pre>
<p>Not really rocket science ...</p>
<h3><a name="Traversing">Traversing the tree</a></h3>
<p>By <a href="html/libxml-tree.html">including &quot;tree.h&quot;</a>, your
code has access to the internal structure of every node of the tree.
The field names are straightforward: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the previous
example:</p>
<pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre>
<p>points to the title element,</p>
<pre>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</pre>
<p>points to the text node containing the chapter title &quot;The Linux
adventure&quot;.</p>
<p>
<strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node that is not the document root element; the function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p>
<h3><a name="Modifying">Modifying the tree</a></h3>
<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
<dd><p>This sets (or changes) an attribute carried by an ELEMENT node.
  The value can be NULL.</p></dd>
</dl>
<dl>
<dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
<dd><p>This function returns a pointer to a new copy of the property
  content. Note that the user must deallocate the result.</p></dd>
</dl>
<p>Two functions are provided for reading and writing the text associated
with elements:</p>
<dl>
<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
<dd><p>This function takes an &quot;external&quot; string and converts it to one
  text node or possibly to a list of entity and text nodes. All
  non-predefined entity references like &amp;Gnome; will be stored
  internally as entity nodes, hence the result of the function may not be
  a single node.</p></dd>
</dl>
<dl>
<dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
<dd><p>This function is the inverse of
  <code>xmlStringGetNodeList()</code>. It generates a new string
  containing the content of the text and entity nodes. Note the extra
  argument <code>inLine</code>. If this argument is set to 1, the function
  will expand entity references. For example, instead of returning the
  &amp;Gnome; XML encoding in the string, it will substitute it with its
  value (say, &quot;GNU Network Object Model Environment&quot;).</p></dd>
</dl>
<h3><a name="Saving">Saving a tree</a></h3>
<p>Three options are available:</p>
<dl>
<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar**mem, int
  *size);</code></dt>
<dd><p>Saves the document into a newly allocated buffer, returned through
  <code>mem</code> with its length in <code>size</code>. The caller must
  free the buffer.</p></dd>
</dl>
<dl>
<dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
<dd><p>Dumps a document to an open file stream.</p></dd>
</dl>
<dl>
<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
<dd><p>Saves the document to a file. In this case, the compression
  interface is triggered if it has been turned on.</p></dd>
</dl>
<h3><a name="Compressio">Compression</a></h3>
<p>The library transparently handles compression when doing file-based
accesses. The level of compression on saves can be set either globally
or individually for one document:</p>
<dl>
<dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
<dd><p>Gets the document compression level (0-9).</p></dd>
</dl>
<dl>
<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
<dd><p>Sets the document compression level.</p></dd>
</dl>
<dl>
<dt><code>int xmlGetCompressMode(void);</code></dt>
<dd><p>Gets the default compression level.</p></dd>
</dl>
<dl>
<dt><code>void xmlSetCompressMode(int mode);</code></dt>
<dd><p>Sets the default compression level.</p></dd>
</dl>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>