<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>The parser interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"></a></div>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>The parser interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd">
<form action="search.php" enctype="application/x-www-form-urlencoded" method="GET">
<input name="query" type="TEXT" size="20" value=""><input name="submit" type="submit" value="Search ...">
</form>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="xmlreader.html">The Reader Interface</a></li>
<li><a href="tutorial/index.html">Tutorial</a></li>
<li><a href="guidelines.html">XML Guidelines</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul>
</td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>This section is intended to help programmers get started with the XML
library from the C language. It is not intended to be exhaustive. I hope the
automatically generated documents will provide the completeness required, but
as a separate set of documents. The interfaces of the XML library are low
level by design; there is nearly zero abstraction. Those interested in a
higher level API should <a href="#DOM">look at DOM</a>.</p>
<p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p>
<h3><a name="Invoking">Invoking the parser: the pull method</a></h3>
<p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory strings or from files. The functions are
defined in &quot;parser.h&quot;:</p>
<dl>
<dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
  <dd>
<p>Parse a document held in a memory buffer of <code>size</code>
    bytes.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
  <dd>
<p>Parse an XML document contained in a (possibly compressed)
    file.</p>
  </dd>
</dl>
<p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p>
<h3 id="Invoking1">Invoking the parser: the push method</h3>
<p>To let the application keep control while the document is being fetched
(which is common for GUI-based programs), libxml also provides a push
interface, as of version 1.8.3. Here are the interface functions:</p>
<pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre>
<p>and here is a simple example showing how to use the interface:</p>
<pre>    FILE *f;
    xmlDocPtr doc = NULL;

    f = fopen(filename, &quot;r&quot;);
    if (f != NULL) {
        int res, size = 1024;
        char chars[1024];
        xmlParserCtxtPtr ctxt;

        /* the first 4 bytes let the parser detect the encoding */
        res = fread(chars, 1, 4, f);
        if (res &gt; 0) {
            ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                        chars, res, filename);
            if (ctxt != NULL) {
                while ((res = fread(chars, 1, size, f)) &gt; 0) {
                    xmlParseChunk(ctxt, chars, res, 0);
                }
                /* terminate = 1 signals the end of the document */
                xmlParseChunk(ctxt, chars, 0, 1);
                doc = ctxt-&gt;myDoc;
                xmlFreeParserCtxt(ctxt);
            }
        }
        fclose(f);
    }</pre>
<p>The HTML parser embedded into libxml also has a push interface; the
functions are just prefixed by &quot;html&quot; rather than &quot;xml&quot;.</p>
<h3 id="Invoking2">Invoking the parser: the SAX interface</h3>
<p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p>
<h3><a name="Building">Building a tree from scratch</a></h3>
<p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in &lt;libxml/tree.h&gt;.) For example, here is a piece of
code that produces the XML document used in the previous examples:</p>
<pre>    #include &lt;libxml/tree.h&gt;

    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    /* library strings are xmlChar *; BAD_CAST casts C string literals */
    doc = xmlNewDoc(BAD_CAST &quot;1.0&quot;);
    root = xmlNewDocNode(doc, NULL, BAD_CAST &quot;EXAMPLE&quot;, NULL);
    xmlDocSetRootElement(doc, root);    /* doc-&gt;children is now root */
    xmlSetProp(root, BAD_CAST &quot;prop1&quot;, BAD_CAST &quot;gnome is great&quot;);
    xmlSetProp(root, BAD_CAST &quot;prop2&quot;, BAD_CAST &quot;&amp; linux too&quot;);
    tree = xmlNewChild(root, NULL, BAD_CAST &quot;head&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;title&quot;,
                          BAD_CAST &quot;Welcome to Gnome&quot;);
    tree = xmlNewChild(root, NULL, BAD_CAST &quot;chapter&quot;, NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;title&quot;,
                          BAD_CAST &quot;The Linux adventure&quot;);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;p&quot;,
                          BAD_CAST &quot;bla bla bla ...&quot;);
    subtree = xmlNewChild(tree, NULL, BAD_CAST &quot;image&quot;, NULL);
    xmlSetProp(subtree, BAD_CAST &quot;href&quot;, BAD_CAST &quot;linus.gif&quot;);</pre>
<p>Not really rocket science ...</p>
<h3><a name="Traversing">Traversing the tree</a></h3>
<p>Basically by <a href="html/libxml-tree.html">including &quot;tree.h&quot;</a> your
code has access to the internal structure of all the elements of the tree.
The field names are meant to be simple, such as <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the previous
example:</p>
<pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre>
<p>points to the title element,</p>
<pre>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</pre>
<p>points to the text node containing the chapter title &quot;The Linux
adventure&quot;.</p>
<p>
<strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node which is not the document root element; the function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p>
<h3><a name="Modifying">Modifying the tree</a></h3>
<p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p>
<dl>
<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
  xmlChar *value);</code></dt>
  <dd>
<p>This sets (or changes) an attribute carried by an ELEMENT node.
    The value can be NULL.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
  *name);</code></dt>
  <dd>
<p>This function returns a pointer to a new copy of the property
    content. Note that the user must deallocate the result.</p>
  </dd>
</dl>
<p>Two functions are provided for reading and writing the text associated
with elements:</p>
<dl>
<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
  *value);</code></dt>
  <dd>
<p>This function takes an &quot;external&quot; string and converts it to one
    text node or possibly to a list of entity and text nodes. All
    non-predefined entity references like &amp;Gnome; will be stored
    internally as entity nodes, hence the result of the function may not be
    a single node.</p>
  </dd>
</dl>
<dl>
<dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
  inLine);</code></dt>
  <dd>
<p>This function is the inverse of
    <code>xmlStringGetNodeList()</code>. It generates a new string
    containing the content of the text and entity nodes. Note the extra
    argument inLine. If this argument is set to 1, the function will expand
    entity references. For example, instead of returning the &amp;Gnome;
    XML encoding in the string, it will substitute it with its value (say,
    &quot;GNU Network Object Model Environment&quot;).</p>
  </dd>
</dl>
<h3><a name="Saving">Saving a tree</a></h3>
<p>Basically three options are possible:</p>
<dl>
<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar**mem, int
  *size);</code></dt>
  <dd>
<p>Saves the document into a newly allocated buffer, returned
    through <code>mem</code> along with its length in <code>size</code>. The
    caller must deallocate the buffer.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
  <dd>
<p>Dumps a document to an open file stream.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
  <dd>
<p>Saves the document to a file. In this case, the compression
    interface is triggered if it has been turned on.</p>
  </dd>
</dl>
<h3><a name="Compressio">Compression</a></h3>
<p>The library transparently handles compression when doing file-based
accesses. The compression level used on saves can be set either globally or
individually for one document:</p>
<dl>
<dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
  <dd>
<p>Gets the document compression level (0-9).</p>
  </dd>
</dl>
<dl>
<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
  <dd>
<p>Sets the document compression level.</p>
  </dd>
</dl>
<dl>
<dt><code>int xmlGetCompressMode(void);</code></dt>
  <dd>
<p>Gets the default compression level.</p>
  </dd>
</dl>
<dl>
<dt><code>void xmlSetCompressMode(int mode);</code></dt>
  <dd>
<p>Sets the default compression level.</p>
  </dd>
</dl>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>