<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>The parser interfaces</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>The parser interfaces</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Developer Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html" style="font-weight:bold">Main Menu</a></li><li><a href="html/index.html" style="font-weight:bold">Reference Manual</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="tutorial/index.html">Tutorial</a></li><li><a href="xmlreader.html">The Reader Interface</a></li><li><a href="ChangeLog.html">ChangeLog</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="python.html">Python and bindings</a></li><li><a href="architecture.html">libxml2 architecture</a></li><li><a href="tree.html">The tree output</a></li><li><a href="interface.html">The SAX interface</a></li><li><a href="xmlmem.html">Memory Management</a></li><li><a href="xmlio.html">I/O Interfaces</a></li><li><a href="library.html">The parser interfaces</a></li><li><a href="entities.html">Entities or no entities</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="upgrade.html">Upgrading 1.x code</a></li><li><a href="threads.html">Thread safety</a></li><li><a href="DOM.html">DOM Principles</a></li><li><a href="example.html">A real example</a></li><li><a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="APIchunk0.html">Alphabetic</a></li><li><a href="APIconstructors.html">Constructors</a></li><li><a href="APIfunctions.html">Functions/Types</a></li><li><a href="APIfiles.html">Modules</a></li><li><a href="APIsymbols.html">Symbols</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail 
archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://opencsw.org/packages/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://lxml.de/">lxml Python bindings</a></li><li><a href="http://cpan.uwinnipeg.ca/dist/XML-LibXML">Perl bindings</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>This section is directly intended to help programmers getting bootstrapped
using the XML toolkit from the C language. It is not intended to be
extensive. I hope the automatically generated documents will provide the
completeness required, but as a separate set of documents. The interfaces of
the XML parser are by principle low level. Those interested in a higher level
API should <a href="#DOM">look at DOM</a>.</p><p>The <a href="html/libxml-parser.html">parser interfaces for XML</a> are
separated from the <a href="html/libxml-htmlparser.html">HTML parser
interfaces</a>. Let's have a look at how the XML parser can be called:</p><h3><a name="Invoking" id="Invoking">Invoking the parser: the pull method</a></h3><p>Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory strings or from files. The functions are
defined in "parser.h":</p><dl>
  <dt><code>xmlDocPtr xmlParseMemory(const char *buffer, int size);</code></dt>
  <dd><p>Parse a document held in a memory buffer of <code>size</code> bytes
    (for a null-terminated string, pass its length).</p>
  </dd>
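  <dd><p>A minimal sketch of the in-memory case. The helper name
    <code>parse_string()</code> is hypothetical (it is not part of libxml2);
    it only illustrates that the length is passed explicitly and that the
    result has to be checked against NULL:</p>
<pre>#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;

/* Hypothetical helper, not a libxml2 function: parse a NUL-terminated
 * string. Returns the document, or NULL if parsing failed. */
static xmlDocPtr parse_string(const char *text)
{
    /* xmlParseMemory() wants the byte count, so strlen() supplies it. */
    return xmlParseMemory(text, (int) strlen(text));
}</pre>
    <p>The caller releases the returned document with
    <code>xmlFreeDoc()</code> once it is no longer needed.</p>
  </dd>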
</dl><dl>
  <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
  <dd><p>Parse an XML document contained in a (possibly compressed)
    file.</p>
  </dd>
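  <dd><p>The file-based call can be sketched the same way. The helper
    <code>file_parses()</code> below is a made-up name for illustration;
    it assumes the caller only wants to know whether the file yields a
    document:</p>
<pre>#include &lt;libxml/parser.h&gt;

/* Hypothetical helper, not a libxml2 function: returns 1 if the file
 * could be parsed into a document, 0 otherwise. */
static int file_parses(const char *filename)
{
    xmlDocPtr doc = xmlParseFile(filename);

    if (doc == NULL)
        return 0;           /* missing file or document not well-formed */
    xmlFreeDoc(doc);        /* release the tree once done with it */
    return 1;
}</pre>
  </dd>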
</dl><p>The parser returns a pointer to the document structure (or NULL in case of
failure).</p><h3 id="Invoking1">Invoking the parser: the push method</h3><p>So that the application can keep control while the document is
being fetched (which is common for GUI-based programs), libxml2 also provides a
push interface, as of version 1.8.3. Here are the interface
functions:</p><pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre><p>and here is a simple example showing how to use the interface:</p><pre>    xmlDocPtr doc = NULL;
    FILE *f;

    f = fopen(filename, "rb");
    if (f != NULL) {
        int res, size = 1024;
        char chars[1024];
        xmlParserCtxtPtr ctxt;

        /* The first 4 bytes let the parser detect the encoding. */
        res = fread(chars, 1, 4, f);
        if (res &gt; 0) {
            ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                           chars, res, filename);
            if (ctxt != NULL) {
                /* Feed the rest of the file one chunk at a time. */
                while ((res = fread(chars, 1, size, f)) &gt; 0) {
                    xmlParseChunk(ctxt, chars, res, 0);
                }
                /* An empty chunk with terminate set to 1 ends the parse. */
                xmlParseChunk(ctxt, chars, 0, 1);
                doc = ctxt-&gt;myDoc;
                xmlFreeParserCtxt(ctxt);
            }
        }
        fclose(f);
    }</pre><p>The HTML parser embedded into libxml2 also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p><h3 id="Invoking2">Invoking the parser: the SAX interface</h3><p>The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
<a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push interface can be
limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p><h3><a name="Building" id="Building">Building a tree from scratch</a></h3><p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in &lt;libxml/tree.h&gt;.) For example, here is a piece of
code that produces the XML document used in the previous examples:</p><pre>    #include &lt;libxml/tree.h&gt;

    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    doc = xmlNewDoc(BAD_CAST "1.0");
    /* Create the root element and attach it to the document. */
    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlDocSetRootElement(doc, root);
    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "&amp; linux too");
    /* First child: the head with its title. */
    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
    /* Second child: a chapter with a title, a paragraph and an image. */
    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");</pre><p>Not really rocket science ...</p><h3><a name="Traversing" id="Traversing">Traversing the tree</a></h3><p>Basically by <a href="html/libxml-tree.html">including "tree.h"</a> your
code has access to the internal structure of all the elements of the tree.
The field names are simple: <strong>parent</strong>,
<strong>children</strong>, <strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the previous
example:</p><pre><code>doc-&gt;children-&gt;children-&gt;children</code></pre><p>points to the title element,</p><pre>doc-&gt;children-&gt;children-&gt;next-&gt;children-&gt;children</pre><p>points to the text node containing the chapter title "The Linux
adventure".</p><p><strong>NOTE</strong>: XML allows <em>PI</em>s and <em>comments</em> to be
present before the document root, so <code>doc-&gt;children</code> may point
to a node which is not the document Root Element; the function
<code>xmlDocGetRootElement()</code> was added for this purpose.</p><h3><a name="Modifying" id="Modifying">Modifying the tree</a></h3><p>Functions are provided for reading and writing the document content. Here
is an excerpt from the <a href="html/libxml-tree.html">tree API</a>:</p><dl>
  <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const
    xmlChar *value);</code></dt>
  <dd><p>This sets (or changes) an attribute carried by an ELEMENT node.
    The value can be NULL.</p>
  </dd>
</dl><dl>
  <dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar
    *name);</code></dt>
  <dd><p>This function returns a pointer to a new copy of the property
    content. Note that the user must deallocate the result with
    <code>xmlFree()</code>.</p>
  </dd>
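  <dd><p>The ownership rule is easy to get wrong, so here is a small sketch
    of a set-then-get round trip. The helper name
    <code>roundtrip_attribute()</code> is hypothetical and not part of the
    library; the point is only that the copy returned by
    <code>xmlGetProp()</code> is freed by the caller:</p>
<pre>#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper, not a libxml2 function: set an attribute, read it
 * back, and return 1 if the stored value matches, 0 otherwise. */
static int roundtrip_attribute(xmlNodePtr node, const char *name,
                               const char *value)
{
    xmlChar *stored;
    int same;

    if (xmlSetProp(node, BAD_CAST name, BAD_CAST value) == NULL)
        return 0;
    stored = xmlGetProp(node, BAD_CAST name);   /* newly allocated copy */
    if (stored == NULL)
        return 0;
    same = xmlStrEqual(stored, BAD_CAST value);
    xmlFree(stored);                            /* the caller owns the copy */
    return same;
}</pre>
  </dd>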
</dl><p>Two functions are provided for reading and writing the text associated
with elements:</p><dl>
  <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar
    *value);</code></dt>
  <dd><p>This function takes an "external" string and converts it to one
    text node or possibly to a list of entity and text nodes. All
    non-predefined entity references like &amp;Gnome; will be stored
    internally as entity nodes, hence the result of the function may not be
    a single node.</p>
  </dd>
</dl><dl>
  <dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
    inLine);</code></dt>
  <dd><p>This function is the inverse of
    <code>xmlStringGetNodeList()</code>. It generates a new string
    containing the content of the text and entity nodes. Note the extra
    argument inLine. If this argument is set to 1, the function will expand
    entity references. For example, instead of returning the &amp;Gnome;
    XML encoding in the string, it will substitute it with its value (say,
    "GNU Network Object Model Environment").</p>
  </dd>
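  <dd><p>As an illustration of how the two functions mirror each other, the
    sketch below converts a string to a node list and back. The helper name
    <code>text_roundtrip()</code> is made up for this page, and the example
    assumes plain text without entity references:</p>
<pre>#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper, not a libxml2 function: turn text into a node
 * list and back into a string. The result must be freed with xmlFree(). */
static xmlChar *text_roundtrip(xmlDocPtr doc, const char *text)
{
    xmlNodePtr list = xmlStringGetNodeList(doc, BAD_CAST text);
    xmlChar *out;

    if (list == NULL)
        return NULL;
    out = xmlNodeListGetString(doc, list, 1);   /* 1: expand entities */
    xmlFreeNodeList(list);                      /* the list was only temporary */
    return out;
}</pre>
  </dd>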
</dl><h3><a name="Saving" id="Saving">Saving a tree</a></h3><p>Basically 3 options are possible:</p><dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int
    *size);</code></dt>
  <dd><p>Saves the document into a newly allocated buffer, returned through
    <code>mem</code> together with its length in <code>size</code>. The
    caller must free the buffer with <code>xmlFree()</code>.</p>
  </dd>
</dl><dl>
  <dt><code>int xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
  <dd><p>Dumps a document to an open <code>FILE</code> stream.</p>
  </dd>
</dl><dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
  <dd><p>Saves the document to a file. In this case, the compression
    interface is triggered if it has been turned on.</p>
  </dd>
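  <dd><p>Of the three options, the in-memory one involves an extra
    allocation that the caller has to release. A minimal sketch, using a
    hypothetical helper name <code>print_document()</code> that is not part
    of libxml2:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/tree.h&gt;
#include &lt;libxml/xmlmemory.h&gt;

/* Hypothetical helper, not a libxml2 function: serialize the document to
 * memory, write it to out, and return the byte count (or -1 on failure). */
static int print_document(xmlDocPtr doc, FILE *out)
{
    xmlChar *mem = NULL;
    int size = 0;

    xmlDocDumpMemory(doc, &amp;mem, &amp;size);
    if (mem == NULL)
        return -1;
    fwrite(mem, 1, (size_t) size, out);
    xmlFree(mem);           /* the buffer was allocated by the library */
    return size;
}</pre>
  </dd>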
</dl><h3><a name="Compressio" id="Compressio">Compression</a></h3><p>The library transparently handles compression when doing file-based
accesses. The level of compression on saves can be set either globally
or individually for one file:</p><dl>
  <dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
  <dd><p>Gets the document compression ratio (0-9).</p>
  </dd>
</dl><dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
  <dd><p>Sets the document compression ratio.</p>
  </dd>
</dl><dl>
  <dt><code>int xmlGetCompressMode(void);</code></dt>
  <dd><p>Gets the default compression ratio.</p>
  </dd>
</dl><dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
  <dd><p>Sets the default compression ratio.</p>
  </dd>
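  <dd><p>A per-document setting can be sketched as follows. The helper name
    <code>save_with_level()</code> is hypothetical, and the example assumes
    a libxml2 build with compression support; without it the file is
    presumably just written uncompressed:</p>
<pre>#include &lt;libxml/tree.h&gt;

/* Hypothetical helper, not a libxml2 function: save the document with the
 * given compression ratio, then restore the previous setting.
 * Returns what xmlSaveFile() returns (-1 on error). */
static int save_with_level(xmlDocPtr doc, const char *filename, int level)
{
    int previous = xmlGetDocCompressMode(doc);
    int written;

    xmlSetDocCompressMode(doc, level);      /* 0 = none ... 9 = maximum */
    written = xmlSaveFile(filename, doc);
    xmlSetDocCompressMode(doc, previous);   /* leave the document as found */
    return written;
}</pre>
  </dd>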
</dl><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>