<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 14pt; font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-size: 20pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 18pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example, where the application data is not kept in
the DOM tree but copied into the application's own internal structures. It
is based on a proposal to keep a database of jobs related to Gnome, using an
XML-based storage structure. Here is an <a href="gjobs.xml">XML encoded jobs
base</a>:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;gjob:Helping xmlns:gjob=&quot;http://www.gnome.org/some-location&quot;&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID=&quot;3&quot;/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.</p>
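<p>As a sketch of that loading step, the function below assumes a libxml2
release that provides <em>xmlReadFile</em> (older releases offer
<em>xmlParseFile</em> instead). It parses the file, checks that the root
element is <em>Helping</em> in the jobs namespace, and hands that namespace
back for the parsing functions below. The name <em>loadGjobFile</em> is
hypothetical, not part of libxml.</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Namespace URI used by the jobs base above */
#define GJOB_NS_URI &quot;http://www.gnome.org/some-location&quot;

/*
 * Hypothetical helper: parse filename and check that it is a jobs base.
 * On success returns the document (free it with xmlFreeDoc()) and stores
 * the application namespace in *nsOut; returns NULL on any failure.
 */
xmlDocPtr loadGjobFile(const char *filename, xmlNsPtr *nsOut) {
    xmlDocPtr doc;
    xmlNodePtr root;
    xmlNsPtr ns;

    *nsOut = NULL;
    /* XML_PARSE_NOBLANKS asks the parser to drop ignorable blank text nodes */
    doc = xmlReadFile(filename, NULL, XML_PARSE_NOBLANKS);
    if (doc == NULL)
        return(NULL);

    root = xmlDocGetRootElement(doc);
    if (root == NULL) {
        fprintf(stderr, &quot;%s: empty document\n&quot;, filename);
        xmlFreeDoc(doc);
        return(NULL);
    }

    /* Look the namespace up by URI: the prefix is the author's choice */
    ns = xmlSearchNsByHref(doc, root, (const xmlChar *) GJOB_NS_URI);
    if ((ns == NULL) || (root-&gt;ns != ns) ||
        !xmlStrEqual(root-&gt;name, (const xmlChar *) &quot;Helping&quot;)) {
        fprintf(stderr, &quot;%s: not a jobs base\n&quot;, filename);
        xmlFreeDoc(doc);
        return(NULL);
    }
    *nsOut = ns;
    return(doc);
}</pre>
<p>Looking the namespace up by its URI rather than by the <em>gjob</em>
prefix keeps the reader tolerant: another author may bind the same URI to a
different prefix and the document still has the same meaning.</p>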
<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of attributes is not significant; the
XML specification is clear about this. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really makes
things harder. Here is some code to parse the information for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;

/* Trace calls while debugging; redefine as empty to silence */
#define DEBUG(x) fprintf(stderr, &quot;%s&quot;, (x))

/*
 * A person record; the strings are allocated by libxml,
 * release them with xmlFree()
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG(&quot;parsePerson\n&quot;);
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Person&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Email&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a couple of things to notice:</p>
<ul>
<li>Usually a recursive parsing style is the most convenient one: XML data
  is by nature subject to repetitive constructs and usually exhibits highly
  structured patterns.</li>
<li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em> are
  the pointer to the global XML document and the namespace reserved to the
  application. Document-wide information is needed, for example, to decode
  entities, and it's good coding practice to define a namespace for your
  application's set of data and to test that the elements and attributes
  you're analyzing actually pertain to your application space. This is done
  by a simple equality test (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
  <em>xmlNodeListGetString</em> to gather all the text and entity reference
  nodes generated by the DOM output and produce a single text string. The
  string is newly allocated and should be released with <em>xmlFree</em>
  once it is no longer needed.</li>
</ul>
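<p>The name and namespace test in <em>parsePerson</em> is repeated for every
element, so it can be factored out. The helper below is a hypothetical
convenience, not part of libxml; it also checks the node type, so text and
comment nodes sitting between elements can never match. It relies on
<em>xmlStrEqual</em>, which returns 1 when two <em>xmlChar</em> strings are
identical.</p>
<pre>#include &lt;libxml/tree.h&gt;

/*
 * Hypothetical helper: non-zero when cur is an element called name
 * that belongs to the application namespace ns.
 */
static int isAppElement(xmlNodePtr cur, xmlNsPtr ns, const char *name) {
    return (cur != NULL) &amp;&amp;
           (cur-&gt;type == XML_ELEMENT_NODE) &amp;&amp;
           (cur-&gt;ns == ns) &amp;&amp;
           xmlStrEqual(cur-&gt;name, (const xmlChar *) name);
}</pre>
<p>With it, each test in the loop reads as
<em>if (isAppElement(cur, ns, "Email"))</em>, followed by the same
<em>xmlNodeListGetString</em> call as before.</p>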
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG(&quot;parseJob\n&quot;);
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Project&quot;)) &amp;&amp; (cur-&gt;ns == ns)) {
            /* attribute values can be read directly with xmlGetProp() */
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) &quot;ID&quot;);
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, &quot;Project has no ID\n&quot;);
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Application&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Category&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Contact&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
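<p>The fixed <em>developers[100]</em> array in <em>job</em> is marked as an
exercise. One possible answer, sketched here in plain C with hypothetical
names (<em>devList</em>, <em>devListAdd</em>), is a growable array whose
capacity doubles whenever it fills up. <em>personPtr</em> is redeclared as an
opaque pointer only so that the sketch compiles on its own.</p>
<pre>#include &lt;stdlib.h&gt;

/* Opaque stand-in for the personPtr type defined above */
typedef struct person *personPtr;

/* Hypothetical growable replacement for developers[100] */
typedef struct devList {
    int nbDevelopers;      /* slots in use */
    int maxDevelopers;     /* slots allocated */
    personPtr *developers; /* NULL until the first append */
} devList;

/* Append p, doubling the capacity when needed; returns 0, or -1 if out of memory */
int devListAdd(devList *list, personPtr p) {
    if (list-&gt;nbDevelopers &gt;= list-&gt;maxDevelopers) {
        int newMax = (list-&gt;maxDevelopers &gt; 0) ? 2 * list-&gt;maxDevelopers : 4;
        personPtr *tmp = realloc(list-&gt;developers, (size_t) newMax * sizeof(*tmp));
        if (tmp == NULL)
            return(-1);    /* the old array is left untouched */
        list-&gt;developers = tmp;
        list-&gt;maxDevelopers = newMax;
    }
    list-&gt;developers[list-&gt;nbDevelopers++] = p;
    return(0);
}</pre>
<p>Inside the loop of <em>parseJob</em>, each <em>Developer</em> child of
<em>Developers</em> could then be parsed with <em>parsePerson</em> and
appended, checking both the NULL return of <em>parsePerson</em> and the -1
return of <em>devListAdd</em>. The array itself is released with
<em>free</em> after the person records it points to.</p>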
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it would be possible to write stub generators that take
C data structure definitions, a set of XML examples, or an XML DTD, and
produce the code needed to import and export the content between C data and
XML storage. This is left as an exercise to the reader :-)</p>
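<p>Whatever produces it, such import code can be table-driven rather than a
chain of string comparisons. The sketch below is hypothetical
(<em>personRec</em>, <em>fieldMap</em> and <em>setMappedField</em> are
illustrative names, and plain <em>char *</em> is used so it builds without
libxml): it maps element names to struct members with <em>offsetof</em> from
&lt;stddef.h&gt;, so supporting a new element means adding one table row.</p>
<pre>#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

/* Reduced stand-in for the person record above */
typedef struct personRec {
    char *name;
    char *email;
    char *company;
} personRec;

/* One row per element: its local name and where its text is stored */
typedef struct fieldMap {
    const char *element;
    size_t offset;
} fieldMap;

static const fieldMap personFields[] = {
    { &quot;Person&quot;,  offsetof(personRec, name) },
    { &quot;Email&quot;,   offsetof(personRec, email) },
    { &quot;Company&quot;, offsetof(personRec, company) },
};

/*
 * Store value in the char * member mapped to element.
 * Returns 1 if the element is in the table, 0 if it is unknown.
 */
int setMappedField(void *record, const fieldMap *map, size_t n,
                   const char *element, char *value) {
    size_t i;

    for (i = 0; i &lt; n; i++) {
        if (strcmp(map[i].element, element) == 0) {
            *(char **) ((char *) record + map[i].offset) = value;
            return(1);
        }
    }
    return(0);
}</pre>
<p>A parsing loop would pass each element's local name and its text to
<em>setMappedField</em>; unknown elements simply return 0, which keeps the
reader tolerant of extra content.</p>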
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template. It is also available, with a Makefile, in
the Gnome CVS base under gnome-xml/example.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>