<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example, in which the application data is not kept in
the DOM tree but copied into the application's own internal structures. It is
based on a proposal to keep a database of jobs related to Gnome, with an
XML-based storage format. Here is an <a href="gjobs.xml">XML-encoded jobs
base</a>:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;gjob:Helping xmlns:gjob=&quot;http://www.gnome.org/some-location&quot;&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID=&quot;3&quot;/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system.  This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error-prone.</p>
<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about that. It's also usually a good idea not
to depend on the order of the children of a given node, unless ignoring that
order really makes things harder. Here is some code to parse the information
for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;

/* Trace helper used by the parsing functions */
#define DEBUG(x) printf(x)

/*
 * A person record
 */
typedef struct person {
    char *name;
    char *email;
    char *company;
    char *organisation;
    char *smail;
    char *webPage;
    char *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

DEBUG(&quot;parsePerson\n&quot;);
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* node names are xmlChar strings, so compare with xmlStrcmp */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Person&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Email&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a few things to notice:</p>
<ul>
<li>Usually a recursive parsing style is the most convenient one: XML data
    is by nature subject to repetitive constructs and usually exhibits highly
    structured patterns.</li>
<li>Each parsing function takes two arguments of type <em>xmlDocPtr</em> and
    <em>xmlNsPtr</em>, i.e. the pointer to the global XML document and the
    namespace reserved for the application. Document-wide information is
    needed, for example, to decode entities, and it's good coding practice to
    define a namespace for your application's data set and test that the
    elements and attributes you're analyzing actually pertain to your
    application space. This is done by a simple equality test
    (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em> to gather all the text and entity reference
    nodes generated by the DOM output and produce a single text string.</li>
</ul>
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    char *projectID;
    char *application;
    char *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

DEBUG(&quot;parseJob\n&quot;);
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,&quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Project&quot;)) &amp;&amp; (cur-&gt;ns == ns)) {
            /* the project identifier is carried by the ID attribute */
            ret-&gt;projectID = (char *) xmlGetProp(cur, (const xmlChar *) &quot;ID&quot;);
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, &quot;Project has no ID\n&quot;);
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Application&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Category&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = (char *) xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        /* nested records are handled by recursing into parsePerson */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Contact&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
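<p>The fixed-size <em>developers[100]</em> array above leaves dynamic
allocation as an exercise. One possible approach, shown here only as a sketch,
is a small growable list. The names <em>personList</em>,
<em>personListAdd</em> and <em>personListClear</em> are hypothetical, and the
person record is reduced to two fields to keep the block self-contained.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-in for the person record defined earlier. */
typedef struct person {
    char *name;
    char *email;
} person, *personPtr;

/* Growable list that could replace the fixed developers[100] array. */
typedef struct personList {
    int nb;          /* number of entries in use */
    int max;         /* number of entries allocated */
    personPtr *tab;  /* heap-allocated array of records */
} personList;

/*
 * Append p to the list, doubling the capacity when it is full.
 * Returns 0 on success and -1 if memory could not be allocated.
 */
int personListAdd(personList *list, personPtr p) {
    assert(list != NULL);

    if (list->nb >= list->max) {
        int newMax = (list->max == 0) ? 4 : list->max * 2;
        personPtr *tmp;

        tmp = (personPtr *) realloc(list->tab, newMax * sizeof(personPtr));
        if (tmp == NULL)
            return(-1);
        list->tab = tmp;
        list->max = newMax;
    }
    list->tab[list->nb++] = p;
    return(0);
}

/* Release the array itself; the records it points to are not freed. */
void personListClear(personList *list) {
    assert(list != NULL);

    free(list->tab);
    list->tab = NULL;
    list->nb = 0;
    list->max = 0;
}
```

<p>With such a list in the job record, a branch for the Developers element
could call <em>parsePerson</em> on each Developer child and pass the result to
<em>personListAdd</em>.</p>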
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possible to write stubbers that take either C
data structure definitions, a set of XML examples or an XML DTD, and produce
the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p>
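<p>As an illustration of what such generated code might look like, here is a
table-driven sketch of the import side. It is an assumption about one possible
design, not part of the example: <em>fieldMap</em>, <em>personFields</em> and
<em>setField</em> are hypothetical names, the person record is reduced to
three fields, and the element name and text are passed in as plain C strings
rather than read from the tree.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for the person record defined earlier. */
typedef struct person {
    char *name;
    char *email;
    char *company;
} person;

/* One row per element: element name -> offset of the char * field. */
typedef struct fieldMap {
    const char *element;
    size_t offset;
} fieldMap;

/* The kind of table a stubber could emit from the struct definition. */
static const fieldMap personFields[] = {
    { "Person",  offsetof(person, name) },
    { "Email",   offsetof(person, email) },
    { "Company", offsetof(person, company) },
};

/*
 * Store value in the field that the table associates with element.
 * Returns 1 when the element is known and 0 otherwise, so unknown
 * elements are simply skipped whatever order the children come in.
 */
int setField(void *rec, const fieldMap *map, size_t nbFields,
             const char *element, char *value) {
    size_t i;

    assert(rec != NULL && map != NULL && element != NULL);

    for (i = 0; i < nbFields; i++) {
        if (strcmp(map[i].element, element) == 0) {
            char **slot = (char **) ((char *) rec + map[i].offset);

            *slot = value;
            return(1);
        }
    }
    return(0);
}
```

<p>The loop over the children in <em>parsePerson</em> would then shrink to one
<em>setField</em> call per child node, after the usual namespace check.</p>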
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template. It is also available, with a Makefile, in
the Gnome CVS base under gnome-xml/example.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>