<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example, in which the application data is not kept in
the DOM tree but in the application's own internal structures. It is based on
a proposal to keep a database of Gnome-related jobs in an XML-based storage
format. Here is an <a href="gjobs.xml">XML-encoded job
database</a>:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;gjob:Helping xmlns:gjob=&quot;http://www.gnome.org/some-location&quot;&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID=&quot;3&quot;/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>Loading the XML file into an internal DOM tree takes only a couple of
function calls, but walking the tree to gather the data and build the
internal structures is harder and more error-prone.</p>
<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the order of attributes is not significant; the XML
specification is explicit about this. It is also usually a good idea not to
depend on the order of the children of a given node, unless ignoring the
order makes things substantially harder. Here is some code to parse the
information for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Simple trace macro used by the parsing routines */
#define DEBUG(x) printf(x)

/*
 * A person record; the strings are allocated by libxml
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG(&quot;parsePerson\n&quot;);
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr, &quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* node names are xmlChar strings, so compare them with xmlStrcmp */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Person&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Email&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a couple of things to notice:</p>
<ul>
<li>A recursive parsing style is usually the most convenient one: XML data
    is by nature subject to repetitive constructs and usually exhibits highly
    structured patterns.</li>
<li>The function takes two arguments of type <em>xmlDocPtr</em> and
    <em>xmlNsPtr</em>, i.e. the pointer to the global XML document and the
    namespace reserved for the application. Document-wide information is
    needed, for example, to decode entities, and it is good coding practice
    to define a namespace for your application's data and to test that the
    elements and attributes you are analyzing actually belong to it. This is
    done with a simple equality test (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em>, which gathers all the text and entity
    reference nodes generated by the DOM output and produces a single text
    string.</li>
</ul>
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/tree.h&gt;

/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG(&quot;parseJob\n&quot;);
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr, &quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Project&quot;)) &amp;&amp; (cur-&gt;ns == ns)) {
            /* the project identifier is stored in the ID attribute */
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) &quot;ID&quot;);
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, &quot;Project has no ID\n&quot;);
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Application&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Category&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        /* a Contact element holds a person record, parsed recursively */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Contact&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
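<p>The job structure above stores developers in a fixed array of 100 entries
and leaves dynamic allocation as an exercise. One possible answer, given here
only as a sketch, is a small growable list that doubles its capacity when
full. The names <em>personList</em>, <em>personListAdd</em> and
<em>personListFree</em> are hypothetical and are not part of the original
example or of libxml.</p>

```c
#include <stdlib.h>

/* Stand-in for the person type of the example; only pointers are stored. */
typedef struct person person, *personPtr;

/* Hypothetical growable list replacing the fixed developers[100] array. */
typedef struct personList {
    personPtr *items;   /* heap-allocated array of person pointers */
    int nb;             /* number of slots in use */
    int max;            /* number of slots allocated */
} personList;

/*
 * Append p to the list, doubling the capacity when it is full.
 * Returns 0 on success and -1 if memory could not be allocated,
 * in which case the list is left unchanged.
 */
int personListAdd(personList *list, personPtr p) {
    if (list->nb >= list->max) {
        int newMax = (list->max == 0) ? 4 : list->max * 2;
        personPtr *tmp = realloc(list->items,
                                 (size_t) newMax * sizeof(personPtr));
        if (tmp == NULL)
            return -1;          /* the old array is still valid */
        list->items = tmp;
        list->max = newMax;
    }
    list->items[list->nb++] = p;
    return 0;
}

/* Release the array itself; the persons it points to are not freed here. */
void personListFree(personList *list) {
    free(list->items);
    list->items = NULL;
    list->nb = 0;
    list->max = 0;
}
```

<p>A list declared as <code>personList developers = {0};</code> starts empty,
and each parsed Developer element would be appended with
<code>personListAdd(&amp;developers, person)</code>. Doubling keeps the number
of reallocations logarithmic in the number of developers.</p>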
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possible to write stub generators that take C
data structure definitions, a set of XML examples, or an XML DTD, and produce
the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p>
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template. It is also available, with a Makefile, in
the Gnome CVS base under gnome-xml/example.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>