<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>A real example</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>A real example</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Here is a real-size example in which the application data is not kept in
the DOM tree but copied into internal structures. It is based on a proposal
to keep a database of Gnome-related jobs with an XML-based storage
structure. Here is an <a href="gjobs.xml">XML-encoded jobs
base</a>:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;gjob:Helping xmlns:gjob=&quot;http://www.gnome.org/some-location&quot;&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID=&quot;3&quot;/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder and more error prone.</p>
<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant;
the XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really makes
things harder. Here is some code to parse the information for a person:</p>
<pre>#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* DEBUG is not defined in this excerpt; a simple trace macro is assumed */
#define DEBUG(x) fprintf(stderr, &quot;%s&quot;, x)

/*
 * A person record; strings are xmlChar * because that is what
 * xmlNodeListGetString() returns
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

    DEBUG(&quot;parsePerson\n&quot;);
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr, &quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* Node names are xmlChar *, so compare them with xmlStrcmp() */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Person&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Email&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
<p>Here are a couple of things to notice:</p>
<ul>
<li>A recursive parsing style is usually the most convenient one: XML data
    is by nature subject to repetitive constructs and usually exhibits highly
    structured patterns.</li>
<li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em> are
    the pointer to the global XML document and the namespace reserved for
    the application. Document-wide information is needed, for example, to
    decode entities. It is also good coding practice to define a namespace
    for your application's data set and to test that the elements and
    attributes you are analyzing actually pertain to your application space.
    This is done by a simple equality test (cur-&gt;ns == ns).</li>
<li>To retrieve text and attribute values, you can use the function
    <em>xmlNodeListGetString</em> to gather all the text and entity reference
    nodes generated by the DOM output and produce a single text string.</li>
</ul>
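<p>The order-independent, namespace-checked traversal described in the points
above can be sketched without the library. The following is only an
illustration: <em>miniNode</em>, <em>miniPerson</em> and <em>fillPerson</em>
are hypothetical names, and <em>miniNode</em> is a simplified stand-in for
libxml's node type that keeps just the fields the loop needs.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an XML element node (not the libxml type). */
typedef struct miniNode {
    const char *name;        /* element name without prefix */
    const void *ns;          /* namespace identity, compared by pointer */
    const char *text;        /* text content, NULL when empty */
    struct miniNode *next;   /* next sibling, NULL at the end */
} miniNode;

/* Hypothetical target record. */
typedef struct miniPerson {
    const char *name;
    const char *email;
} miniPerson;

/*
 * Walk the siblings in whatever order they appear. Elements from a
 * foreign namespace and elements with unknown names are ignored.
 */
static void fillPerson(miniPerson *out, const miniNode *cur, const void *ns) {
    assert(out != NULL);
    for (; cur != NULL; cur = cur->next) {
        if (cur->ns != ns)
            continue;                       /* not in our application space */
        if (strcmp(cur->name, "Person") == 0)
            out->name = cur->text;
        else if (strcmp(cur->name, "Email") == 0)
            out->email = cur->text;
    }
}
```

<p>Because each child is matched by name rather than by position, the same
loop accepts the Email element before or after the Person element, and an
element with the same local name in another namespace is skipped.</p>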
<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 * (relies on person, parsePerson and DEBUG from the previous listing)
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

    DEBUG(&quot;parseJob\n&quot;);
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr, &quot;out of memory\n&quot;);
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        /* Node names are xmlChar *, so compare them with xmlStrcmp() */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Project&quot;)) &amp;&amp; (cur-&gt;ns == ns)) {
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) &quot;ID&quot;);
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, &quot;Project has no ID\n&quot;);
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Application&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Category&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) &quot;Contact&quot;)) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre>
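<p>The job structure above uses a fixed array of 100 developers and leaves
dynamic allocation as an exercise. One possible way to do it is sketched
below. The names <em>devList</em>, <em>devListAppend</em> and
<em>devListFree</em> are hypothetical, and the list stores opaque pointers so
that the sketch compiles on its own, without the person type.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of developer records (opaque pointers). */
typedef struct devList {
    void **items;       /* heap array of entries, NULL while empty */
    size_t count;       /* number of entries in use */
    size_t capacity;    /* number of entries allocated */
} devList;

/*
 * Append one entry, doubling the capacity when the array is full.
 * Returns 0 on success and -1 when out of memory; on failure the
 * list is left unchanged.
 */
static int devListAppend(devList *list, void *item) {
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t newCap = (list->capacity == 0) ? 4 : list->capacity * 2;
        void **tmp = realloc(list->items, newCap * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->capacity = newCap;
    }
    list->items[list->count++] = item;
    return 0;
}

/* Release the array itself; the entries are owned by the caller. */
static void devListFree(devList *list) {
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

<p>Doubling the capacity keeps the number of reallocations small as the list
grows, and the list must start zero-initialized for the first append to
work.</p>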
<p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possible to write stub generators that take
C data structure definitions, a set of XML examples, or an XML DTD, and
produce the code needed to import and export the content between C data and
XML storage. This is left as an exercise to the reader :-)</p>
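<p>As a rough idea of what such generated code might look like, here is a
hand-written, table-driven sketch. It is an assumption about one possible
shape, not something taken from the library: <em>contact</em>,
<em>fieldMap</em>, <em>contactMap</em> and <em>setField</em> are hypothetical
names. A table maps each element name to the offset of a struct field, so a
single generic function replaces the chain of comparisons.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record filled from XML elements. */
typedef struct contact {
    const char *name;
    const char *email;
    const char *phone;
} contact;

/* One row per element: its name and where its text goes in the struct. */
typedef struct fieldMap {
    const char *element;
    size_t offset;
} fieldMap;

/* The part a generator would emit from the struct definition. */
static const fieldMap contactMap[] = {
    { "Person", offsetof(contact, name)  },
    { "Email",  offsetof(contact, email) },
    { "Phone",  offsetof(contact, phone) },
};

/*
 * Store text in the field registered for the given element name.
 * Returns 1 when the element is known, 0 when it is ignored.
 */
static int setField(contact *c, const char *element, const char *text) {
    size_t i;

    assert(c != NULL && element != NULL);
    for (i = 0; i < sizeof(contactMap) / sizeof(contactMap[0]); i++) {
        if (strcmp(contactMap[i].element, element) == 0) {
            const char **slot =
                (const char **) ((char *) c + contactMap[i].offset);
            *slot = text;
            return 1;
        }
    }
    return 0;
}
```

<p>With this layout, supporting a new element means adding one row to the
table, which is the kind of repetitive output a generator can produce.</p>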
<p>Feel free to use <a href="example/gjobread.c">the code for the full C
parsing example</a> as a template. It is also available, with a Makefile, in
the Gnome CVS base under gnome-xml/example.</p>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>