<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>A real example</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>A real example</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Developer Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..."
/></form><ul><li><a href="index.html" style="font-weight:bold">Main Menu</a></li><li><a href="html/index.html" style="font-weight:bold">Reference Manual</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="tutorial/index.html">Tutorial</a></li><li><a href="xmlreader.html">The Reader Interface</a></li><li><a href="ChangeLog.html">ChangeLog</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="python.html">Python and bindings</a></li><li><a href="architecture.html">libxml2 architecture</a></li><li><a href="tree.html">The tree output</a></li><li><a href="interface.html">The SAX interface</a></li><li><a href="xmlmem.html">Memory Management</a></li><li><a href="xmlio.html">I/O Interfaces</a></li><li><a href="library.html">The parser interfaces</a></li><li><a href="entities.html">Entities or no entities</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="upgrade.html">Upgrading 1.x code</a></li><li><a href="threads.html">Thread safety</a></li><li><a href="DOM.html">DOM Principles</a></li><li><a href="example.html">A real example</a></li><li><a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="APIchunk0.html">Alphabetic</a></li><li><a href="APIconstructors.html">Constructors</a></li><li><a href="APIfunctions.html">Functions/Types</a></li><li><a href="APIfiles.html">Modules</a></li><li><a href="APIsymbols.html">Symbols</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail 
archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://www.blastwave.org/packages.php/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>Here is a real size example, where the actual content of
the application data is not kept in the DOM tree but uses internal structures.
It is based on a proposal to keep a database of jobs related to Gnome, with
an XML-based storage structure. Here is an <a href="gjobs.xml">XML
encoded jobs base</a>:</p><pre>&lt;?xml version="1.0"?&gt;
&lt;gjob:Helping xmlns:gjob="http://www.gnome.org/some-location"&gt;
  &lt;gjob:Jobs&gt;

    &lt;gjob:Job&gt;
      &lt;gjob:Project ID="3"/&gt;
      &lt;gjob:Application&gt;GBackup&lt;/gjob:Application&gt;
      &lt;gjob:Category&gt;Development&lt;/gjob:Category&gt;

      &lt;gjob:Update&gt;
        &lt;gjob:Status&gt;Open&lt;/gjob:Status&gt;
        &lt;gjob:Modified&gt;Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified&gt;
        &lt;gjob:Salary&gt;USD 0.00&lt;/gjob:Salary&gt;
      &lt;/gjob:Update&gt;

      &lt;gjob:Developers&gt;
        &lt;gjob:Developer&gt;
        &lt;/gjob:Developer&gt;
      &lt;/gjob:Developers&gt;

      &lt;gjob:Contact&gt;
        &lt;gjob:Person&gt;Nathan Clemons&lt;/gjob:Person&gt;
        &lt;gjob:Email&gt;nathan@windsofstorm.net&lt;/gjob:Email&gt;
        &lt;gjob:Company&gt;
        &lt;/gjob:Company&gt;
        &lt;gjob:Organisation&gt;
        &lt;/gjob:Organisation&gt;
        &lt;gjob:Webpage&gt;
        &lt;/gjob:Webpage&gt;
        &lt;gjob:Snailmail&gt;
        &lt;/gjob:Snailmail&gt;
        &lt;gjob:Phone&gt;
        &lt;/gjob:Phone&gt;
      &lt;/gjob:Contact&gt;

      &lt;gjob:Requirements&gt;
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements&gt;

      &lt;gjob:Skills&gt;
      &lt;/gjob:Skills&gt;

      &lt;gjob:Details&gt;
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details&gt;

    &lt;/gjob:Job&gt;

  &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre><p>While loading the XML file into an internal DOM tree is a matter
of calling only a couple of functions, browsing the tree to gather the data
and generate the internal structures is harder and more error-prone.</p><p>The suggested principle is to be tolerant with respect to
the input structure. For example, the ordering of the attributes is
not significant; the XML specification is clear about it. It is also usually a
good idea not to depend on the order of the children of a given node, unless
it really makes things harder. Here is some code to parse the information for
a person:</p><pre>/*
 * A person record
 * (an xmlChar * is a zero-terminated UTF-8 encoded string)
 */
typedef struct person {
    xmlChar *name;
    xmlChar *email;
    xmlChar *company;
    xmlChar *organisation;
    xmlChar *smail;
    xmlChar *webPage;
    xmlChar *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

DEBUG("parsePerson\n"); /* DEBUG: tracing macro, see the full example */
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {
        /* match on both the element name and the application namespace */
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Person")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;name = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Email")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;email = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre><p>Here are a couple of things to notice:</p><ul><li>Usually a recursive parsing style is the most convenient one: XML data is
    by nature subject to repetitive constructs and usually
    exhibits highly structured patterns.</li>
  <li>The function takes two arguments of type <em>xmlDocPtr</em> and
    <em>xmlNsPtr</em>, i.e. the pointer to the global XML document and the
    namespace reserved for the application. Document-wide information is needed,
    for example, to decode entities, and it is good coding practice to define a
    namespace for your application's data set and test that the elements and
    attributes you are analyzing actually pertain to your application space.
    This is done by a simple equality test (cur-&gt;ns == ns).</li>
  <li>To retrieve text and attribute values, you can use
    the function <em>xmlNodeListGetString</em> to gather all the text and
    entity reference nodes generated by the DOM output and produce a single
    text string.</li>
</ul><p>Here is another piece of code used to parse another level
of the structure:</p><pre>#include &lt;libxml/tree.h&gt;
/*
 * a Description for a Job
 */
typedef struct job {
    xmlChar *projectID;
    xmlChar *application;
    xmlChar *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur-&gt;xmlChildrenNode;
    while (cur != NULL) {

        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Project")) &amp;&amp; (cur-&gt;ns == ns)) {
            /* the project ID is stored as an attribute, not as text */
            ret-&gt;projectID = xmlGetProp(cur, (const xmlChar *) "ID");
            if (ret-&gt;projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Application")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;application = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Category")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;category = xmlNodeListGetString(doc, cur-&gt;xmlChildrenNode, 1);
        if ((!xmlStrcmp(cur-&gt;name, (const xmlChar *) "Contact")) &amp;&amp; (cur-&gt;ns == ns))
            ret-&gt;contact = parsePerson(doc, ns, cur);
        cur = cur-&gt;next;
    }

    return(ret);
}</pre><p>Once you are used to it, writing this kind of code is quite
simple, but boring. Ultimately, it could be possible to write stubbers taking
either C data structure definitions, a set of XML examples or an XML DTD
and producing the code needed to import and export the content between C data
and XML storage. This is left as an exercise to the reader :-)</p><p>Feel free to use <a href="example/gjobread.c">the code for the
full C parsing example</a> as a template; it is also available with a Makefile
in the Gnome CVS base under gnome-xml/example.</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>