<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
  <title>The XML library for Gnome</title>
  <meta name="GENERATOR" content="amaya V2.1">
</head>

<body bgcolor="#ffffff">
<h1 align="center">The XML library for Gnome</h1>

<p>This document describes the <a href="http://www.w3.org/XML/">XML</a>
library provided in the <a href="http://www.gnome.org/">Gnome</a> framework.
XML is a standard for building tag-based structured documents and data.</p>

<p>The internal document representation is as close as possible to the <a
href="http://www.w3.org/DOM/">DOM</a> interfaces.</p>

<p>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX
interface</a>; <a href="mailto:james@daa.com.au">James Henstridge</a> wrote <a
href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">some nice
documentation</a> explaining how to use it. The interface is as compatible as
possible with that of <a href="http://www.jclark.com/xml/expat.html">Expat</a>.</p>

<p>The code is commented in a way which allows <a
href="http://rpmfind.net/veillard/XML/libxml.html">extensive documentation</a>
to be automatically extracted.</p>

<p>There is also a mailing list, <a
href="mailto:xml@rufus.w3.org">xml@rufus.w3.org</a>, for libxml, with an <a
href="http://rpmfind.net/veillard/XML/messages">on-line archive</a>. To
subscribe to this majordomo-based list, send a mail to <a
href="mailto:majordomo@rufus.w3.org">majordomo@rufus.w3.org</a> with "subscribe xml"
in the <strong>content</strong> of the message.</p>

<p>This library is released under both the W3C Copyright and the GNU LGPL.
Basically everybody should be happy; if not, drop me a mail.</p>

<p>People are invited to use the <a
href="http://cvs.gnome.org/lxr/source/gdome/">gdome Gnome module</a> to get a
full DOM interface, thanks to <a href="mailto:raph@levien.com">Raph
Levien</a>; check his <a
href="http://www.levien.com/gnome/domination.html">DOMination paper</a>. He
uses it for his implementation of <a
href="http://www.w3.org/Graphics/SVG/">SVG</a> called <a
href="http://www.levien.com/svg/">gill</a>.</p>

<h2>XML</h2>

<p>XML is a standard for markup-based structured documents. Here is <a
name="example">an example</a>:</p>
<pre>&lt;?xml version="1.0"?>
&lt;EXAMPLE prop1="gnome is great" prop2="&amp;amp; linux too">
  &lt;head>
    &lt;title>Welcome to Gnome&lt;/title>
  &lt;/head>
  &lt;chapter>
    &lt;title>The Linux adventure&lt;/title>
    &lt;p>bla bla bla ...&lt;/p>
    &lt;image href="linus.gif"/>
    &lt;p>...&lt;/p>
  &lt;/chapter>
&lt;/EXAMPLE></pre>

<p>The first line specifies that it's an XML document and gives useful
information about its encoding. The rest of the document is text whose
structure is specified by tags between angle brackets. <strong>Each opened tag
has to be closed</strong>; XML is pedantic about this. Note that, for example,
the image tag has no content (just an attribute) and is closed by ending the
tag with <code>/></code>.</p>

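<p>As a rough illustration of the rule that every opened tag has to be closed,
here is a minimal C sketch of a tag balance check. It is not how the library
parses documents: <code>tags_balanced</code> is a hypothetical helper written
for this page, and it assumes there is no <code>></code> inside attribute
values and no comments or CDATA sections.</p>

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64   /* deepest nesting this toy check accepts */
#define MAX_NAME  64   /* longest tag name, including the final '\0' */

/*
 * Hypothetical helper, not part of libxml: return 1 if every opened
 * tag in xml is closed in the right order, 0 otherwise.
 */
int tags_balanced(const char *xml) {
    char stack[MAX_DEPTH][MAX_NAME];   /* names of the currently open tags */
    int depth = 0;
    const char *p = xml;

    assert(xml != NULL);
    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        int closing;
        size_t len;

        if (end == NULL)
            return 0;                   /* tag never terminated */
        p++;
        if (*p == '?' || *p == '!') {   /* XML declaration or DOCTYPE: skip */
            p = end + 1;
            continue;
        }
        closing = (*p == '/');
        if (closing)
            p++;
        len = strcspn(p, " \t\r\n/>");  /* name stops at space, '/' or '>' */
        if (len == 0 || len >= MAX_NAME)
            return 0;
        if (closing) {
            if (depth == 0 || strlen(stack[depth - 1]) != len ||
                strncmp(stack[depth - 1], p, len) != 0)
                return 0;               /* closes a tag that is not open */
            depth--;
        } else if (end[-1] != '/') {    /* <x/> opens and closes at once */
            if (depth == MAX_DEPTH)
                return 0;
            memcpy(stack[depth], p, len);
            stack[depth][len] = '\0';
            depth++;
        }
        p = end + 1;
    }
    return depth == 0;                  /* nothing may be left open */
}
```

<p>The stack of open names is the whole trick: a closing tag is only accepted
when it matches the most recently opened one.</p>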
<h2>The tree output</h2>

<p>The parser returns a tree built during the document analysis. The value
returned is an <strong>xmlDocPtr</strong> (i.e. a pointer to an
<strong>xmlDoc</strong> structure). This structure contains information like
the file name, the document type, and a <strong>root</strong> pointer which
is the root of the document (or more exactly the first child under the root
which is the document). The tree is made of <strong>xmlNode</strong>s, chained
in doubly linked lists of siblings and with children&lt;->parent relationships.
An xmlNode can also carry properties (a chain of xmlAttr structures). An
attribute may have a value which is a list of TEXT or ENTITY_REF nodes.</p>

<p>Here is an example (erroneous w.r.t. the XML spec since there should be
only one ELEMENT under the root):</p>

<p><img src="structure.gif" alt=" structure.gif "></p>

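<p>The linking scheme described above can be sketched in a few lines of C. The
<code>node</code> type and the two functions below are hypothetical stand-ins
written for this page; the real xmlNode carries more fields (type, properties,
namespace, content) and the library has its own allocation functions.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the node layout described in the text. */
typedef struct node {
    const char  *name;
    struct node *parent;  /* enclosing node, NULL at the top */
    struct node *childs;  /* first child */
    struct node *next;    /* next sibling */
    struct node *prev;    /* previous sibling */
} node;

/*
 * Allocate a node and append it as the last child of parent
 * (parent may be NULL for a top-level node).
 * Returns NULL if the allocation fails.
 */
node *node_new_child(node *parent, const char *name) {
    node *n;

    assert(name != NULL);
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->name = name;
    n->parent = parent;
    if (parent == NULL)
        return n;
    if (parent->childs == NULL) {
        parent->childs = n;          /* first child of this parent */
    } else {
        node *last = parent->childs;
        while (last->next != NULL)   /* walk to the last sibling */
            last = last->next;
        last->next = n;
        n->prev = last;
    }
    return n;
}

/* Free n and everything below it. */
void node_free_tree(node *n) {
    node *c;

    if (n == NULL)
        return;
    c = n->childs;
    while (c != NULL) {
        node *next = c->next;        /* saved before c is freed */
        node_free_tree(c);
        c = next;
    }
    free(n);
}
```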
<p>In the source package there is a small program (not installed by default)
called <strong>tester</strong> which parses XML files given as arguments and
prints them back as parsed. This is useful to detect errors both in XML code
and in the XML parser itself. It has an option <strong>--debug</strong> which
prints the actual in-memory structure of the document. Here is the result with
the <a href="#example">example</a> given before:</p>
<pre>DOCUMENT
version=1.0
standalone=true
  ELEMENT EXAMPLE
    ATTRIBUTE prop1
      TEXT
      content=gnome is great
    ATTRIBUTE prop2
      ENTITY_REF
      TEXT
      content= too
    ELEMENT head
      ELEMENT title
        TEXT
        content=Welcome to Gnome
    ELEMENT chapter
      ELEMENT title
        TEXT
        content=The Linux adventure
      ELEMENT p
        TEXT
        content=bla bla bla ...
      ELEMENT image
        ATTRIBUTE href
          TEXT
          content=linus.gif
      ELEMENT p
        TEXT
        content=...</pre>

<p>This should be useful to learn the internal representation model.</p>

<h2>The XML library interfaces</h2>

<p>This section is intended to help programmers get bootstrapped using the XML
library from the C language. It isn't intended to be extensive; I hope the
automatically generated docs will provide the completeness required, but as a
separate set of documents. The interfaces of the XML library are by principle
low level; there is nearly zero abstraction. Those interested in a higher
level API should <a href="#DOM">look at DOM</a> (unfortunately not
completed).</p>

<h3>Invoking the parser</h3>

<p>Usually, the first thing to do is to read an XML input. The parser can
parse both in-memory documents and files. The functions are defined in
"parser.h":</p>
<dl>
  <dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
    <dd><p>parse a zero-terminated string containing the document</p>
    </dd>
</dl>
<dl>
  <dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
    <dd><p>parse an XML document contained in a file (possibly compressed)</p>
    </dd>
</dl>

<p>Both return a pointer to the document structure (or NULL in case of
failure).</p>

<p>A couple of comments can be made. This means that the parser is
memory-hungry: first to load the document in memory, second to build the tree.
Reading a document without building the tree will be possible in the future by
plugging your code into the SAX interface (see SAX.c).</p>

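<p>To give an idea of what reading a document without building the tree could
look like, here is a sketch of a callback-driven scan. The
<code>handlers</code> structure, <code>scan_events</code> and
<code>count_start</code> are hypothetical names made up for this page, not the
interface found in SAX.c, and the scanner is a toy that assumes no
<code>></code> inside attribute values and no comments or CDATA sections.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback set, loosely in the spirit of a SAX interface. */
typedef struct {
    void (*start_element)(void *ctx, const char *name, size_t len);
    void (*end_element)(void *ctx, const char *name, size_t len);
} handlers;

/*
 * Scan xml and report element boundaries through h, without keeping
 * any tree in memory. name points into xml and is len bytes long (it
 * is not zero terminated). Returns 0, or -1 on an unterminated tag.
 */
int scan_events(const char *xml, const handlers *h, void *ctx) {
    const char *p = xml;

    assert(xml != NULL && h != NULL);
    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');

        if (end == NULL)
            return -1;
        p++;
        if (*p != '?' && *p != '!') {   /* skip declarations */
            int closing = (*p == '/');
            size_t len;

            if (closing)
                p++;
            len = strcspn(p, " \t\r\n/>");
            if (!closing && h->start_element != NULL)
                h->start_element(ctx, p, len);
            /* an empty element like <x/> both starts and ends here */
            if ((closing || end[-1] == '/') && h->end_element != NULL)
                h->end_element(ctx, p, len);
        }
        p = end + 1;
    }
    return 0;
}

/* Example handler: count the elements seen; ctx points to an int. */
void count_start(void *ctx, const char *name, size_t len) {
    (void) name;
    (void) len;
    ++*(int *) ctx;
}
```

<p>The memory used stays constant whatever the size of the document, since the
application only sees one event at a time.</p>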
<h3>Building a tree from scratch</h3>

<p>The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements; those are
described in "tree.h". Here is, for example, the piece of code producing the
example used before:</p>
<pre>    xmlDocPtr doc;
    xmlNodePtr tree, subtree;

    doc = xmlNewDoc("1.0");
    doc->root = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL);
    xmlSetProp(doc->root, "prop1", "gnome is great");
    xmlSetProp(doc->root, "prop2", "&amp;linux; too");
    tree = xmlNewChild(doc->root, NULL, "head", NULL);
    subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome");
    tree = xmlNewChild(doc->root, NULL, "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, "image", NULL);
    xmlSetProp(subtree, "href", "linus.gif");
    subtree = xmlNewChild(tree, NULL, "p", "...");</pre>

<p>Not really rocket science ...</p>

<h3>Traversing the tree</h3>

<p>Basically, by including "tree.h" your code has access to the internal
structure of all the elements of the tree. The names should be somewhat
simple, like <strong>parent</strong>, <strong>childs</strong>,
<strong>next</strong>, <strong>prev</strong>, <strong>properties</strong>,
etc. For example, still with the previous example:</p>
<pre><code>doc->root->childs->childs</code></pre>

<p>points to the title element,</p>
<pre>doc->root->childs->next->childs->childs</pre>

<p>points to the text node containing the chapter title "The Linux adventure",
and</p>
<pre>doc->root->properties->next->val</pre>

<p>points to the entity reference containing the value of "&amp;linux;" at the
beginning of the second attribute of the root element "EXAMPLE".</p>

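<p>Hard-coded pointer chains like the ones above break as soon as the document
layout changes, so in practice one tends to walk the sibling list and compare
names. Here is a minimal sketch; the <code>node</code> type is a hypothetical
stand-in that only keeps the link fields named in the text, and
<code>find_child</code> and <code>count_nodes</code> are not library
functions.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in keeping only the links named in the text. */
typedef struct node {
    const char  *name;
    struct node *parent, *childs, *next, *prev;
} node;

/* Return the first direct child of parent called name, or NULL. */
node *find_child(const node *parent, const char *name) {
    node *cur;

    assert(parent != NULL && name != NULL);
    for (cur = parent->childs; cur != NULL; cur = cur->next)
        if (strcmp(cur->name, name) == 0)
            return cur;
    return NULL;
}

/* Count n and every node below it, depth first. */
size_t count_nodes(const node *n) {
    const node *cur;
    size_t total = 1;                /* n itself */

    assert(n != NULL);
    for (cur = n->childs; cur != NULL; cur = cur->next)
        total += count_nodes(cur);
    return total;
}
```

<p>With such a helper, looking up the chapter element no longer depends on it
being the second child of the root.</p>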
<h3>Modifying the tree</h3>

<p>Functions are provided to read and write the document content:</p>
<dl>
  <dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const CHAR *name, const
    CHAR *value);</code></dt>
    <dd><p>This sets (or changes) an attribute carried by an ELEMENT node; the
      value can be NULL.</p>
    </dd>
</dl>
<dl>
  <dt><code>const CHAR *xmlGetProp(xmlNodePtr node, const CHAR
    *name);</code></dt>
    <dd><p>This function returns a pointer to the property content; note that
      no extra copy is made.</p>
    </dd>
</dl>

<p>Two functions must be used to read and write the text associated with
elements:</p>
<dl>
  <dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const CHAR
    *value);</code></dt>
    <dd><p>This function takes an "external" string and converts it to one
      text node or possibly to a list of entity and text nodes. All
      non-predefined entity references like &amp;Gnome; will be stored
      internally as entity nodes, hence the result of the function may not be
      a single node.</p>
    </dd>
</dl>
<dl>
  <dt><code>CHAR *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
    inLine);</code></dt>
    <dd><p>This is the dual function, which generates a new string containing
      the content of the text and entity nodes. Note the extra argument
      inLine: if set to 1, instead of returning the &amp;Gnome; XML encoding
      in the string, it will substitute it with its value, say "GNU Network
      Object Model Environment". Set it if you want to use the string for
      non-XML usage like a user interface.</p>
    </dd>
</dl>

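<p>The effect of the inLine argument can be illustrated with a small
self-contained sketch. The <code>item</code> type and
<code>list_get_string</code> below are hypothetical stand-ins, not the library
code: each entity item is assumed to already carry its replacement value,
whereas the real function looks it up through the document.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_TEXT, NODE_ENTITY_REF } node_type;

/* Hypothetical stand-in for a list of text and entity reference nodes. */
typedef struct item {
    node_type    type;
    const char  *name;     /* entity name, used for NODE_ENTITY_REF only */
    const char  *content;  /* the text, or the entity replacement value */
    struct item *next;
} item;

/*
 * Build a newly allocated string from the list. With in_line != 0 an
 * entity reference is replaced by its value, otherwise it is written
 * back as &name;. The caller frees the result. NULL if out of memory.
 */
char *list_get_string(const item *list, int in_line) {
    const item *cur;
    size_t len = 0;
    char *out, *p;

    /* First pass: compute the size of the result. */
    for (cur = list; cur != NULL; cur = cur->next) {
        if (cur->type == NODE_ENTITY_REF && !in_line) {
            assert(cur->name != NULL);
            len += strlen(cur->name) + 2;    /* '&' + name + ';' */
        } else {
            assert(cur->content != NULL);
            len += strlen(cur->content);
        }
    }
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the pieces one after the other. */
    p = out;
    for (cur = list; cur != NULL; cur = cur->next) {
        if (cur->type == NODE_ENTITY_REF && !in_line) {
            size_t n = strlen(cur->name);
            *p++ = '&';
            memcpy(p, cur->name, n);
            p += n;
            *p++ = ';';
        } else {
            size_t n = strlen(cur->content);
            memcpy(p, cur->content, n);
            p += n;
        }
    }
    *p = '\0';
    return out;
}
```

<p>Keeping the reference form when in_line is 0 is what allows the string to
be written back to an XML file without losing the entity.</p>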
<h3>Saving a tree</h3>

<p>Basically three options are possible:</p>
<dl>
  <dt><code>void xmlDocDumpMemory(xmlDocPtr cur, CHAR**mem, int
    *size);</code></dt>
    <dd><p>returns a buffer where the document has been saved</p>
    </dd>
</dl>
<dl>
  <dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
    <dd><p>dumps the document to an open file</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
    <dd><p>saves the document to a file. In that case the compression
      interface is triggered if turned on</p>
    </dd>
</dl>

<h3>Compression</h3>

<p>The library handles compression transparently when doing file-based
accesses. The level of compression on saves can be tuned either globally or
individually for one file:</p>
<dl>
  <dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
    <dd><p>Get the document compression level (0-9)</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
    <dd><p>Set the document compression level</p>
    </dd>
</dl>
<dl>
  <dt><code>int xmlGetCompressMode(void);</code></dt>
    <dd><p>Get the default compression level</p>
    </dd>
</dl>
<dl>
  <dt><code>void xmlSetCompressMode(int mode);</code></dt>
    <dd><p>Set the default compression level</p>
    </dd>
</dl>

<h2><a name="DOM">DOM Principles</a></h2>

<p><a href="http://www.w3.org/DOM/">DOM</a> stands for the <em>Document Object
Model</em>; this is an API for accessing XML or HTML structured documents.
Native support for DOM in Gnome is on the way (module gnome-dom), and it will
be based on gnome-xml. This will be a far cleaner interface to manipulate XML
files within Gnome since it won't expose the internal structure. DOM defines a
set of IDL (or Java) interfaces allowing one to traverse and manipulate a
document. The DOM library will allow accessing and modifying "live" documents
present in other programs like this:</p>

<p><img src="DOM.gif" alt=" DOM.gif "></p>

<p>This should help greatly in doing things like modifying a gnumeric
spreadsheet embedded in a GWP document, for example.</p>

<h3><a name="Example">A real example</a></h3>

<p>Here is a real-size example, where the actual content of the application
data is not kept in the DOM tree but uses internal structures. It is based on
a proposal to keep a database of jobs related to Gnome, with an XML-based
storage structure. Here is an <a href="gjobs.xml">XML-encoded jobs
base</a>:</p>
<pre>&lt;?xml version="1.0"?>
&lt;gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
  &lt;gjob:Jobs>

    &lt;gjob:Job>
      &lt;gjob:Project ID="3"/>
      &lt;gjob:Application>GBackup&lt;/gjob:Application>
      &lt;gjob:Category>Development&lt;/gjob:Category>

      &lt;gjob:Update>
        &lt;gjob:Status>Open&lt;/gjob:Status>
        &lt;gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST&lt;/gjob:Modified>
        &lt;gjob:Salary>USD 0.00&lt;/gjob:Salary>
      &lt;/gjob:Update>

      &lt;gjob:Developers>
        &lt;gjob:Developer>
        &lt;/gjob:Developer>
      &lt;/gjob:Developers>

      &lt;gjob:Contact>
        &lt;gjob:Person>Nathan Clemons&lt;/gjob:Person>
        &lt;gjob:Email>nathan@windsofstorm.net&lt;/gjob:Email>
        &lt;gjob:Company>
        &lt;/gjob:Company>
        &lt;gjob:Organisation>
        &lt;/gjob:Organisation>
        &lt;gjob:Webpage>
        &lt;/gjob:Webpage>
        &lt;gjob:Snailmail>
        &lt;/gjob:Snailmail>
        &lt;gjob:Phone>
        &lt;/gjob:Phone>
      &lt;/gjob:Contact>

      &lt;gjob:Requirements>
      The program should be released as free software, under the GPL.
      &lt;/gjob:Requirements>

      &lt;gjob:Skills>
      &lt;/gjob:Skills>

      &lt;gjob:Details>
      A GNOME based system that will allow a superuser to configure
      compressed and uncompressed files and/or file systems to be backed
      up with a supported media in the system. This should be able to
      perform via find commands generating a list of files that are passed
      to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
      or via operations performed on the filesystem itself. Email
      notification and GUI status display very important.
      &lt;/gjob:Details>

    &lt;/gjob:Job>

  &lt;/gjob:Jobs>
&lt;/gjob:Helping>
</pre>

<p>While loading the XML file into an internal DOM tree is a matter of calling
only a couple of functions, browsing the tree to gather the information and
generate the internal structures is harder and more error-prone.</p>

<p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant; the
XML specification is clear about it. It's also usually a good idea not to
depend on the order of the children of a given node, unless it really makes
things harder. Here is some code to parse the information for a person:</p>
<pre>/*
 * A person record
 */
typedef struct person {
    char *name;
    char *email;
    char *company;
    char *organisation;
    char *smail;
    char *webPage;
    char *phone;
} person, *personPtr;

/*
 * And the code needed to parse it
 */
personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    personPtr ret = NULL;

DEBUG("parsePerson\n");
    /*
     * allocate the struct
     */
    ret = (personPtr) malloc(sizeof(person));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(person));

    /* We don't care what the top level element name is */
    cur = cur->childs;
    while (cur != NULL) {
        if ((!strcmp(cur->name, "Person")) &amp;&amp; (cur->ns == ns))
            ret->name = xmlNodeListGetString(doc, cur->childs, 1);
        if ((!strcmp(cur->name, "Email")) &amp;&amp; (cur->ns == ns))
            ret->email = xmlNodeListGetString(doc, cur->childs, 1);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>Here are a couple of things to notice:</p>
<ul>
  <li>Usually a recursive parsing style is the most convenient one, XML data
    being by nature subject to repetitive constructs and usually exhibiting
    highly structured patterns.</li>
  <li>Note the two arguments of type <em>xmlDocPtr</em> and
    <em>xmlNsPtr</em>, i.e. the pointer to the global XML document and the
    namespace reserved for the application. Document-wide information is
    needed, for example, to decode entities, and it's good coding practice to
    define a namespace for your application's set of data and test that the
    elements and attributes you're analyzing actually pertain to your
    application space. This is done by a simple equality test (cur->ns ==
    ns).</li>
  <li>To retrieve text and attribute values, it is suggested to use the
    function <em>xmlNodeListGetString</em> to gather all the text and entity
    reference nodes generated by the DOM output and produce a single text
    string.</li>
</ul>

<p>Here is another piece of code used to parse another level of the
structure:</p>
<pre>/*
 * a Description for a Job
 */
typedef struct job {
    char *projectID;
    char *application;
    char *category;
    personPtr contact;
    int nbDevelopers;
    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
} job, *jobPtr;

/*
 * And the code needed to parse it
 */
jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
    jobPtr ret = NULL;

DEBUG("parseJob\n");
    /*
     * allocate the struct
     */
    ret = (jobPtr) malloc(sizeof(job));
    if (ret == NULL) {
        fprintf(stderr,"out of memory\n");
        return(NULL);
    }
    memset(ret, 0, sizeof(job));

    /* We don't care what the top level element name is */
    cur = cur->childs;
    while (cur != NULL) {

        if ((!strcmp(cur->name, "Project")) &amp;&amp; (cur->ns == ns)) {
            ret->projectID = xmlGetProp(cur, "ID");
            if (ret->projectID == NULL) {
                fprintf(stderr, "Project has no ID\n");
            }
        }
        if ((!strcmp(cur->name, "Application")) &amp;&amp; (cur->ns == ns))
            ret->application = xmlNodeListGetString(doc, cur->childs, 1);
        if ((!strcmp(cur->name, "Category")) &amp;&amp; (cur->ns == ns))
            ret->category = xmlNodeListGetString(doc, cur->childs, 1);
        if ((!strcmp(cur->name, "Contact")) &amp;&amp; (cur->ns == ns))
            ret->contact = parsePerson(doc, ns, cur);
        cur = cur->next;
    }

    return(ret);
}</pre>

<p>One can notice that, once used to it, writing this kind of code is quite
simple, but boring. Ultimately, it could be possible to write stubbers taking
either C data structure definitions, a set of XML examples or an XML DTD and
producing the code needed to import and export the content between C data and
XML storage. This is left as an exercise to the reader :-)</p>

<p>Feel free to use <a href="gjobread.c">the code for the full C parsing
example</a> as a template.</p>

<p> <a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>
</body>
</html>