Big changes, it seems that 1.2.0 wasn't committed, here is 1.3.0, Daniel
diff --git a/doc/xml.html b/doc/xml.html
index 5500349..03abc15 100644
--- a/doc/xml.html
+++ b/doc/xml.html
@@ -20,7 +20,7 @@
XML is a standard for markup based structured documents, here is <a
name="example">an example</a>:</p>
<pre><?xml version="1.0"?>
-<EXAMPLE prop1="gnome is great" prop2="&linux; too">
+<EXAMPLE prop1="gnome is great" prop2="&amp; linux too">
<head>
<title>Welcome to Gnome</title>
</head>
@@ -36,7 +36,7 @@
about it's encoding. Then the document is a text format whose structure is
specified by tags between brackets. <strong>Each tag opened have to be
closed</strong> XML is pedantic about this, not that for example the image
-tage has no content (just an attribute) and is closed by ending up the tag
+tag has no content (just an attribute) and is closed by ending the tag
with <code>/></code>.</p>
<h2>The tree output</h2>
@@ -285,7 +285,213 @@
<p>
This should help greatly doing things like modifying a gnumeric spreadsheet
embedded in a GWP document for example.</p>
+
+<h3><a name="Example">A real example</a></h3>
<p>
+Here is a realistic, full-size example, where the application data is not
+kept in the DOM tree but in internal structures. It is based on
+a proposal to keep a database of jobs related to Gnome, with an XML-based
+storage structure. Here is an <a href="gjobs.xml">XML-encoded jobs database</a>:</p>
+<pre>
+<?xml version="1.0"?>
+<gjob:Helping xmlns:gjob="http://www.gnome.org/some-location">
+ <gjob:Jobs>
+
+ <gjob:Job>
+ <gjob:Project ID="3"/>
+ <gjob:Application>GBackup</gjob:Application>
+ <gjob:Category>Development</gjob:Category>
+
+ <gjob:Update>
+ <gjob:Status>Open</gjob:Status>
+ <gjob:Modified>Mon, 07 Jun 1999 20:27:45 -0400 MET DST</gjob:Modified>
+ <gjob:Salary>USD 0.00</gjob:Salary>
+ </gjob:Update>
+
+ <gjob:Developers>
+ <gjob:Developer>
+ </gjob:Developer>
+ </gjob:Developers>
+
+ <gjob:Contact>
+ <gjob:Person>Nathan Clemons</gjob:Person>
+ <gjob:Email>nathan@windsofstorm.net</gjob:Email>
+ <gjob:Company>
+ </gjob:Company>
+ <gjob:Organisation>
+ </gjob:Organisation>
+ <gjob:Webpage>
+ </gjob:Webpage>
+ <gjob:Snailmail>
+ </gjob:Snailmail>
+ <gjob:Phone>
+ </gjob:Phone>
+ </gjob:Contact>
+
+ <gjob:Requirements>
+ The program should be released as free software, under the GPL.
+ </gjob:Requirements>
+
+ <gjob:Skills>
+ </gjob:Skills>
+
+ <gjob:Details>
+ A GNOME based system that will allow a superuser to configure
+ compressed and uncompressed files and/or file systems to be backed
+ up with a supported media in the system. This should be able to
+ perform via find commands generating a list of files that are passed
+ to tar, dd, cpio, cp, gzip, etc., to be directed to the tape machine
+ or via operations performed on the filesystem itself. Email
+ notification and GUI status display very important.
+ </gjob:Details>
+
+ </gjob:Job>
+
+ </gjob:Jobs>
+</gjob:Helping>
+
+</pre>
+<p>
+While loading the XML file into an internal DOM tree is a matter of calling
+only a couple of functions, browsing the tree to gather the information
+and generate the internal structures is harder and more error-prone.
</p>
+<p>
+The suggested principle is to be tolerant with respect to the input
+structure. For example, the ordering of the attributes is not significant;
+the XML specification is clear about it. It's also usually a good idea
+not to depend on the order of the children of a given node, unless that
+really makes things harder. Here is some code to parse the information
+for a person:
+</p>
+<pre>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+/* ... plus the libxml headers declaring xmlDocPtr, xmlNodeListGetString(), etc. */
+
+/* simple trace macro, assumed here; the full example may define it differently */
+#define DEBUG(x) printf(x)
+
+/*
+ * A person record
+ */
+typedef struct person {
+    char *name;
+    char *email;
+    char *company;
+    char *organisation;
+    char *smail;
+    char *webPage;
+    char *phone;
+} person, *personPtr;
+
+/*
+ * And the code needed to parse it
+ */
+personPtr parsePerson(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
+    personPtr ret = NULL;
+
+    DEBUG("parsePerson\n");
+    /*
+     * allocate the struct
+     */
+    ret = (personPtr) malloc(sizeof(person));
+    if (ret == NULL) {
+        fprintf(stderr,"out of memory\n");
+        return(NULL);
+    }
+    memset(ret, 0, sizeof(person));
+
+    /* We don't care what the top level element name is */
+    cur = cur->childs;
+    while (cur != NULL) {
+        /* each child is tested against each known name: order doesn't matter */
+        if ((!strcmp(cur->name, "Person")) && (cur->ns == ns))
+            ret->name = xmlNodeListGetString(doc, cur->childs, 1);
+        if ((!strcmp(cur->name, "Email")) && (cur->ns == ns))
+            ret->email = xmlNodeListGetString(doc, cur->childs, 1);
+        cur = cur->next;
+    }
+
+    return(ret);
+}
+</pre>
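<p>
The matching loop above does not really depend on libxml, so its behaviour can be tried in isolation. The following is a minimal, self-contained sketch built on hypothetical stand-in types (<em>mockNs</em>, <em>mockNode</em>, <em>mockPerson</em>; these are not libxml types). It shows the two properties the loop relies on: the children may come in any order, and an element with the right name but a different namespace pointer is ignored.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the libxml types, reduced to what the loop uses */
typedef struct mockNs {
    const char *href;            /* namespace URI */
} mockNs;

typedef struct mockNode {
    const char *name;            /* element name, without the prefix */
    const mockNs *ns;            /* namespace of the element */
    const char *text;            /* stands in for the child text nodes */
    struct mockNode *childs;     /* first child, named as in the code above */
    struct mockNode *next;       /* next sibling */
} mockNode;

typedef struct mockPerson {
    const char *name;
    const char *email;
} mockPerson;

/*
 * Same loop shape as parsePerson(): each child is compared against every
 * known name, so their order is irrelevant, and only children whose
 * namespace pointer equals the application's one are considered.
 */
void mockParsePerson(const mockNs *ns, const mockNode *cur, mockPerson *ret) {
    assert(cur != NULL && ret != NULL);
    memset(ret, 0, sizeof(*ret));
    for (cur = cur->childs; cur != NULL; cur = cur->next) {
        if (cur->ns != ns)
            continue;            /* element from another namespace: skip */
        if (!strcmp(cur->name, "Person"))
            ret->name = cur->text;
        else if (!strcmp(cur->name, "Email"))
            ret->email = cur->text;
    }
}
```

<p>
For instance, a contact whose Email child comes before its Person child still gets both fields filled, while a Person element belonging to another namespace is skipped instead of overwriting the name.</p>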
+<p>
+Here are a couple of things to notice:</p>
+<ul>
+<li> A recursive parsing style is usually the most convenient one,
+since XML data is by nature subject to repetitive constructs and usually
+exhibits highly structured patterns.
+<li> Note the two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>, i.e.
+the pointer to the global XML document and the namespace reserved to the
+application. Document-wide information is needed, for example, to decode
+entities, and it's good coding practice to define a namespace for your
+application's set of data and to test that the elements and attributes you're
+analyzing actually pertain to your application's space. This is done by a simple
+equality test (cur->ns == ns).
+<li> To retrieve text and attribute values, it is suggested to use
+the function <em>xmlNodeListGetString</em> to gather all the text and
+entity reference nodes generated by the DOM output and produce a
+single text string. Its last argument (1 in the code above) asks for
+entity references to be replaced by their content.
+</ul>
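<p>
The last point can also be illustrated without libxml. In the sketch below, the hypothetical <em>mockText</em> list stands for the sibling nodes the DOM output may produce for one piece of text: text nodes interleaved with entity references. <em>mockListGetString</em> is not the libxml function, only an illustration of the same idea under the assumption that entity values are already known. It concatenates the pieces into one freshly allocated string, replacing each entity reference by its value when <em>inLine</em> is non-zero, or keeping the reference form otherwise.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds for one piece of text split by the DOM output */
typedef enum { MOCK_TEXT, MOCK_ENTITY_REF } mockKind;

typedef struct mockText {
    mockKind kind;
    const char *content;         /* the text, or the entity name */
    const char *value;           /* entity replacement text (assumed known) */
    struct mockText *next;       /* next sibling */
} mockText;

/* Length of one piece once rendered, with or without entity replacement */
static size_t pieceLen(const mockText *cur, int inLine) {
    if (cur->kind == MOCK_TEXT)
        return strlen(cur->content);
    if (inLine)
        return strlen(cur->value);
    return strlen(cur->content) + 2;         /* '&' + name + ';' */
}

/*
 * Concatenate a sibling list into one malloc()ed string: text is copied,
 * entity references are replaced by their value when inLine is non-zero,
 * or kept as "&name;" otherwise. Returns NULL if allocation fails.
 */
char *mockListGetString(const mockText *list, int inLine) {
    const mockText *cur;
    size_t len = 0;
    char *ret;

    /* first pass: size of the result */
    for (cur = list; cur != NULL; cur = cur->next) {
        /* inlining an entity requires its value */
        assert(cur->kind == MOCK_TEXT || !inLine || cur->value != NULL);
        len += pieceLen(cur, inLine);
    }
    ret = malloc(len + 1);
    if (ret == NULL)
        return NULL;
    ret[0] = '\0';

    /* second pass: append each piece */
    for (cur = list; cur != NULL; cur = cur->next) {
        if (cur->kind == MOCK_TEXT) {
            strcat(ret, cur->content);
        } else if (inLine) {
            strcat(ret, cur->value);
        } else {
            strcat(ret, "&");
            strcat(ret, cur->content);
            strcat(ret, ";");
        }
    }
    return ret;
}
```

<p>
Sizing the buffer in a first pass keeps the sketch to a single allocation; the caller owns the returned string and must free it.</p>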
+<p>
+Here is another piece of code, used to parse the next level up in the structure, a job:
+</p>
+<pre>
+/*
+ * a Description for a Job
+ */
+typedef struct job {
+    char *projectID;
+    char *application;
+    char *category;
+    personPtr contact;
+    int nbDevelopers;
+    personPtr developers[100]; /* using dynamic alloc is left as an exercise */
+} job, *jobPtr;
+
+/*
+ * And the code needed to parse it
+ */
+jobPtr parseJob(xmlDocPtr doc, xmlNsPtr ns, xmlNodePtr cur) {
+    jobPtr ret = NULL;
+
+    DEBUG("parseJob\n");
+    /*
+     * allocate the struct
+     */
+    ret = (jobPtr) malloc(sizeof(job));
+    if (ret == NULL) {
+        fprintf(stderr,"out of memory\n");
+        return(NULL);
+    }
+    memset(ret, 0, sizeof(job));
+
+    /* We don't care what the top level element name is */
+    cur = cur->childs;
+    while (cur != NULL) {
+
+        if ((!strcmp(cur->name, "Project")) && (cur->ns == ns)) {
+            ret->projectID = xmlGetProp(cur, "ID");
+            if (ret->projectID == NULL) {
+                fprintf(stderr, "Project has no ID\n");
+            }
+        }
+        if ((!strcmp(cur->name, "Application")) && (cur->ns == ns))
+            ret->application = xmlNodeListGetString(doc, cur->childs, 1);
+        if ((!strcmp(cur->name, "Category")) && (cur->ns == ns))
+            ret->category = xmlNodeListGetString(doc, cur->childs, 1);
+        if ((!strcmp(cur->name, "Contact")) && (cur->ns == ns))
+            ret->contact = parsePerson(doc, ns, cur); /* one level down */
+        cur = cur->next;
+    }
+
+    return(ret);
+}
+</pre>
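<p>
The fixed <em>developers[100]</em> array above is left as an exercise. One possible answer is sketched here, assuming the array is replaced by a separately allocated list whose capacity doubles when full. <em>devList</em>, <em>devListAdd</em> and <em>devListFree</em> are hypothetical names, and <em>personPtr</em> is redeclared as an opaque pointer so that the block stands alone.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Opaque here; stands for the personPtr of the code above */
typedef struct person *personPtr;

/* Hypothetical growable replacement for "personPtr developers[100]" */
typedef struct devList {
    int nbDevelopers;            /* slots in use */
    int maxDevelopers;           /* slots allocated */
    personPtr *developers;       /* NULL until the first append */
} devList;

/*
 * Append one developer, doubling the capacity when the array is full.
 * Returns 0 on success, -1 if the allocation fails (list left unchanged).
 */
int devListAdd(devList *list, personPtr dev) {
    assert(list != NULL);
    if (list->nbDevelopers >= list->maxDevelopers) {
        int newMax = (list->maxDevelopers == 0) ? 10 : 2 * list->maxDevelopers;
        personPtr *tmp = realloc(list->developers, newMax * sizeof(*tmp));
        if (tmp == NULL)
            return -1;
        list->developers = tmp;
        list->maxDevelopers = newMax;
    }
    list->developers[list->nbDevelopers++] = dev;
    return 0;
}

/* Release the array itself; the person records are owned by the caller */
void devListFree(devList *list) {
    free(list->developers);
    list->developers = NULL;
    list->nbDevelopers = list->maxDevelopers = 0;
}
```

<p>
Since <em>realloc</em> on a NULL pointer behaves like <em>malloc</em>, an all-zero <em>devList</em>, such as the one left by the <em>memset</em> in <em>parseJob</em>, is already a valid empty list. Doubling keeps the average cost of an append constant. <em>parseJob</em> above does not read the <em>gjob:Developers</em> element yet. A branch for it would call <em>devListAdd</em> once per <em>gjob:Developer</em> child.</p>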
+<p>
+One can notice that, once you are used to it, writing this kind of code
+is quite simple, but boring. Ultimately, it could be possible to write
+stub generators taking either C data structure definitions, a set of XML examples
+or an XML DTD, and producing the code needed to import and export the
+content between C data and XML storage. This is left as an exercise to
+the reader :-)</p>
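<p>
As a small step in that direction, here is a hand-written sketch of the export side for a subset of the person record. Each non-NULL field becomes one child element, and the characters that are special in XML content are escaped. The <em>personOut</em> type and the <em>exportPerson</em> function are hypothetical. The output is a fragment meant to be written inside an element that already declares the <em>gjob</em> namespace, as <em>gjob:Helping</em> does in the jobs database above.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of the person record; NULL means "not set" */
typedef struct personOut {
    const char *name;
    const char *email;
} personOut;

/* Write s, escaping the characters that are special in XML content */
static void writeEscaped(FILE *out, const char *s) {
    while (*s != '\0') {
        size_t run = strcspn(s, "&<>");      /* characters needing no escape */
        fwrite(s, 1, run, out);
        s += run;
        if (*s == '&')
            fputs("&amp;", out);
        else if (*s == '<')
            fputs("&lt;", out);
        else if (*s == '>')
            fputs("&gt;", out);
        else
            break;                           /* end of string reached */
        s++;
    }
}

/* One child element per non-NULL field; absent fields produce nothing */
static void writeField(FILE *out, const char *tag, const char *value) {
    if (value == NULL)
        return;
    fprintf(out, "  <gjob:%s>", tag);
    writeEscaped(out, value);
    fprintf(out, "</gjob:%s>\n", tag);
}

/* Export a person as a gjob:Contact fragment (namespace declared elsewhere) */
void exportPerson(FILE *out, const personOut *p) {
    assert(out != NULL && p != NULL);
    fputs("<gjob:Contact>\n", out);
    writeField(out, "Person", p->name);
    writeField(out, "Email", p->email);
    fputs("</gjob:Contact>\n", out);
}
```

<p>
Skipping NULL fields mirrors the import code, which leaves a field NULL when the corresponding element is absent. Data exported this way therefore reads back into the same record.</p>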
+<p>
+Feel free to use <a href="gjobread.c">the code for the full C parsing
+example</a> as a template.</p>
+<p>
+<a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>
</body>
</html>