Finished reintegrating the docs and unifying the look; it may still
need a couple of pointers but looks fine now. valid.html is now
merged into xmldtd.html. Overall cleanup, Daniel
diff --git a/doc/xml.html b/doc/xml.html
index 19f3de8..de3f404 100644
--- a/doc/xml.html
+++ b/doc/xml.html
@@ -35,14 +35,6 @@
<p>Separate documents:</p>
<ul>
- <li><a href="upgrade.html">upgrade instructions for migrating to
- libxml2</a></li>
- <li><a href="encoding.html">libxml Internationalization support</a></li>
- <li><a href="xmlio.html">libxml Input/Output interfaces</a></li>
- <li><a href="xmlmem.html">libxml Memory interfaces</a></li>
- <li><a href="catalog.html">libxml Catalog support</a></li>
- <li><a href="xmldtd.html">a short introduction about DTDs and
- libxml</a></li>
<li><a href="http://xmlsoft.org/XSLT/">the libxslt page</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">the gdome2 page: a
standard DOM interface for libxml2</a></li>
@@ -89,6 +81,277 @@
style="background-color: #FF0000">Do Not Use libxml1</span></strong>, use
libxml2</p>
+<h2><a name="FAQ">FAQ</a></h2>
+
+<p>Table of Contents:</p>
+<ul>
+ <li><a href="FAQ.html#Licence">Licence(s)</a></li>
+ <li><a href="FAQ.html#Installati">Installation</a></li>
+ <li><a href="FAQ.html#Compilatio">Compilation</a></li>
+ <li><a href="FAQ.html#Developer">Developer corner</a></li>
+</ul>
+
+<h3><a name="Licence">Licence</a>(s)</h3>
+<ol>
+ <li><em>Licensing Terms for libxml</em>
+ <p>libxml is released under 2 (compatible) licences:</p>
+ <ul>
+ <li>the <a href="http://www.gnu.org/copyleft/lgpl.html">LGPL</a>: GNU
+ Library General Public License</li>
+ <li>the <a
+ href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C
+ IPR</a>: very similar to the XWindow licence</li>
+ </ul>
+ </li>
+ <li><em>Can I embed libxml in a proprietary application ?</em>
+ <p>Yes. The W3C IPR allows you to also keep proprietary the changes you
+ made to libxml, but it would be gracious to contribute bug fixes and
+ improvements back as patches for possible incorporation in the main
+ development tree.</p>
+ </li>
+</ol>
+
+<h3><a name="Installati">Installation</a></h3>
+<ol>
+ <li>Unless you are forced to because your application links with a Gnome
+ library requiring it, <strong><span style="background-color: #FF0000">Do
+ Not Use libxml1</span></strong>, use libxml2</li>
+ <li><em>Where can I get libxml ?</em>
+ <p>The original distribution comes from <a
+ href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a
+ href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p>
+ <p>Most Linux and BSD distributions include libxml; this is probably the
+ safest way for end users.</p>
+ <p>David Doolin provides precompiled Windows versions at <a
+ href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
+ </li>
+ <li><em>I see libxml and libxml2 releases, which one should I install ?</em>
+ <ul>
+ <li>If you are not constrained by backward compatibility with existing
+ applications, install libxml2 only</li>
+ <li>If you are not doing development, you can safely install both.
+ Usually the packages <a
+ href="http://rpmfind.net/linux/RPM/libxml.html">libxml</a> and <a
+ href="http://rpmfind.net/linux/RPM/libxml2.html">libxml2</a> are
+ compatible (this is not the case for development packages)</li>
+ <li>If you are a developer and your system provides separate packaging
+ for shared libraries and the development components, it is possible
+ to install libxml and libxml2, and also <a
+ href="http://rpmfind.net/linux/RPM/libxml-devel.html">libxml-devel</a>
+ and <a
+ href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml2-devel</a>
+ too for libxml2 >= 2.3.0</li>
+ <li>If you are developing a new application, please develop against
+ libxml2(-devel)</li>
+ </ul>
+ </li>
+ <li><em>I can't install the libxml package, it conflicts with libxml0</em>
+ <p>You probably have an old libxml0 package used to provide the shared
+ library for libxml.so.0; you can probably safely remove it. In any case,
+ the libxml packages provided on <a
+ href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> provide
+ libxml.so.0</p>
+ </li>
+ <li><em>I can't install the libxml(2) RPM package due to failed
+ dependencies</em>
+ <p>The most generic solution is to fetch the latest src.rpm again and
+ rebuild it locally with</p>
+ <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
+ <p>If everything goes well, it will generate two binary RPMs (one
+ providing the shared libs and xmllint, and the other one, the -devel
+ package, providing includes, static libraries and scripts needed to build
+ applications with libxml(2)) that you can install locally.</p>
+ </li>
+</ol>
+
+<h3><a name="Compilatio">Compilation</a></h3>
+<ol>
+ <li><em>What is the process to compile libxml ?</em>
+ <p>Like most UNIX libraries, libxml follows the "standard" procedure:</p>
+ <p><code>gunzip -c xxx.tar.gz | tar xvf -</code></p>
+ <p><code>cd libxml-xxxx</code></p>
+ <p><code>./configure --help</code></p>
+ <p>to see the options, then the compilation/installation proper</p>
+ <p><code>./configure [possible options]</code></p>
+ <p><code>make</code></p>
+ <p><code>make install</code></p>
+ <p>At that point you may have to rerun ldconfig or similar utility to
+ update your list of installed shared libs.</p>
+ </li>
+ <li><em>What other libraries are needed to compile/install libxml ?</em>
+ <p>Libxml does not require any other library; the standard ANSI C API
+ should be sufficient (please report any violation of this rule you may
+ find).</p>
+ <p>However, if they are found at configuration time, libxml will detect
+ and use the following libs:</p>
+ <ul>
+ <li><a href="http://www.info-zip.org/pub/infozip/zlib/">libz</a>
+ : a highly portable and widely available compression library</li>
+ <li>iconv: a powerful character encoding conversion library. It's
+ included by default on recent glibc libraries, so it doesn't need to
+ be installed specifically on linux. It seems it's now <a
+ href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
+ of the official UNIX</a> specification. Here is one <a
+ href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
+ of the library</a> whose source can be found <a
+ href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li>
+ </ul>
+ </li>
+ <li><em>libxml does not compile with HP-UX's optional ANSI-C compiler</em>
+ <p>This is due to macro limitations. Try adding " -Wp,-H16800 -Ae" to the
+ CFLAGS.</p>
+ <p>You can also install and use gcc instead or use a precompiled version
+ of libxml, both available from the <a
+ href="http://hpux.cae.wisc.edu/hppd/auto/summary_all.html">HP-UX Porting
+ and Archive Centre</a></p>
+ </li>
+ <li><em>make check fails on some platforms</em>
+ <p>Sometimes the regression test results don't completely match the
+ values produced by the parser, and the makefile uses diff to print the
+ delta. On some platforms the diff return value breaks the compilation
+ process; if the diff is small this is probably not a serious problem.</p>
+ </li>
+ <li><em>I use the CVS version and there is no configure script</em>
+ <p>The configure script (and the Makefiles) are generated. Use the
+ autogen.sh script to regenerate the configure script and Makefiles, like:</p>
+ <p><code>./autogen.sh --prefix=/usr --disable-shared</code></p>
+ </li>
+ <li><em>I have trouble when running make tests with gcc-3.0</em>
+ <p>It seems the initial release of gcc-3.0 has a problem with the
+ optimizer which miscompiles the URI module. Please use another
+ compiler.</p>
+ </li>
+</ol>
+
+<h3><a name="Developer">Developer</a> corner</h3>
+<ol>
+ <li><em>xmlDocDump() generates output on one line</em>
+ <p>libxml will not <strong>invent</strong> spaces in the content of a
+ document since <strong>all spaces in the content of a document are
+ significant</strong>. If you build a tree from the API and want
+ indentation:</p>
+ <ol>
+ <li>the correct way is to generate those yourself too</li>
+ <li>the dangerous way is to ask libxml to add those blanks to your
+ content <strong>modifying the content of your document in the
+ process</strong>. The result may not be what you expect. There is
+ <strong>NO</strong> way to guarantee that such a modification won't
+ impact other parts of the content of your document. See <a
+ href="http://xmlsoft.org/html/libxml-parser.html#XMLKEEPBLANKSDEFAULT">xmlKeepBlanksDefault
+ ()</a> and <a
+ href="http://xmlsoft.org/html/libxml-tree.html#XMLSAVEFORMATFILE">xmlSaveFormatFile
+ ()</a></li>
+ </ol>
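+ <p>As a minimal sketch of that second option (the function name
+ <code>save_indented</code> and the output file name <code>out.xml</code>
+ are only illustrative assumptions, and the caveats above still apply):</p>
+ <pre>#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/tree.h&gt;
+
+/* Serialize doc with indentation generated by libxml. */
+int save_indented(xmlDocPtr doc) {
+    /* treat blank text nodes as ignorable so that formatting can apply */
+    xmlKeepBlanksDefault(0);
+    /* the last argument (1) asks for formatting; returns -1 on failure */
+    return xmlSaveFormatFile("out.xml", doc, 1);
+}</pre>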
+ </li>
+ <li>Extra nodes in the document:
+ <p><em>For a XML file as below:</em></p>
+ <pre><?xml version="1.0"?>
+<PLAN xmlns="http://www.argus.ca/autotest/1.0/">
+<NODE CommFlag="0"/>
+<NODE CommFlag="1"/>
+</PLAN></pre>
+ <p><em>after parsing it with the function
+ pxmlDoc=xmlParseFile(...);</em></p>
+ <p><em>I want to get the content of the first node (node with the
+ CommFlag="0")</em></p>
+ <p><em>so I did the following:</em></p>
+ <pre>xmlNodePtr pnode;
+pnode=pxmlDoc->children->children;</pre>
+ <p><em>but it does not work. If I change it to</em></p>
+ <pre>pnode=pxmlDoc->children->children->next;</pre>
+ <p><em>then it works. Can someone explain it to me.</em></p>
+ <p></p>
+ <p>In XML all characters in the content of the document are significant
+ <strong>including blanks and formatting line breaks</strong>.</p>
+ <p>The extra nodes you are wondering about are just that: text nodes with
+ the formatting spaces, which are part of the document but which people
+ tend to forget. There is a function <a
+ href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
+ ()</a> to remove those at parse time, but that's a heuristic, and its
+ use should be limited to cases where you are sure there is no
+ mixed-content in the document.</p>
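+ <p>An approach which does not depend on the formatting, sketched here
+ under the assumption that <code>pxmlDoc</code> is the parsed document from
+ the question, is to skip every node that is not an element (the helper
+ name <code>first_element_child</code> is hypothetical):</p>
+ <pre>#include &lt;libxml/tree.h&gt;
+
+/* Return the first child of parent that is an element, or NULL. */
+static xmlNodePtr first_element_child(xmlNodePtr parent) {
+    xmlNodePtr cur;
+
+    if (parent == NULL) return NULL;
+    for (cur = parent-&gt;children; cur != NULL; cur = cur-&gt;next)
+        if (cur-&gt;type == XML_ELEMENT_NODE)  /* skips blank text nodes */
+            return cur;
+    return NULL;
+}
+
+/* usage: pnode = first_element_child(xmlDocGetRootElement(pxmlDoc)); */</pre>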
+ </li>
+ <li><em>I get compilation errors in existing code, for example when
+ accessing the <strong>root</strong> or <strong>childs</strong> fields of nodes</em>
+ <p>You are compiling code developed for libxml version 1 and using a
+ libxml2 development environment. Either switch back to libxml v1 devel or
+ even better fix the code to compile with libxml2 (or both) by <a
+ href="upgrade.html">following the instructions</a>.</p>
+ </li>
+ <li><em>I get compilation errors about non existing
+ <strong>xmlRootNode</strong> or <strong>xmlChildrenNode</strong>
+ fields</em>
+ <p>The source code you are using has been <a
+ href="upgrade.html">upgraded</a> to be able to compile with both libxml
+ and libxml2, but you need to install a more recent version:
+ libxml(-devel) >= 1.8.8 or libxml2(-devel) >= 2.1.0</p>
+ </li>
+ <li><em>XPath implementation looks seriously broken</em>
+ <p>The XPath implementation prior to 2.3.0 was really incomplete. Upgrade
+ to a recent version: the implementation and debugging of libxslt
+ generated fixes for the most obvious problems.</p>
+ </li>
+ <li><em>The example provided in the web page does not compile</em>
+ <p>It's hard to maintain the documentation in sync with the code
+ <grin/> ...</p>
+ <p>Check the points raised above about compilation errors, and send
+ patches.</p>
+ </li>
+ <li><em>Where can I get more examples and information than in the web
+ page</em>
+ <p>Ideally a libxml book would be nice. I have no such plan ... But you
+ can:</p>
+ <ul>
+ <li>check more deeply the <a href="html/libxml-lib.html">existing
+ generated doc</a></li>
+ <li>look for examples of use of libxml functions in the Gnome code,
+ for example the following will query the full Gnome CVS base for the
+ use of the <strong>xmlAddChild()</strong> function:
+ <p><a
+ href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
+ <p>This may be slow, a large hardware donation to the gnome project
+ could cure this :-)</p>
+ </li>
+ <li><a
+ href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&dir=gnome-xml">Browse
+ the libxml source</a>
+ , I try to write code that is as clean and documented as possible, so
+ looking at it may be helpful</li>
+ </ul>
+ </li>
+ <li>What about C++ ?
+ <p>libxml is written in pure C in order to allow easy reuse on a number
+ of platforms, including embedded systems. I don't intend to convert to
+ C++.</p>
+ <p>There is however a C++ wrapper provided by Ari Johnson
+ <ari@btigate.com> which may fulfill your needs:</p>
+ <p>Website: <a
+ href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
+ <p>Download: <a
+ href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
+ </li>
+ <li>How to validate a document a posteriori ?
+ <p>It is possible to validate documents which had not been validated at
+ initial parsing time or documents which have been built from scratch using
+ the API. Use the <a
+ href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
+ function. It is also possible to simply add a DTD to an existing
+ document:</p>
+ <pre>xmlDocPtr doc; /* your existing document */
+ xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */
+ dtd->name = xmlStrdup((xmlChar*)"root_name"); /* use the given root */
+
+ doc->intSubset = dtd;
+ if (doc->children == NULL) xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
+ else xmlAddPrevSibling(doc->children, (xmlNodePtr)dtd);
+ </pre>
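+ <p>For the <code>xmlValidateDtd()</code> route, a minimal sketch might
+ look as follows. It assumes a libxml2 version providing
+ <code>xmlNewValidCtxt()</code>; the function name
+ <code>validate_later</code> and the <code>dtd_file</code> parameter are
+ illustrative:</p>
+ <pre>#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/valid.h&gt;
+
+/* Returns 1 if doc is valid against the DTD, 0 if not, -1 on error. */
+int validate_later(xmlDocPtr doc, const char *dtd_file) {
+    xmlDtdPtr dtd = xmlParseDTD(NULL, (const xmlChar *) dtd_file);
+    xmlValidCtxtPtr ctxt;
+    int ret;
+
+    if (dtd == NULL) return -1;
+    ctxt = xmlNewValidCtxt();
+    if (ctxt == NULL) { xmlFreeDtd(dtd); return -1; }
+    ret = xmlValidateDtd(ctxt, doc, dtd);
+    xmlFreeValidCtxt(ctxt);
+    xmlFreeDtd(dtd);  /* the DTD was never attached to doc */
+    return ret;
+}</pre>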
+ </li>
+ <li>etc ...</li>
+</ol>
+
+<p></p>
+
<h2><a name="Documentat">Documentation</a></h2>
<p>There are some on-line resources about using libxml:</p>
@@ -909,7 +1172,7 @@
supported and the progresses on the <a
href="http://cvs.gnome.org/lxr/source/libxslt/ChangeLog">Changelog</a></p>
-<h2><a name="architecture">An overview of libxml architecture</a></h2>
+<h2><a name="architecture">libxml architecture</a></h2>
<p>Libxml is made of multiple components; some of them are optional, and most
of the block interfaces are public. The main components are:</p>
@@ -1051,7 +1314,1170 @@
a set of registered default callbacks, without internal specific
interface.</p>
-<h2><a name="library">The XML library interfaces</a></h2>
+<h2><a name="Validation">Validation & DTDs</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+ <li><a href="#General5">General overview</a></li>
+ <li><a href="#definition1">The definition</a></li>
+ <li><a href="#Simple1">Simple rules</a>
+ <ol>
+ <li><a href="#reference1">How to reference a DTD from a
+ document</a></li>
+ <li><a href="#Declaring2">Declaring elements</a></li>
+ <li><a href="#Declaring1">Declaring attributes</a></li>
+ </ol>
+ </li>
+ <li><a href="#Some1">Some examples</a></li>
+ <li><a href="#validate1">How to validate</a></li>
+ <li><a href="#Other1">Other resources</a></li>
+</ol>
+
+<h3><a name="General5">General overview</a></h3>
+
+<p>Well what is validation and what is a DTD ?</p>
+
+<p>DTD is the acronym for Document Type Definition. This is a description of
+the content for a family of XML files. This is part of the XML 1.0
+specification, and allows one to describe and check that a given document
+instance conforms to a set of rules detailing its structure and content.</p>
+
+<p>Validation is the process of checking a document against a DTD (more
+generally against a set of construction rules).</p>
+
+<p>The validation process and building DTDs are the two most difficult parts
+of the XML life cycle. Briefly, a DTD defines all the possible elements to be
+found within your document and the formal shape of your document tree
+(by defining the allowed content of an element: either text, a regular
+expression for the allowed list of children, or mixed content i.e. both text
+and children). The DTD also defines the allowed attributes for all elements
+and the types of the attributes.</p>
+
+<h3><a name="definition1">The definition</a></h3>
+
+<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a
+href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
+Rev1</a>):</p>
+<ul>
+ <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
+ elements</a></li>
+ <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
+ attributes</a></li>
+</ul>
+
+<p>(unfortunately) all this is inherited from the SGML world, the syntax is
+ancient...</p>
+
+<h3><a name="Simple1">Simple rules</a></h3>
+
+<p>Writing DTDs can be done in multiple ways; the rules for building them
+can be radically different depending on whether you need something fixed
+or something which can evolve over time. Really complex DTDs like the
+Docbook ones are flexible but much harder to design. I will just focus on
+DTDs for formats with a fixed simple structure. It is just a set of basic
+rules, and definitely not exhaustive nor usable for complex DTD design.</p>
+
+<h4><a name="reference1">How to reference a DTD from a document</a>:</h4>
+
+<p>Assuming the top element of the document is <code>spec</code> and the DTD
+is placed in the file <code>mydtd</code> in the subdirectory
+<code>dtds</code> of the directory from where the document was loaded:</p>
+
+<p><code><!DOCTYPE spec SYSTEM "dtds/mydtd"></code></p>
+
+<p>Notes:</p>
+<ul>
+ <li>the system string is actually a URI-Reference (as defined in <a
+ href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>) so you can use a
+ full URL string indicating the location of your DTD on the Web. This is a
+ really good thing to do if you want others to validate your document</li>
+ <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
+ magic string) so that the DTD is looked up in catalogs on the client side
+ without having to locate it on the web</li>
+ <li>a DTD contains a set of element and attribute declarations, but they
+ don't define what the root of the document should be. This is explicitly
+ told to the parser/validator as the first element of the
+ <code>DOCTYPE</code> declaration.</li>
+</ul>
+
+<h4><a name="Declaring2">Declaring elements</a>:</h4>
+
+<p>The following declares an element <code>spec</code>:</p>
+
+<p><code><!ELEMENT spec (front, body, back?)></code></p>
+
+<p>It also expresses that the spec element contains one <code>front</code>,
+one <code>body</code> and one optional <code>back</code> child element,
+in this order. The declaration of one element of the structure and its
+content are done in a single declaration. Similarly the following declares
+<code>div1</code> elements:</p>
+
+<p><code><!ELEMENT div1 (head, (p | list | note)*, div2*)></code></p>
+
+<p>This means div1 contains one <code>head</code>, then any number of
+<code>p</code>, <code>list</code> and <code>note</code> elements in any
+order, and then any number of <code>div2</code> elements. And last but not
+least an element can contain text:</p>
+
+<p><code><!ELEMENT b (#PCDATA)></code></p>
+
+<p><code>b</code> contains text. An element can also be of mixed content
+(text and elements in no particular order):</p>
+
+<p><code><!ELEMENT p (#PCDATA|a|ul|b|i|em)*></code></p>
+
+<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
+<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
+order.</p>
+
+<h4><a name="Declaring1">Declaring attributes</a>:</h4>
+
+<p>Again, the attribute declaration includes its content definition:</p>
+
+<p><code><!ATTLIST termdef name CDATA #IMPLIED></code></p>
+
+<p>This means that the element <code>termdef</code> can have a
+<code>name</code> attribute containing text (<code>CDATA</code>) and which
+is optional (<code>#IMPLIED</code>). The attribute value can also be
+defined within a set:</p>
+
+<p><code><!ATTLIST list type (bullets|ordered|glossary)
+"ordered"></code></p>
+
+<p>This means <code>list</code> elements have a <code>type</code> attribute
+with 3 allowed values, "bullets", "ordered" or "glossary", which defaults to
+"ordered" if the attribute is not explicitly specified.</p>
+
+<p>The content type of an attribute can be text (<code>CDATA</code>),
+anchor/reference/references
+(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
+(<code>ENTITY</code>/<code>ENTITIES</code>) or name(s)
+(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
+<code>chapter</code> element can have an optional <code>id</code> attribute
+of type <code>ID</code>, usable for reference from attribute of type
+IDREF:</p>
+
+<p><code><!ATTLIST chapter id ID #IMPLIED></code></p>
+
+<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
+meaning that the attribute has to be given, <code>#IMPLIED</code>,
+meaning that it is optional, or the default value (possibly prefixed by
+<code>#FIXED</code> if it is the only value allowed).</p>
+
+<p>Notes:</p>
+<ul>
+ <li>usually the attributes pertaining to a given element are declared in a
+ single expression, but it is just a convention adopted by a lot of DTD
+ writers:
+ <pre><!ATTLIST termdef
+ id ID #REQUIRED
+ name CDATA #IMPLIED></pre>
+ <p>The previous construct defines both <code>id</code> and
+ <code>name</code> attributes for the element <code>termdef</code></p>
+ </li>
+</ul>
+
+<h3><a name="Some1">Some examples</a></h3>
+
+<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
+contains some complex DTD examples. The <code>test/valid/dia.xml</code>
+example shows an XML file where the simple DTD is directly included within
+the document.</p>
+
+<h3><a name="validate1">How to validate</a></h3>
+
+<p>The simplest way is to use the xmllint program that comes with libxml. The
+<code>--valid</code> option turns on validation of the files given as input.
+For example, the following validates a copy of the first revision of the XML
+1.0 specification:</p>
+
+<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>
+
+<p>The <code>--noout</code> option is used to not output the resulting tree.</p>
+
+<p>The <code>--dtdvalid dtd</code> option validates the document(s) against
+a given DTD.</p>
+
+<p>Libxml exports an API to handle DTDs and validation, check the <a
+href="http://xmlsoft.org/html/libxml-valid.html">associated
+description</a>.</p>
+
+<h3><a name="Other1">Other resources</a></h3>
+
+<p>DTDs are as old as SGML. So there may be a number of examples on-line. I
+will just list one for now, other pointers are welcome:</p>
+<ul>
+ <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
+</ul>
+
+<p>I suggest looking at the examples found under test/valid/dtds and any of
+the large number of books available on XML. The dia example in test/valid
+should be both simple and complete enough to allow you to build your own.</p>
+
+<p></p>
+
+<h2><a name="Memory">Memory Management</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+ <li><a href="#General3">General overview</a></li>
+ <li><a href="#setting">Setting libxml set of memory
+ routines</a></li>
+ <li><a href="#cleanup">Cleaning up after parsing</a></li>
+ <li><a href="#Debugging">Debugging routines</a></li>
+ <li><a href="#General4">General memory requirements</a></li>
+</ol>
+
+<h3><a name="General3">General overview</a></h3>
+
+<p>The module <code><a
+href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlmemory.h</a></code>
+provides the interfaces to the libxml memory system:</p>
+<ul>
+ <li>libxml does not use the libc memory allocator directly but xmlFree(),
+ xmlMalloc() and xmlRealloc()</li>
+ <li>those routines can be reassigned to a specific set of routines, by
+ default the libc ones i.e. free(), malloc() and realloc()</li>
+ <li>the xmlmemory.c module includes a set of debugging routines</li>
+</ul>
+
+<h3><a name="setting">Setting libxml set of memory routines</a></h3>
+
+<p>It is sometimes useful to not use the default memory allocator, either for
+debugging, analysis or to implement a specific behaviour on memory management
+(like on embedded systems). Two function calls are available to do so:</p>
+<ul>
+ <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemGet ()</a>
+ which returns the current set of functions in use by the parser</li>
+ <li><a
+ href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemSetup()</a>
+ which sets up a new set of memory allocation functions</li>
+</ul>
+
+<p>Of course a call to xmlMemSetup() should probably be done before calling
+any other libxml routines (unless you are sure your allocation routines are
+compatible).</p>
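+
+<p>As an illustration only, the following sketch installs wrappers which
+count allocations on top of the libc routines. The names
+<code>my_malloc</code>, <code>my_strdup</code> and
+<code>install_counting_allocator</code> are hypothetical:</p>
+<pre>#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+#include &lt;libxml/xmlmemory.h&gt;
+
+static unsigned long nb_allocs = 0;  /* number of blocks requested */
+
+static void *my_malloc(size_t size) {
+    nb_allocs++;
+    return malloc(size);
+}
+
+static char *my_strdup(const char *str) {
+    size_t len = strlen(str) + 1;
+    char *copy = my_malloc(len);  /* stay within the same allocator */
+
+    if (copy != NULL) memcpy(copy, str, len);
+    return copy;
+}
+
+/* Call before any other libxml routine; returns 0 on success. */
+int install_counting_allocator(void) {
+    return xmlMemSetup(free, my_malloc, realloc, my_strdup);
+}</pre>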
+
+<h3><a name="cleanup">Cleaning up after parsing</a></h3>
+
+<p>Libxml is not stateless: a few memory structures need to be allocated
+before the parser is fully functional (some encoding structures
+for example). This also means that once parsing is finished there is a tiny
+amount of memory (a few hundred bytes) which can be reclaimed if you don't
+reuse the parser immediately:</p>
+<ul>
+ <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlCleanupParser
+ ()</a>
+ is a centralized routine to free the parsing states. Note that it won't
+ deallocate any produced tree if any (use the xmlFreeDoc() and related
+ routines for this).</li>
+ <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlInitParser
+ ()</a>
+ is the counterpart routine, which preallocates the parsing state. This
+ can be useful for example to avoid initialization reentrancy problems when
+ using libxml in multithreaded applications</li>
+</ul>
+
+<p>Generally xmlCleanupParser() is safe: if needed, the state will be rebuilt
+at the next invocation of parser routines. But be careful of the consequences
+in multithreaded applications.</p>
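+
+<p>A typical life cycle could be sketched as follows; the file name
+<code>doc.xml</code> is a placeholder:</p>
+<pre>#include &lt;libxml/parser.h&gt;
+
+int main(void) {
+    xmlDocPtr doc;
+
+    xmlInitParser();               /* preallocate the parser state once */
+    doc = xmlParseFile("doc.xml");
+    if (doc != NULL)
+        xmlFreeDoc(doc);           /* trees are freed separately */
+    xmlCleanupParser();            /* release the parser state at exit */
+    return 0;
+}</pre>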
+
+<h3><a name="Debugging">Debugging routines</a></h3>
+
+<p>When configured using the --with-mem-debug flag (off by default), libxml
+uses a set of memory allocation debugging routines keeping track of all
+allocated blocks and the location in the code where the routine was called.
+A couple of other debugging routines allow you to dump the memory allocation
+information to a file or call a specific routine when a given block number
+is allocated:</p>
+<ul>
+ <li><a
+ href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMallocLoc()</a>
+ <a
+ href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlReallocLoc()</a>
+ and <a
+ href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemStrdupLoc()</a>
+ are the memory debugging replacement allocation routines</li>
+ <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemoryDump
+ ()</a>
+ dumps all the information about the allocated memory blocks left in the
+ <code>.memdump</code> file</li>
+</ul>
+
+<p>When developing libxml, memory debug is enabled, the test programs call
+xmlMemoryDump () and the "make test" regression tests check for any
+memory leak during the full regression test sequence. This helps a lot in
+ensuring that libxml does not leak memory and in bullet-proofing memory
+allocation use (some libc implementations are known to be far too permissive,
+resulting in major portability problems!).</p>
+
+<p>If the .memdump reports a leak, it displays the allocation function and
+also tries to give some information about the content and structure of the
+allocated blocks left. This is sufficient in most cases to find the culprit,
+but not always. Assuming the allocation problem is reproducible, it is
+possible to find it more easily:</p>
+<ol>
+ <li>write down the block number xxxx which was not deallocated</li>
+ <li>export the environment variable XML_MEM_BREAKPOINT=xxxx</li>
+ <li>run the program under a debugger and set a breakpoint on
+ xmlMallocBreakpoint(), a specific function called when this precise block
+ is allocated</li>
+ <li>when the breakpoint is reached you can then do a fine analysis of the
+ allocation and step to see the condition resulting in the missing
+ deallocation.</li>
+</ol>
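+
+<p>With gdb and a Bourne-style shell, the steps above might look like this
+(<code>./myprog</code> is a placeholder for your program and
+<code>xxxx</code> for the block number):</p>
+<pre>export XML_MEM_BREAKPOINT=xxxx
+gdb ./myprog
+(gdb) break xmlMallocBreakpoint
+(gdb) run
+(gdb) backtrace</pre>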
+
+<p>I used to use a commercial tool to debug libxml memory problems, but after
+noticing that it was not detecting memory leaks, this simple mechanism was
+used instead and has proved extremely efficient until now.</p>
+
+<h3><a name="General4">General memory requirements</a></h3>
+
+<p>How much memory does libxml require ? It's hard to tell on average; it
+depends on a number of things:</p>
+<ul>
+ <li>the parser itself should work in a fixed amount of memory, except for
+ information maintained about the stacks of names and entities locations.
+ The I/O and encoding handlers will probably account for a few KBytes.
+ This is true for both the XML and HTML parser (though the HTML parser
+ needs more state).</li>
+ <li>If you are generating the DOM tree then memory requirements will grow
+ nearly linearly with the size of the data. In general for a balanced
+ textual document the internal memory requirement is about 4 times the
+ size of the UTF8 serialization of this document (for example the XML-1.0
+ recommendation is a bit more than 150KBytes and takes 650KBytes of main
+ memory when parsed). Validation will add an amount of memory required for
+ maintaining the external DTD state which should be linear with the
+ complexity of the content model defined by the DTD</li>
+ <li>If you don't care about the advanced features of libxml like
+ validation, DOM, XPath or XPointer, but really need to work within fixed
+ memory requirements, then the SAX interface should be used.</li>
+</ul>
+
+<p></p>
+
+<h2><a name="Encodings">Encodings support</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+ <li><a href="encoding.html#What">What does internationalization support
+ mean ?</a></li>
+ <li><a href="encoding.html#internal">The internal encoding, how and
+ why</a></li>
+ <li><a href="encoding.html#implemente">How is it implemented ?</a></li>
+ <li><a href="encoding.html#Default">Default supported encodings</a></li>
+ <li><a href="encoding.html#extend">How to extend the existing
+ support</a></li>
+</ol>
+
+<h3><a name="What">What does internationalization support mean ?</a></h3>
+
+<p>XML was designed from the start to allow the support of any character set
+by using Unicode. Any conformant XML parser has to support the UTF-8 and
+UTF-16 default encodings which can both express the full Unicode range. UTF-8
+is a variable length encoding whose greatest strengths are to reuse the same
+encoding for ASCII and to save space for Western encodings, but it is a bit
+more complex to handle in practice. UTF-16 uses 2 bytes per character (and
+sometimes combines two such units into a pair); it makes implementation
+easier, but looks a bit overkill for Western languages encoding. Moreover the
+XML specification allows documents to be encoded in other encodings on the
+condition that they are clearly labelled as such. For example the following
+is a wellformed XML document encoded in ISO-8859-1 and using accented letters
+that we French like for both markup and content:</p>
+<pre><?xml version="1.0" encoding="ISO-8859-1"?>
+<très>là</très></pre>
+
+<p>Having internationalization support in libxml means the following:</p>
+<ul>
+ <li>the document is properly parsed</li>
+ <li>information about its encoding is saved</li>
+ <li>it can be modified</li>
+ <li>it can be saved in its original encoding</li>
+ <li>it can also be saved in another encoding supported by libxml (for
+ example straight UTF8 or even an ASCII form)</li>
+</ul>
+
+<p>Another very important point is that the whole libxml API, with the
+exception of a few routines to read with a specific encoding or save to a
+specific encoding, is completely agnostic about the original encoding of the
+document.</p>
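+
+<p>Saving to a specific encoding is one of those few routines. A minimal
+sketch, where the function name <code>resave_as_utf8</code> and the output
+file name <code>out.xml</code> are illustrative assumptions:</p>
+<pre>#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/tree.h&gt;
+
+/* Parse a document in any supported encoding and save it as UTF-8. */
+int resave_as_utf8(const char *filename) {
+    xmlDocPtr doc = xmlParseFile(filename);
+    int ret;
+
+    if (doc == NULL) return -1;
+    ret = xmlSaveFileEnc("out.xml", doc, "UTF-8");
+    xmlFreeDoc(doc);
+    return ret;  /* number of bytes written, or -1 on failure */
+}</pre>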
+
+<p>It should be noted too that the HTML parser embedded in libxml now obeys
+the same rules; the following document will be (as of 2.2.2) handled in
+an internationalized fashion by libxml too:</p>
+<pre><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
+ "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html lang="fr">
+<head>
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
+</head>
+<body>
+<p>W3C crée des standards pour le Web.</body>
+</html></pre>
+
+<h3><a name="internal">The internal encoding, how and why</a></h3>
+
+<p>One of the core decisions was to force all documents to be converted to a
+default internal encoding, and that encoding to be UTF-8. Here is the
+rationale for those choices:</p>
+<ul>
+ <li>keeping the native encoding in the internal form would force the libxml
+ users (or the code associated) to be fully aware of the encoding of the
+ original document. For example, when adding a text node to a document,
+ the content would have to be provided in the document encoding, i.e. the
+ client code would have to check it beforehand, make sure it's conformant
+ to the encoding, etc ... Very hard in practice, though in some specific
+ cases this may make sense.</li>
+ <li>the second decision was which encoding. From the XML spec only UTF-8 and
+ UTF-16 really make sense, being the only two encodings for which there
+ is mandatory support. UCS-4 (32 bits fixed size encoding) could be
+ considered an intelligent choice too since it's a direct Unicode mapping
+ support. I selected UTF-8 on the basis of efficiency and compatibility
+ with surrounding software:
+ <ul>
+ <li>UTF-8, while a bit more complex to convert from/to (i.e. slightly
+ more costly to import and export CPU-wise), is also far more compact
+ than UTF-16 (and UCS-4) for the majority of the documents I see it used
+ for right now (RPM RDF catalogs, advogato data, various configuration
+ file formats, etc.), and the key point for today's computer
+ architectures is efficient use of caches. If one nearly doubles the
+ memory requirement to store the same amount of data, this will thrash
+ caches (main memory/external caches/internal caches) and my take is
+ that this harms the system far more than the CPU requirements needed
+ for the conversion to UTF-8</li>
+ <li>Most libxml version 1 users were using it with straight ASCII most
+ of the time. Switching to an internal encoding that required all their
+ code to be rewritten was a serious show-stopper for using UTF-16 or
+ UCS-4.</li>
+ <li>UTF-8 is being used as the de-facto internal encoding standard for
+ related code like <a href="http://www.pango.org/">pango</a>, the
+ upcoming Gnome text widget, and a lot of Unix code (yet another place
+ where the Unix programmer base takes a different approach from
+ Microsoft, which uses UTF-16)</li>
+ </ul>
+ </li>
+</ul>
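+<p>To make the compactness argument concrete, here is a minimal sketch in C
+of how many bytes one Unicode code point occupies in UTF-8 and in UTF-16
+(UCS-4 always takes 4). This is not libxml code: the names
+<code>utf8_size()</code> and <code>utf16_size()</code> are made up for
+illustration.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store one Unicode code point in UTF-8. */
size_t utf8_size(unsigned long cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) return 1;      /* ASCII range */
    if (cp < 0x800) return 2;     /* includes all of ISO-8859-1 */
    if (cp < 0x10000) return 3;   /* rest of the Basic Multilingual Plane */
    return 4;
}

/* Bytes needed in UTF-16: one 16-bit unit, or a pair of them. */
size_t utf16_size(unsigned long cp) {
    assert(cp <= 0x10FFFF);
    return (cp < 0x10000) ? 2 : 4;
}
```

+<p>For an ASCII-only document every character costs 1 byte in UTF-8 against
+2 in UTF-16 and 4 in UCS-4, which is where the "nearly double" figure
+comes from.</p>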
+
+<p>What does this mean in practice for the libxml user?</p>
+<ul>
+ <li>xmlChar, the libxml data type, is a byte; those bytes must be assembled
+ as valid UTF-8 strings. The proper way to terminate an xmlChar * string
+ is simply to append a 0 byte, as usual.</li>
+ <li>One just needs to make sure that when using chars outside the ASCII
+ set, the values have been properly converted to UTF-8</li>
+</ul>
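+<p>As an illustration of what "valid UTF-8" means for such a 0-terminated
+byte string, here is a small self-contained checker. It is only a sketch:
+<code>is_valid_utf8()</code> is a hypothetical helper written for this page,
+not a libxml function.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return 1 if the 0-terminated byte string is well-formed UTF-8, else 0.
 * Rejects overlong forms, surrogates and values above U+10FFFF.
 */
int is_valid_utf8(const unsigned char *s) {
    assert(s != NULL);
    while (*s != 0) {
        unsigned long cp;
        int extra, i;

        if (*s < 0x80) { s++; continue; }            /* plain ASCII byte */
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return 0;               /* stray continuation or invalid lead */
        s++;
        for (i = 0; i < extra; i++, s++) {
            /* a terminating 0 byte also fails this test, so no overrun */
            if ((*s & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
        }
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000)) return 0;  /* overlong form */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    }
    return 1;
}
```

+<p>The string "très" passes when the è is stored as the two bytes 0xC3 0xA8,
+and fails when it is stored as the single ISO-8859-1 byte 0xE8.</p>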
+
+<h3><a name="implemente">How is it implemented ?</a></h3>
+
+<p>Let's describe how all this works within libxml. Basically the I18N
+(internationalization) support gets triggered only during I/O operations,
+i.e. when reading a document or saving one. Let's look first at the reading
+sequence:</p>
+<ol>
+ <li>when a document is processed, we usually don't know the encoding; a
+ simple heuristic allows distinguishing UTF-16 and UCS-4 from those
+ encodings where the ASCII range (0-0x7F) maps to ASCII</li>
+ <li>the xml declaration, if available, is parsed, including the encoding
+ declaration. At that point, if the autodetected encoding is different
+ from the one declared, a call to xmlSwitchEncoding() is issued.</li>
+ <li>If there is no encoding declaration, then the input has to be in either
+ UTF-8 or UTF-16. If it is not, then at some point when processing the
+ input, the converter/checker of UTF-8 form will raise an encoding error.
+ You may end up with a garbled document, or no document at all! Example:
+ <pre>~/XML -> ./xmllint err.xml
+err.xml:1: error: Input is not proper UTF-8, indicate encoding !
+<très>là</très>
+ ^
+err.xml:1: error: Bytes: 0xE8 0x73 0x3E 0x6C
+<très>là</très>
+ ^</pre>
+ </li>
+ <li>xmlSwitchEncoding() does an encoding name lookup, canonicalizes it, and
+ then searches the default registered encoding converters for that
+ encoding. If it's not within the default set and iconv() support has been
+ compiled in, it will ask iconv for such an encoder. If this fails then the
+ parser will report an error and stop processing:
+ <pre>~/XML -> ./xmllint err2.xml
+err2.xml:1: error: Unsupported encoding UnsupportedEnc
+<?xml version="1.0" encoding="UnsupportedEnc"?>
+ ^</pre>
+ </li>
+ <li>From that point the encoder progressively processes the input (it is
+ plugged as a front-end to the I/O module) for that entity. It captures
+ and converts on-the-fly the document to be parsed to UTF-8. The parser
+ itself just does UTF-8 checking of this input and processes it
+ transparently. The only difference is that the encoding information has
+ been added to the parsing context (more precisely to the input
+ corresponding to this entity).</li>
+ <li>The result (when using DOM) is an internal form completely in UTF-8
+ with just the encoding information on the document node.</li>
+</ol>
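+<p>The first step of the sequence above can be sketched as follows. This is
+a simplified illustration based on the byte patterns a document starting
+with "&lt;?xml" produces in the various encoding families. It is not the
+libxml implementation, and <code>guess_encoding()</code> and the
+<code>enc_guess</code> values are names invented for this example.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENC_UNKNOWN,       /* nothing recognized */
    ENC_ASCII_COMPAT,  /* "<?xm" seen as plain bytes: read the declaration */
    ENC_UTF8_BOM,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UCS4LE,
    ENC_UCS4BE
} enc_guess;

/* Return 1 if the input starts with the n bytes of pattern p. */
static int starts(const unsigned char *in, size_t len,
                  const unsigned char *p, size_t n) {
    size_t i;
    if (len < n) return 0;
    for (i = 0; i < n; i++)
        if (in[i] != p[i]) return 0;
    return 1;
}

enc_guess guess_encoding(const unsigned char *in, size_t len) {
    static const unsigned char ucs4be[]  = { 0x00, 0x00, 0x00, 0x3C };
    static const unsigned char ucs4le[]  = { 0x3C, 0x00, 0x00, 0x00 };
    static const unsigned char bom4be[]  = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char bom4le[]  = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char u16be[]   = { 0x00, 0x3C, 0x00, 0x3F };
    static const unsigned char u16le[]   = { 0x3C, 0x00, 0x3F, 0x00 };
    static const unsigned char ascii[]   = { 0x3C, 0x3F, 0x78, 0x6D };
    static const unsigned char bom8[]    = { 0xEF, 0xBB, 0xBF };
    static const unsigned char bom16be[] = { 0xFE, 0xFF };
    static const unsigned char bom16le[] = { 0xFF, 0xFE };

    assert(in != NULL);
    /* 4-byte patterns first, so a UCS-4 mark is not taken for UTF-16 */
    if (starts(in, len, ucs4be, 4) || starts(in, len, bom4be, 4))
        return ENC_UCS4BE;
    if (starts(in, len, ucs4le, 4) || starts(in, len, bom4le, 4))
        return ENC_UCS4LE;
    if (starts(in, len, u16be, 4)) return ENC_UTF16BE;
    if (starts(in, len, u16le, 4)) return ENC_UTF16LE;
    if (starts(in, len, ascii, 4)) return ENC_ASCII_COMPAT;
    if (starts(in, len, bom8, 3)) return ENC_UTF8_BOM;
    if (starts(in, len, bom16be, 2)) return ENC_UTF16BE;
    if (starts(in, len, bom16le, 2)) return ENC_UTF16LE;
    return ENC_UNKNOWN;
}
```

+<p>When the result is <code>ENC_ASCII_COMPAT</code> or
+<code>ENC_UNKNOWN</code> the encoding declaration, if any, still has to be
+read to learn the actual encoding.</p>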
+
+<p>OK, then what happens when saving the document (assuming you
+collected/built an xmlDoc DOM-like structure)? It depends on the function
+called: xmlSaveFile() will just try to save in the original encoding, while
+xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given
+encoding:</p>
+<ol>
+ <li>if no encoding is given, libxml will look for an encoding value
+ associated to the document and if it exists will try to save to that
+ encoding,
+ <p>otherwise everything is written in the internal form, i.e. UTF-8</p>
+ </li>
+ <li>so if an encoding was specified, either at the API level or on the
+ document, libxml will again canonicalize the encoding name and look up a
+ converter in the registered set or through iconv. If none is found the
+ function will return an error code</li>
+ <li>the converter is placed before the I/O buffer layer, as another kind of
+ buffer; libxml will then simply push the UTF-8 serialization through
+ that buffer, which will then progressively be converted and pushed onto
+ the I/O layer.</li>
+ <li>It is possible that the converter code fails on some input, for example
+ trying to push a UTF-8 encoded Chinese character through the UTF-8 to
+ ISO-8859-1 converter won't work. Since the encoders are progressive they
+ will just report the error and the number of bytes converted. At that
+ point libxml will decode the offending character, remove it from the
+ buffer and replace it with the associated charRef encoding &#123; and
+ resume the conversion. This guarantees that any document will be saved
+ without losses (except for markup names where this is not legal; this is
+ a problem in the current version, so in practice avoid using non-ASCII
+ characters for tag or attribute names). A special "ascii" encoding name
+ can be used to save documents in a pure ASCII form when portability is
+ really crucial</li>
+</ol>
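+<p>The charRef fallback described in the last step can be illustrated with
+a small standalone routine that writes UTF-8 input as pure ASCII, replacing
+every non-ASCII character by a decimal character reference. This is a
+sketch only: <code>utf8_to_ascii_charref()</code> is a hypothetical name,
+the input is assumed to have been checked already, and the real libxml
+output path works progressively on buffers rather than on whole
+strings.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy UTF-8 input to an ASCII-only output, replacing each non-ASCII
 * character by a decimal character reference such as &#224;.
 * Returns the number of bytes written (terminator excluded), or -1 if
 * the output buffer is too small or the input is malformed.
 */
int utf8_to_ascii_charref(const unsigned char *in, char *out, size_t outsize) {
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0) return -1;
    while (*in != 0) {
        unsigned long cp;
        int extra;

        if (*in < 0x80) { cp = *in; extra = 0; }
        else if ((*in & 0xE0) == 0xC0) { cp = *in & 0x1F; extra = 1; }
        else if ((*in & 0xF0) == 0xE0) { cp = *in & 0x0F; extra = 2; }
        else if ((*in & 0xF8) == 0xF0) { cp = *in & 0x07; extra = 3; }
        else return -1;
        in++;
        while (extra-- > 0) {
            if ((*in & 0xC0) != 0x80) return -1;  /* truncated sequence */
            cp = (cp << 6) | (unsigned long)(*in++ & 0x3F);
        }
        /* conservative margin: the longest reference plus the terminator */
        if (used + 12 > outsize) return -1;
        if (cp < 0x80)
            out[used++] = (char) cp;
        else
            used += (size_t) sprintf(out + used, "&#%lu;", cp);
    }
    out[used] = 0;
    return (int) used;
}
```

+<p>With this routine the content "là" comes out as "l&amp;#224;", which any
+XML parser reads back as the same character.</p>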
+
+<p>Here are a few examples based on the same test document:</p>
+<pre>~/XML -> ./xmllint isolat1
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<très>là</très>
+~/XML -> ./xmllint --encode UTF-8 isolat1
+<?xml version="1.0" encoding="UTF-8"?>
+<très>là</très>
+~/XML -> </pre>
+
+<p>The same processing is applied (and most of the code is reused) for HTML
+I18N processing. Looking up and modifying the content encoding is a bit more
+difficult since it is located in a <meta> tag under the <head>,
+so a couple of functions, htmlGetMetaEncoding() and htmlSetMetaEncoding(),
+have been provided. The parser also attempts to switch encoding on the fly
+when detecting such a tag on input. Except for that the processing is the
+same (and again reuses the same code).</p>
+
+<h3><a name="Default">Default supported encodings</a></h3>
+
+<p>libxml has a set of default converters for the following encodings
+(located in encoding.c):</p>
+<ol>
+ <li>UTF-8 is supported by default (null handlers)</li>
+ <li>UTF-16, both little and big endian</li>
+ <li>ISO-Latin-1 (ISO-8859-1) covering most western languages</li>
+ <li>ASCII, useful mostly for saving</li>
+ <li>HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML
+ predefined entities like &copy; for the Copyright sign.</li>
+</ol>
+
+<p>Moreover, when compiled on a Unix platform with iconv support, the full
+set of encodings supported by iconv can instantly be used by libxml. On a
+Linux machine with glibc-2.1 the list of supported encodings and aliases
+fills 3 full pages, and includes UCS-4, the full set of ISO-Latin encodings,
+and the various Japanese ones.</p>
+
+<h4>Encoding aliases</h4>
+
+<p>From 2.2.3, libxml has support for registering encoding name aliases. The
+goal is to be able to parse documents whose encoding is supported but where
+the name differs (for example from the default set of names accepted by
+iconv). The following functions allow registering and handling new aliases
+for existing encodings. Once registered, libxml will automatically look up
+the aliases when handling a document:</p>
+<ul>
+ <li>int xmlAddEncodingAlias(const char *name, const char *alias);</li>
+ <li>int xmlDelEncodingAlias(const char *alias);</li>
+ <li>const char * xmlGetEncodingAlias(const char *alias);</li>
+ <li>void xmlCleanupEncodingAliases(void);</li>
+</ul>
+
+<h3><a name="extend">How to extend the existing support</a></h3>
+
+<p>Adding support for a new encoding, or overriding one of the encoders
+(assuming it is buggy), should not be hard: just write input and output
+conversion routines to/from UTF-8, and register them using
+xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they will be
+called automatically if the parser(s) encounter such an encoding name
+(register it uppercase, this will help). The encoders, their arguments and
+expected return values are described in the encoding.h header.</p>
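+<p>As a rough idea of what such an input routine looks like, here is a
+standalone sketch of a progressive converter to UTF-8 for a hypothetical
+single-byte encoding whose bytes map to the Unicode code points of the same
+value. The name <code>myencToUTF8()</code> is invented, and both the
+argument layout and the return convention used here (number of bytes
+written, with <code>*inlen</code> and <code>*outlen</code> updated to the
+amounts consumed and produced) are assumptions to be checked against the
+encoding.h header of your libxml version.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert up to *inlen bytes of in to UTF-8 in out (capacity *outlen).
 * Stops early, without error, when the output is full, so the caller
 * can flush and call again with the remaining input.
 */
int myencToUTF8(unsigned char *out, int *outlen,
                const unsigned char *in, int *inlen) {
    int i = 0, o = 0;

    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);
    while (i < *inlen) {
        unsigned char c = in[i];
        int need = (c < 0x80) ? 1 : 2;

        if (o + need > *outlen) break;   /* no room for this character */
        if (c < 0x80) {
            out[o++] = c;                /* ASCII is copied unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }
    *inlen = i;    /* bytes consumed */
    *outlen = o;   /* bytes produced */
    return o;
}
```

+<p>The matching output routine does the reverse and has to report an error
+for any character that the target encoding cannot represent.</p>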
+
+<p>A quick note on the topic of subverting the parser to use a different
+internal encoding than UTF-8: in some cases people will absolutely want to
+keep the internal encoding different. I think it's still possible (but the
+encoding must be compliant with ASCII on the same subrange) though I haven't
+tried it. The key is to override the default conversion routines (by
+registering null encoders/decoders for your charsets), and bypass the UTF-8
+checking of the parser by setting the parser context charset
+(ctxt->charset) to something different than XML_CHAR_ENCODING_UTF8, but
+there is no guarantee that this will work. You may also have some trouble
+saving back.</p>
+
+<p>Basically proper I18N support is important. It requires at least
+libxml-2.0.0, but a lot of features and corrections are really available only
+starting with 2.2.</p>
+
+<h2><a name="IO">I/O Interfaces</a></h2>
+
+<p>Table of Content:</p>
+<ol>
+ <li><a href="#General1">General overview</a></li>
+ <li><a href="#basic">The basic buffer type</a></li>
+ <li><a href="#Input">Input I/O handlers</a></li>
+ <li><a href="#Output">Output I/O handlers</a></li>
+ <li><a href="#entities">The entities loader</a></li>
+ <li><a href="#Example2">Example of customized I/O</a></li>
+</ol>
+
+<h3><a name="General1">General overview</a></h3>
+
+<p>The module <code><a
+href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
+the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
+<ul>
+ <li>Entities loader: this is a routine which tries to fetch the entities
+ (files) based on their PUBLIC and SYSTEM identifiers. The default loader
+ doesn't look at the public identifier since libxml does not maintain a
+ catalog. You can redefine your own entity loader by using
+ <code>xmlGetExternalEntityLoader()</code> and
+ <code>xmlSetExternalEntityLoader()</code>. <a
+ href="#entities">Check the example</a>.</li>
+ <li>Input I/O buffers, which are a convenience structure used by the
+ parser(s) input layer to handle fetching the information to feed the
+ parser. This provides buffering and is also a placeholder where the
+ encoding converters to UTF-8 are piggy-backed.</li>
+ <li>Output I/O buffers are similar to the Input ones and fulfill a similar
+ task but when generating a serialization from a tree.</li>
+ <li>A mechanism to register sets of I/O callbacks and associate them with
+ specific naming schemes like the protocol part of the URIs.
+ <p>This affects the default I/O operations and allows using specific I/O
+ handlers for certain names.</p>
+ </li>
+</ul>
+
+<p>The general mechanism used when loading http://rpmfind.net/xml.html for
+example in the HTML parser is the following:</p>
+<ol>
+ <li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
+ the parsing context and the URI string.</li>
+ <li>the URI string is checked against the existing registered handlers
+ using their match() callback function. If the HTTP module was compiled
+ in, it is registered and its match() function will succeed</li>
+ <li>the open() function of the handler is called and if successful will
+ return an I/O Input buffer</li>
+ <li>the parser will then start reading from this buffer and progressively
+ fetch information from the resource, calling the read() function of the
+ handler until the resource is exhausted</li>
+ <li>if an encoding change is detected it will be installed on the input
+ buffer, providing buffering and efficient use of the conversion
+ routines</li>
+ <li>once the parser has finished, the close() function of the handler is
+ called once and the Input buffer and associated resources are
+ deallocated.</li>
+</ol>
+
+<p>The user defined callbacks are checked first to allow overriding of the
+default libxml I/O routines.</p>
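+<p>One simple way to get this "user callbacks first" behaviour is to scan
+the table of registered handlers from the most recently registered entry
+back to the oldest one. The sketch below shows only that lookup, with
+made-up names (<code>register_handler()</code>,
+<code>find_handler()</code>) and only the match() callback. It does not
+reproduce the libxml registration API.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*match_fn)(const char *uri);

#define MAX_HANDLERS 8
static match_fn handlers[MAX_HANDLERS];
static int nb_handlers = 0;

/* Register a handler; returns its index, or -1 if the table is full. */
int register_handler(match_fn match) {
    assert(match != NULL);
    if (nb_handlers >= MAX_HANDLERS) return -1;
    handlers[nb_handlers] = match;
    return nb_handlers++;
}

/* Scan from the most recently registered handler back to the first. */
int find_handler(const char *uri) {
    int i;

    assert(uri != NULL);
    for (i = nb_handlers - 1; i >= 0; i--)
        if (handlers[i](uri)) return i;
    return -1;
}

/* Default handler: accepts anything, treating it as a file name. */
int match_file(const char *uri) { (void) uri; return 1; }

/* Handler for names using the "http://" scheme. */
int match_http(const char *uri) { return strncmp(uri, "http://", 7) == 0; }
```

+<p>Registering <code>match_file</code> first and <code>match_http</code>
+afterwards makes HTTP URIs go to the second handler while everything else
+falls through to the file one.</p>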
+
+<h3><a name="basic">The basic buffer type</a></h3>
+
+<p>All the buffer manipulation handling is done using the
+<code>xmlBuffer</code> type defined in <code><a
+href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a> </code>which is a
+resizable memory buffer. The buffer allocation strategy can be selected to be
+either best-fit or exponential doubling (a CPU vs. memory use
+tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
+<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
+system-wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
+of functions for manipulating buffers are provided, with names starting with
+the <code>xmlBuffer...</code> prefix.</p>
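+<p>The tradeoff between the two strategies can be seen in a few lines of C.
+This is an illustration of the general idea only, with invented names
+(<code>grow_capacity()</code>, <code>ALLOC_EXACT</code>,
+<code>ALLOC_DOUBLEIT</code>), and is not the actual growth policy coded in
+libxml.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ALLOC_EXACT, ALLOC_DOUBLEIT } alloc_scheme;

/* New capacity for a buffer of capacity size that must hold needed bytes. */
size_t grow_capacity(size_t size, size_t needed, alloc_scheme scheme) {
    size_t cap;

    if (needed <= size) return size;            /* already large enough */
    if (scheme == ALLOC_EXACT) return needed;   /* best fit: no slack */
    cap = (size > 0) ? size : 1;
    while (cap < needed) {
        if (cap > ((size_t) -1) / 2) return needed;  /* would overflow */
        cap *= 2;
    }
    assert(cap >= needed);
    return cap;
}
```

+<p>Best-fit wastes no memory but reallocates on nearly every append, while
+doubling may leave up to half of the buffer unused but needs far fewer
+reallocations.</p>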
+
+<h3><a name="Input">Input I/O handlers</a></h3>
+
+<p>An Input I/O handler is a simple structure
+<code>xmlParserInputBuffer</code> containing a context associated with the
+resource (file descriptor, or pointer to a protocol handler), the read() and
+close() callbacks to use and an xmlBuffer. An extra xmlBuffer and a charset
+encoding handler are also present to support charset conversion when
+needed.</p>
+
+<h3><a name="Output">Output I/O handlers</a></h3>
+
+<p>An Output handler <code>xmlOutputBuffer</code> is entirely similar to an
+Input one except that the callbacks are write() and close().</p>
+
+<h3><a name="entities">The entities loader</a></h3>
+
+<p>The entity loader resolves requests for new entities and creates inputs
+for the parser. Creating an input from a filename or a URI string is done
+through the xmlNewInputFromFile() routine. The default entity loader does not
+handle the PUBLIC identifier associated with an entity (if any), so it just
+calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
+XML).</p>
+
+<p>If you want to hook up a catalog mechanism then you simply need to
+override the default entity loader. Here is an example:</p>
+<pre>#include <libxml/xmlIO.h>
+
+xmlExternalEntityLoader defaultLoader = NULL;
+
+xmlParserInputPtr
+xmlMyExternalEntityLoader(const char *URL, const char *ID,
+ xmlParserCtxtPtr ctxt) {
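+ xmlParserInputPtr ret = NULL;
+ const char *fileID = NULL;
+ /* look up fileID depending on ID (the lookup itself is up to you) */
+
+ /* only try our own resolution if the lookup found something */
+ if (fileID != NULL)
+ ret = xmlNewInputFromFile(ctxt, fileID);
+ if (ret != NULL)
+ return(ret);
+ /* otherwise fall back to the loader that was installed before ours */
+ if (defaultLoader != NULL)
+ ret = defaultLoader(URL, ID, ctxt);
+ return(ret);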
+}
+
+int main(..) {
+ ...
+
+ /*
+ * Install our own entity loader
+ */
+ defaultLoader = xmlGetExternalEntityLoader();
+ xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);
+
+ ...
+}</pre>
+
+<h3><a name="Example2">Example of customized I/O</a></h3>
+
+<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
+real use case</a>: xmlDocDump() closes the FILE * passed by the application
+and this was a problem. The <a
+href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
+new output handler with the closing call deactivated:</p>
+<ol>
+ <li>First define a new I/O output allocator where the output doesn't close
+ the file:
+ <pre>xmlOutputBufferPtr
+xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
+ xmlOutputBufferPtr ret;
+
+ if (xmlOutputCallbackInitialized == 0)
+ xmlRegisterDefaultOutputCallbacks();
+
+ if (file == NULL) return(NULL);
+ ret = xmlAllocOutputBuffer(encoder);
+ if (ret != NULL) {
+ ret->context = file;
+ ret->writecallback = xmlFileWrite;
+ ret->closecallback = NULL; /* No close callback */
+ }
+ return(ret);
+}</pre>
+ </li>
+ <li>And then use it to save the document:
+ <pre>FILE *f;
+xmlOutputBufferPtr output;
+xmlDocPtr doc;
+int res;
+
+f = ...
+doc = ....
+
+output = xmlOutputBufferCreateOwn(f, NULL);
+res = xmlSaveFileTo(output, doc, NULL);
+ </pre>
+ </li>
+</ol>
+
+<h2><a name="Catalog">Catalog support</a></h2>
+
+<p>Table of Content:</p>
+<ol>
+ <li><a href="#General2">General overview</a></li>
+ <li><a href="#definition">The definitions</a></li>
+ <li><a href="#Simple">Using catalogs</a></li>
+ <li><a href="#Some">Some examples</a></li>
+ <li><a href="#reference">How to tune catalog usage</a></li>
+ <li><a href="#validate">How to debug catalog processing</a></li>
+ <li><a href="#Declaring">How to create and maintain catalogs</a></li>
+ <li><a href="#implemento">The implementor corner quick review of the
+ API</a></li>
+ <li><a href="#Other">Other resources</a></li>
+</ol>
+
+<h3><a name="General2">General overview</a></h3>
+
+<p>What is a catalog? Basically it's a lookup mechanism used when an entity
+(a file or a remote resource) references another entity. The catalog lookup
+is inserted between the moment the reference is recognized by the software
+(XML parser, stylesheet processing, or even images referenced for inclusion
+in a rendering) and the time when loading that resource is actually
+started.</p>
+
+<p>It is basically used for 3 things:</p>
+<ul>
+ <li>mapping between "logical" names, the public identifiers, and a more
+ concrete name usable for download (a URI). For example it can associate
+ the logical name
+ <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
+ <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
+ downloaded</p>
+ <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
+ </li>
+ <li>remapping from a given URL to another one, like an HTTP indirection
+ saying that
+ <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
+ <p>should really be looked up at</p>
+ <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
+ </li>
+ <li>providing a local cache mechanism allowing entities associated with
+ public identifiers or remote resources to be loaded locally. This is a
+ really important feature for any significant deployment of XML or SGML
+ since it avoids the hazards and delays associated with fetching remote
+ resources.</li>
+</ul>
+
+<h3><a name="definition">The definitions</a></h3>
+
+<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
+<ul>
+ <li>the older SGML catalogs: the official spec is SGML Open Technical
+ Resolution TR9401:1997, but it is better understood by reading <a
+ href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
+ James Clark. This is relatively old and not the preferred mode of
+ operation of libxml.</li>
+ <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
+ Catalogs</a>
+ are far more flexible, more recent, use an XML syntax and should scale
+ much better. This is the default option of libxml.</li>
+</ul>
+
+<h3><a name="Simple">Using catalogs</a></h3>
+
+<p>In a normal environment libxml will by default check for the presence of
+a catalog in /etc/xml/catalog, and assuming it has been correctly populated,
+the processing is completely transparent to the document user. To take a
+concrete example, suppose you are authoring a DocBook document which
+starts with the following DOCTYPE definition:</p>
+<pre><?xml version='1.0'?>
+<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
+ "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"></pre>
+
+<p>When validating the document with libxml, the catalog will be
+automatically consulted to look up the public identifier "-//Norman Walsh//DTD
+DocBk XML V3.1.4//EN" and the system identifier
+"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd", and if these entities have
+been installed on your system and the catalogs actually point to them, libxml
+will fetch them from the local disk.</p>
+
+<p style="font-size: 10pt"><strong>Note</strong>: Really don't use this
+DOCTYPE in practice, it's a really old version, but it is fine as an
+example.</p>
+
+<p>Libxml will check the catalog each time it is requested to load an
+entity; this includes DTDs, external parsed entities, stylesheets, etc. If
+your system is correctly configured, the whole authoring phase and processing
+should use only local files, while your document stays portable because it
+uses the canonical public and system IDs, referencing the remote document.</p>
+
+<h3><a name="Some">Some examples:</a></h3>
+
+<p>Here are a couple of fragments from XML Catalogs used in libxml early
+regression tests in <code>test/catalogs</code>:</p>
+<pre><?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC
+ "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
+ <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+ uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
+...</pre>
+
+<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
+written in XML, and there is a specific namespace for catalog elements,
+"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
+catalog is a <code>public</code> mapping: it associates a Public
+Identifier with a URI.</p>
+<pre>...
+ <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+ rewritePrefix="file:///usr/share/xml/docbook/"/>
+...</pre>
+
+<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
+any URI starting with a given prefix should be looked up at another URI
+constructed by replacing the prefix with a new one. In effect this acts like
+a cache system for a full area of the Web. In practice it is extremely useful
+with a file prefix if you have installed a copy of those resources on your
+local system.</p>
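+<p>The prefix replacement itself is a plain string operation, sketched
+below in C. The function <code>rewrite_system()</code> is a made-up helper
+for illustration. It applies a single rule and ignores how a real catalog
+chooses between several matching entries.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If uri starts with start, write prefix followed by the rest of uri
 * into out and return 1. Return 0 when the rule does not apply and -1
 * when out is too small.
 */
int rewrite_system(const char *uri, const char *start, const char *prefix,
                   char *out, size_t outsize) {
    size_t slen, need;

    assert(uri != NULL && start != NULL && prefix != NULL && out != NULL);
    slen = strlen(start);
    if (strncmp(uri, start, slen) != 0) return 0;   /* prefix mismatch */
    need = strlen(prefix) + strlen(uri + slen) + 1; /* +1: terminator */
    if (need > outsize) return -1;
    strcpy(out, prefix);
    strcat(out, uri + slen);
    return 1;
}
```

+<p>Applied to the entry shown above, the system identifier
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd becomes
+file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd.</p>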
+<pre>...
+<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
+ catalog="file:///usr/share/xml/docbook.xml"/>
+<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
+ catalog="file:///usr/share/xml/docbook.xml"/>
+<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
+ catalog="file:///usr/share/xml/docbook.xml"/>
+<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+ catalog="file:///usr/share/xml/docbook.xml"/>
+<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
+ catalog="file:///usr/share/xml/docbook.xml"/>
+...</pre>
+
+<p>Delegation is the core feature which allows building a tree of catalogs,
+easier to maintain than a single catalog: based on Public Identifier, System
+Identifier or URI prefixes, it instructs the catalog software to look up
+entries in another resource. This feature allows building hierarchies of
+catalogs. The set of entries presented should be sufficient to redirect the
+resolution of all DocBook references to the specific catalog in
+<code>/usr/share/xml/docbook.xml</code>; this one in turn could delegate all
+references for DocBook 4.1.2 to a specific catalog installed at the same time
+as the DocBook resources on the local machine.</p>
+
+<h3><a name="reference">How to tune catalog usage:</a></h3>
+
+<p>The user can change the default catalog behaviour by redirecting queries
+to their own set of catalogs. This can be done by setting the
+<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
+empty one should deactivate loading the default <code>/etc/xml/catalog</code>
+catalog.</p>
+
+<h3><a name="validate">How to debug catalog processing:</a></h3>
+
+<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
+make libxml output debugging information for each catalog operation, for
+example:</p>
+<pre>orchis:~/XML -> xmllint --memory --noout test/ent2
+warning: failed to load external entity "title.xml"
+orchis:~/XML -> export XML_DEBUG_CATALOG=
+orchis:~/XML -> xmllint --memory --noout test/ent2
+Failed to parse catalog /etc/xml/catalog
+Failed to parse catalog /etc/xml/catalog
+warning: failed to load external entity "title.xml"
+Catalogs cleanup
+orchis:~/XML -> </pre>
+
+<p>The test/ent2 document references an entity. Running the parser from
+memory makes the base URI unavailable and the "title.xml" entity cannot be
+loaded. Setting the debug environment variable makes it possible to see that
+an attempt is made to load <code>/etc/xml/catalog</code>, but since it's not
+present the resolution fails.</p>
+
+<p>But the most advanced way to debug XML catalog processing is to use the
+<strong>xmlcatalog</strong> command shipped with libxml2. It allows loading
+catalogs and making resolution queries to see what is going on. This is also
+used for the regression tests:</p>
+<pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
+ "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+orchis:~/XML -> </pre>
+
+<p>For debugging what is going on, adding one -v flag increases the verbosity
+level to indicate the processing done (adding a second flag also indicates
+which elements are recognized at parsing):</p>
+<pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
+ "-//OASIS//DTD DocBook XML V4.1.2//EN"
+Parsing catalog test/catalogs/docbook.xml's content
+Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+Catalogs cleanup
+orchis:~/XML -> </pre>
+
+<p>A shell interface is also available to debug and process multiple queries
+(and for regression tests):</p>
+<pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
+ "-//OASIS//DTD DocBook XML V4.1.2//EN"
+> help
+Commands available:
+public PublicID: make a PUBLIC identifier lookup
+system SystemID: make a SYSTEM identifier lookup
+resolve PublicID SystemID: do a full resolver lookup
+add 'type' 'orig' 'replace' : add an entry
+del 'values' : remove values
+dump: print the current catalog state
+debug: increase the verbosity level
+quiet: decrease the verbosity level
+exit: quit the shell
+> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+> quit
+orchis:~/XML -> </pre>
+
+<p>This should be sufficient for most debugging purposes; it was actually
+used heavily to debug the XML Catalog implementation itself.</p>
+
+<h3><a name="Declaring">How to create and maintain</a> catalogs:</h3>
+
+<p>Basically XML Catalogs are XML files: you can either use XML tools to
+manage them or use <strong>xmlcatalog</strong> for this. The basic step is
+to create a catalog; the --create option provides this facility:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --create tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
+orchis:~/XML -> </pre>
+
+<p>By default xmlcatalog does not overwrite the original catalog and saves
+the result on the standard output; this can be overridden using the --noout
+option. The <code>--add</code> command allows adding entries to the
+catalog:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
+ "-//OASIS//DTD DocBook XML V4.1.2//EN" \
+ http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
+orchis:~/XML -> cat tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
+<public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+ uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
+</catalog>
+orchis:~/XML -> </pre>
+
+<p>The <code>--add</code> option will always take 3 parameters, even though
+some of the XML Catalog constructs (like nextCatalog) have only a single
+argument; just pass a third empty string, it will be ignored.</p>
+
+<p>Similarly the <code>--del</code> option removes matching entries from the
+catalog:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --del \
+ "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
+<?xml version="1.0"?>
+<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
+<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
+orchis:~/XML -> </pre>
+
+<p>The catalog is now empty. Note that the matching of <code>--del</code> is
+exact and would have worked in a similar fashion with the Public ID
+string.</p>
+
+<p>This is rudimentary but should be sufficient to manage a not too complex
+catalog tree of resources.</p>
+
+<h3><a name="implemento">The implementor corner quick review of the
+API:</a></h3>
+
+<p>First, as for every other module of libxml, there is an
+automatically generated <a href="html/libxml-catalog.html">API page for
+catalog support</a>.</p>
+
+<p>The header for the catalog interfaces should be included as:</p>
+<pre>#include <libxml/catalog.h></pre>
+
+<p>The API is deliberately kept very simple. First, it is not obvious that
+applications really need access to it since it is the default behaviour of
+libxml (Note: it is possible to completely override the libxml default
+catalog by using <a
+href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to
+plug in an application-specific resolver).</p>
+
+<p>Basically libxml supports 2 catalog lists:</p>
+<ul>
+ <li>the default one, global and shared by the whole application</li>
+ <li>a per-document catalog: this one is built if the document uses the
+ <code>oasis-xml-catalog</code> PIs to specify its own catalog list. It is
+ associated with the parser context and destroyed when the parsing context
+ is destroyed.</li>
+</ul>
+
+<p>The document one will be used first if it exists.</p>
+
+<h4>Initialization routines:</h4>
+
+<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
+used at startup to initialize the catalog. If the catalog should be
+initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs()
+should be called before xmlInitializeCatalog(), which would otherwise do a
+default initialization first.</p>
+
+<p>The xmlCatalogAddLocal() call is used by the parser to grow the
+document's own catalog list if needed.</p>
+
+<h4>Preferences setup:</h4>
+
+<p>The XML Catalog spec requires the possibility to select default
+preferences between public and system delegation;
+xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
+xmlCatalogGetDefaults() control whether XML Catalogs resolution should
+be forbidden, allowed for the global catalog, for the document catalog or
+both; the default is to allow both.</p>
+
+<p>And of course xmlCatalogSetDebug() allows generating debug messages
+(through the xmlGenericError() mechanism).</p>
+
+<h4>Querying routines:</h4>
+
+<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
+and xmlCatalogResolveURI() are relatively self-explanatory if you read the XML
+Catalog specification: they correspond to the section 7 algorithms. They should
+also work, with simplified semantics, if you have loaded an SGML catalog.</p>
+
+<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
+operate on the document catalog list.</p>
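+
+<p>A minimal sketch of a lookup against the global catalog could look like
+this. The file name <code>catalog.xml</code> is a placeholder chosen for the
+example, and the public identifier is simply the DocBook 4.1.2 one; whether
+it resolves depends entirely on what your catalog contains:</p>
+<pre>#include <stdio.h>
+#include <libxml/catalog.h>
+#include <libxml/xmlmemory.h>
+
+int main(void) {
+    xmlChar *uri;
+
+    /* "catalog.xml" is a hypothetical file; load it into the global list. */
+    if (xmlLoadCatalog("catalog.xml") < 0) {
+        fprintf(stderr, "could not load catalog.xml\n");
+        return 1;
+    }
+
+    /* Returns a newly allocated string, or NULL when nothing matches. */
+    uri = xmlCatalogResolvePublic(
+        BAD_CAST "-//OASIS//DTD DocBook XML V4.1.2//EN");
+    if (uri != NULL) {
+        printf("resolved to %s\n", (const char *) uri);
+        xmlFree(uri);   /* the caller owns the result */
+    } else {
+        printf("no match in the catalog\n");
+    }
+
+    xmlCatalogCleanup();   /* free the global catalog */
+    return 0;
+}</pre>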
+
+<h4>Cleanup and Miscellaneous:</h4>
+
+<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
+the per-document equivalent.</p>
+
+<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
+first catalog in the global list, and xmlCatalogDump() dumps a catalog's
+state. Those routines are primarily designed for xmlcatalog; I'm not
+sure that exposing more complex interfaces (like navigation ones) would be
+really useful.</p>
+
+<p>xmlParseCatalogFile() is a function used to load XML Catalog files.
+It is similar to xmlParseFile() except that it bypasses all catalog lookups;
+it is provided because this functionality may be useful for client tools.</p>
+
+<h4>Threaded environments:</h4>
+
+<p>Since the catalog tree is built progressively, some care has been taken to
+avoid trouble in multithreaded environments. The code is now thread-safe,
+assuming that the libxml library has been compiled with thread
+support.</p>
+
+<p></p>
+
+<h3><a name="Other">Other resources</a></h3>
+
+<p>The XML Catalog specification is relatively recent so there isn't much
+literature to point at:</p>
+<ul>
+ <li>You can find a good rant from Norm Walsh about <a
+ href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
+ need for catalogs</a>; it provides a lot of contextual information even if
+ I don't agree with everything presented.</li>
+ <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
+ catalog proposal</a> from John Cowan</li>
+ <li>The <a href="http://www.rddl.org/">Resource Directory Description
+ Language</a> (RDDL), another catalog system, but more oriented toward
+ providing metadata for XML namespaces.</li>
+ <li>The page from the OASIS Technical <a
+ href="http://www.oasis-open.org/committees/entity/">Committee on Entity
+ Resolution</a>, which maintains XML Catalog; you will find pointers to the
+ specification update, some background and pointers to other tools
+ providing XML Catalog support</li>
+ <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
+ small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems to
+ work fine for me</li>
+ <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog
+ manual page</a></li>
+</ul>
+
+<p>If you have suggestions for corrections or additions, simply contact
+me:</p>
+
+<h2><a name="library">The parser interfaces</a></h2>
<p>This section is directly intended to help programmers getting bootstrapped
using the XML library from the C language. It is not intended to be
@@ -1400,50 +2826,130 @@
try to provide ways to do this, but this may not be portable or
standardized.</p>
-<h2><a name="Validation">Validation, or are you afraid of DTDs ?</a></h2>
+<h2><a name="Upgrading">Upgrading 1.x code</a></h2>
-<p>Well what is validation and what is a DTD ?</p>
+<p>Incompatible changes:</p>
-<p>Validation is the process of checking a document against a set of
-construction rules; a <strong>DTD</strong> (Document Type Definition) is such
-a set of rules.</p>
+<p>Version 2 of libxml is the first version introducing serious
+backward-incompatible changes. The main goals were:</p>
+<ul>
+ <li>A general cleanup. A number of mistakes inherited from the very early
+ versions couldn't be changed due to compatibility constraints. An example
+ is the "childs" field in the nodes.</li>
+ <li>Uniformization of the various nodes, at least for their header and link
+ parts (doc, parent, children, prev, next); the goal is a simpler
+ programming model and an easier task for DOM implementors.</li>
+ <li>Better conformance to the XML specification. For example, version 1.x
+ had a heuristic to try to detect ignorable white space. As a result the
+ SAX events generated were ignorableWhitespace() while the spec requires
+ characters() in that case. This also means that a number of DOM nodes
+ containing blank text, which were not present before, may now populate
+ the DOM tree.</li>
+</ul>
-<p>The validation process and building DTDs are the two most difficult parts
-of the XML life cycle. Briefly a DTD defines all the possibles element to be
-found within your document, what is the formal shape of your document tree
-(by defining the allowed content of an element, either text, a regular
-expression for the allowed list of children, or mixed content i.e. both text
-and children). The DTD also defines the allowed attributes for all elements
-and the types of the attributes. For more detailed information, I suggest
-that you read the related parts of the XML specification, the examples found
-under gnome-xml/test/valid/dtd and any of the large number of books available
-on XML. The dia example in gnome-xml/test/valid should be both simple and
-complete enough to allow you to build your own.</p>
+<h3>How to fix libxml-1.x code:</h3>
-<p>A word of warning, building a good DTD which will fit the needs of your
-application in the long-term is far from trivial; however, the extra level of
-quality it can ensure is well worth the price for some sets of applications
-or if you already have already a DTD defined for your application field.</p>
+<p>So client code of libxml designed to run with version 1.x may have to be
+changed to compile against version 2.x of libxml. Here is a list of changes
+that I have collected. They may not be sufficient, so in case you find other
+changes which are required, <a href="mailto:Daniel.Veillard@w3.org">drop me a
+mail</a>:</p>
+<ol>
+ <li>The package name has changed from libxml to libxml2, and the library
+ name is now -lxml2. There is a new xml2-config script which should be used
+ to select the right parameters for libxml2.</li>
+ <li>The node <strong>childs</strong> field has been renamed
+ <strong>children</strong>, so s/childs/children/g should be applied
+ (the probability of having "childs" anywhere else is close to 0).</li>
+ <li>The document no longer has a <strong>root</strong> element; it has
+ been replaced by <strong>children</strong>, and usually you will get a
+ list of elements here. For example, a Dtd element for the internal subset
+ and its declaration may be found in that list, as well as processing
+ instructions or comments found before or after the document root element.
+ Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
+ a document. Alternatively, if you are sure not to reference Dtds nor have
+ PIs or comments before or after the root element,
+ s/->root/->children/g will probably do it.</li>
+ <li>The white space issue. This one is more complex: except in the special
+ case of validating parsing, the line breaks and spaces usually used for
+ indenting and formatting the document content become significant. So they
+ are reported by SAX and, if you're using the DOM tree, corresponding nodes
+ are generated. Two approaches can be taken:
+ <ol>
+ <li>The lazy one: use the compatibility call
+ <strong>xmlKeepBlanksDefault(0)</strong>, but be aware that you are
+ relying on a special (and possibly broken) set of heuristics of
+ libxml to detect ignorable blanks. Don't complain if it breaks or
+ makes your application not 100% clean w.r.t. its input.</li>
+ <li>The Right Way: change your code to accept possibly insignificant
+ blank characters, or have your tree populated with weird blank text
+ nodes. You can spot them using the convenience function
+ <strong>xmlIsBlankNode(node)</strong>, which returns 1 for such blank
+ nodes.</li>
+ </ol>
+ <p>Note also that with the new default the output functions don't add any
+ extra indentation when saving a tree in order to be able to round trip
+ (read and save) without inflating the document with extra formatting
+ chars.</p>
+ </li>
+ <li>The include path has changed to $prefix/libxml/ and the includes
+ themselves use this new prefix in their include directives. If you are
+ using (as expected) the
+ <pre>xml2-config --cflags</pre>
+ <p>output to generate your compile commands, this will probably work out
+ of the box.</p>
+ </li>
+ <li>xmlDetectCharEncoding takes an extra argument indicating the length in
+ bytes of the head of the document available for character detection.</li>
+</ol>
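+
+<p>To make the root element and blank node items above more concrete, here
+is a small sketch written against libxml2. The inline document and the
+printed labels are invented for the example; only
+<strong>xmlDocGetRootElement()</strong> and <strong>xmlIsBlankNode()</strong>
+come from the list above:</p>
+<pre>#include <stdio.h>
+#include <string.h>
+#include <libxml/parser.h>
+#include <libxml/tree.h>
+
+int main(void) {
+    /* Indented input: libxml2 keeps the formatting blanks as text nodes. */
+    const char *xml = "<doc>\n  <item/>\n  <item/>\n</doc>\n";
+    xmlDocPtr doc;
+    xmlNodePtr root, cur;
+
+    LIBXML_TEST_VERSION
+
+    doc = xmlParseMemory(xml, (int) strlen(xml));
+    if (doc == NULL)
+        return 1;
+
+    /* Do not assume doc->children is the root element. */
+    root = xmlDocGetRootElement(doc);
+    if (root == NULL) {
+        xmlFreeDoc(doc);
+        return 1;
+    }
+
+    for (cur = root->children; cur != NULL; cur = cur->next) {
+        if (xmlIsBlankNode(cur))
+            continue;   /* skip indentation-only text nodes */
+        printf("child: %s\n", (const char *) cur->name);
+    }
+
+    xmlFreeDoc(doc);
+    xmlCleanupParser();
+    return 0;
+}</pre>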
-<p>The validation is not completely finished but in a (very IMHO) usable
-state. Until a real validation interface is defined the way to do it is to
-define and set the <strong>xmlDoValidityCheckingDefaultValue</strong>
-external variable to 1, this will of course be changed at some point:</p>
+<h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3>
-<p>extern int xmlDoValidityCheckingDefaultValue;</p>
+<p>Two new versions of libxml (1.8.11) and libxml2 (2.3.4) have been released
+to allow a smooth upgrade of existing libxml v1 code while retaining
+compatibility. They offer the following:</p>
+<ol>
+ <li>similar include naming, one should use
+ <strong>#include <libxml/...></strong> in both cases.</li>
+ <li>similar identifiers defined via macros for the child and root fields:
+ respectively <strong>xmlChildrenNode</strong> and
+ <strong>xmlRootNode</strong></li>
+ <li>a new macro <strong>LIBXML_TEST_VERSION</strong> which should be
+ inserted once in the client code</li>
+</ol>
-<p>...</p>
+<p>So the roadmap to upgrade your existing libxml applications is the
+following:</p>
+<ol>
+ <li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li>
+ <li>find all occurrences where the xmlDoc <strong>root</strong> field is
+ used and change it to <strong>xmlRootNode</strong></li>
+ <li>similarly find all occurrences where the xmlNode <strong>childs</strong>
+ field is used and change it to <strong>xmlChildrenNode</strong></li>
+ <li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your
+ <strong>main()</strong> or in the library init entry point</li>
+ <li>Recompile, check compatibility, it should still work</li>
+ <li>Change your configure script to look first for xml2-config and fall back
+ to xml-config. Use the --cflags and --libs output of the command as
+ the include and linking parameters needed to use libxml.</li>
+ <li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and
+ libxml-devel-1.8.y can be kept simultaneously)</li>
+ <li>remove your config.cache, relaunch your configuration mechanism, and
+ recompile, if steps 2 and 3 were done right it should compile as-is</li>
+ <li>Test that your application is still running correctly. If not, this may
+ be due to extra empty nodes created by formatting spaces being kept in
+ libxml2, contrary to libxml1; in that case insert xmlKeepBlanksDefault(0) in
+ your code before calling the parser (next to
+ <strong>LIBXML_TEST_VERSION</strong> is a fine place).</li>
+</ol>
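+
+<p>As a rough sketch of what code written for both versions may look like,
+the fragment below touches the tree only through the
+<strong>xmlRootNode</strong> and <strong>xmlChildrenNode</strong> macros. The
+helper name <code>count_children()</code> is invented for the example. Note
+that with libxml2 <code>doc->xmlRootNode</code> is the first child of the
+document, so this assumes no Dtd, PI or comment precedes the root
+element:</p>
+<pre>#include <stdio.h>
+#include <libxml/parser.h>
+#include <libxml/tree.h>
+
+/* Hypothetical helper: counts the nodes directly below the first
+ * top-level node, using only the compatibility macros. */
+static int count_children(xmlDocPtr doc) {
+    xmlNodePtr cur;
+    int n = 0;
+
+    if (doc == NULL || doc->xmlRootNode == NULL)
+        return 0;
+    for (cur = doc->xmlRootNode->xmlChildrenNode; cur != NULL;
+         cur = cur->next)
+        n++;
+    return n;
+}
+
+int main(int argc, char **argv) {
+    xmlDocPtr doc;
+
+    LIBXML_TEST_VERSION   /* inserted once, as suggested above */
+
+    if (argc < 2)
+        return 1;
+    doc = xmlParseFile(argv[1]);
+    if (doc == NULL)
+        return 1;
+    printf("%d child node(s)\n", count_children(doc));
+    xmlFreeDoc(doc);
+    return 0;
+}</pre>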
-<p>xmlDoValidityCheckingDefaultValue = 1;</p>
+<p>Following those steps should work. It worked for some of my own code.</p>
-<p></p>
-
-<p>To handle external entities, use the function
-<strong>xmlSetExternalEntityLoader</strong>(xmlExternalEntityLoader f); to
-link in you HTTP/FTP/Entities database library to the standard libxml
-core.</p>
-
-<p>@@interfaces@@</p>
+<p>Let me put some emphasis on the fact that there are far more changes from
+libxml 1.x to 2.x than the ones you may have to patch for. The overall code
+has been considerably cleaned up and the conformance to the XML specification
+has been drastically improved too. Don't take those changes as an excuse to
+not upgrade, it may cost a lot in the long term ...</p>
<h2><a name="DOM"></a><a name="Principles">DOM Principles</a></h2>
@@ -1659,7 +3165,11 @@
<h2><a name="Contributi">Contributions</a></h2>
<ul>
- <li><a href="mailto:ari@lusis.org">Ari Johnson</a>
+ <li>Bjorn Reese, William Brack and Thomas Broyer have provided a number of
+ patches, Gary Pennington worked on the validation API, threading support
+ and Solaris port.</li>
+ <li>John Fleck helps maintain the documentation and man pages.</li>
+ <li><p><a href="mailto:ari@lusis.org">Ari Johnson</a></p>
provides a C++ wrapper for libxml:
<p>Website: <a
href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
@@ -1698,6 +3208,6 @@
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
-<p>$Id: xml.html,v 1.113 2001/10/19 14:50:57 veillard Exp $</p>
+<p>$Id: xml.html,v 1.114 2001/10/24 12:35:52 veillard Exp $</p>
</body>
</html>