Finished reintegrating the docs and unifying the look; may still need a couple of pointers but looks fine now. valid.html is now merged in xmldtd.html. Overall cleanup, Daniel

commit: b8cfbd12680cbd28c9eaafea2642b8f1cbd52a48
author: Daniel Veillard <veillard@src.gnome.org> Thu Oct 25 10:53:28 2001 +0000
committer: Daniel Veillard <veillard@src.gnome.org> Thu Oct 25 10:53:28 2001 +0000
tree: ecb3a4c2213bc9f2b3670f0a4f232d5600760ca1
parent: 594cf0b2f20c5484cb915731cba921fd941745a7
diff --git a/doc/xml.html b/doc/xml.html
index 19f3de8..de3f404 100644
--- a/doc/xml.html
+++ b/doc/xml.html

@@ -35,14 +35,6 @@
 
 <p>Separate documents:</p>
 <ul>
-  <li><a href="upgrade.html">upgrade instructions for migrating to
-  libxml2</a></li>
-  <li><a href="encoding.html">libxml Internationalization support</a></li>
-  <li><a href="xmlio.html">libxml Input/Output interfaces</a></li>
-  <li><a href="xmlmem.html">libxml Memory interfaces</a></li>
-  <li><a href="catalog.html">libxml Catalog support</a></li>
-  <li><a href="xmldtd.html">a short introduction about DTDs and
-  libxml</a></li>
   <li><a href="http://xmlsoft.org/XSLT/">the libxslt page</a></li>
   <li><a href="http://www.cs.unibo.it/~casarini/gdome2/">the gdome2 page: a
     standard DOM interface for libxml2</a></li>
@@ -89,6 +81,277 @@
 style="background-color: #FF0000">Do Not Use libxml1</span></strong>, use
 libxml2</p>
 
+<h2><a name="FAQ">FAQ</a></h2>
+
+<p>Table of Contents:</p>
+<ul>
+  <li><a href="FAQ.html#Licence">Licence(s)</a></li>
+  <li><a href="FAQ.html#Installati">Installation</a></li>
+  <li><a href="FAQ.html#Compilatio">Compilation</a></li>
+  <li><a href="FAQ.html#Developer">Developer corner</a></li>
+</ul>
+
+<h3><a name="Licence">Licence</a>(s)</h3>
+<ol>
+  <li><em>Licensing Terms for libxml</em>
+    <p>libxml is released under 2 (compatible) licences:</p>
+    <ul>
+      <li>the <a href="http://www.gnu.org/copyleft/lgpl.html">LGPL</a>: GNU
+        Library General Public License</li>
+      <li>the <a
+        href="http://www.w3.org/Consortium/Legal/copyright-software-19980720.html">W3C
+        IPR</a>: very similar to the XWindow licence</li>
+    </ul>
+  </li>
+  <li><em>Can I embed libxml in a proprietary application ?</em>
+    <p>Yes. The W3C IPR also allows you to keep the changes you made to
+    libxml proprietary, but it would be gracious to contribute bug fixes and
+    improvements back as patches for possible incorporation in the main
+    development tree.</p>
+  </li>
+</ol>
+
+<h3><a name="Installati">Installation</a></h3>
+<ol>
+  <li>Unless you are forced to because your application links with a Gnome
+    library requiring it,  <strong><span style="background-color: #FF0000">Do
+    Not Use libxml1</span></strong>, use libxml2</li>
+  <li><em>Where can I get libxml ?</em>
+    <p>The original distribution comes from <a
+    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a
+    href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p>
+    <p>Most Linux and BSD distributions include libxml; this is probably the
+    safest way for end users.</p>
+    <p>David Doolin provides precompiled Windows versions at <a
+    href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
+  </li>
+  <li><em>I see libxml and libxml2 releases, which one should I install ?</em>
+    <ul>
+      <li>If you do not need backward compatibility with existing
+        applications, install libxml2 only</li>
+      <li>If you are not doing development, you can safely install both.
+        Usually the packages <a
+        href="http://rpmfind.net/linux/RPM/libxml.html">libxml</a> and <a
+        href="http://rpmfind.net/linux/RPM/libxml2.html">libxml2</a> are
+        compatible (this is not the case for development packages)</li>
+      <li>If you are a developer and your system provides separate packaging
+        for shared libraries and the development components, it is possible
+        to install libxml and libxml2, and also <a
+        href="http://rpmfind.net/linux/RPM/libxml-devel.html">libxml-devel</a>
+        and <a
+        href="http://rpmfind.net/linux/RPM/libxml2-devel.html">libxml2-devel</a>
+        for libxml2 &gt;= 2.3.0</li>
+      <li>If you are developing a new application, please develop against
+        libxml2(-devel)</li>
+    </ul>
+  </li>
+  <li><em>I can't install the libxml package, it conflicts with libxml0</em>
+    <p>You probably have an old libxml0 package used to provide the shared
+    library for libxml.so.0; you can probably safely remove it. In any case,
+    the libxml packages provided on <a
+    href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> provide
+    libxml.so.0</p>
+  </li>
+  <li><em>I can't install the libxml(2) RPM package due to failed
+    dependencies</em>
+    <p>The most generic solution is to fetch the latest src.rpm and
+    rebuild it locally with</p>
+    <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
+    <p>If everything goes well it will generate two binary RPMs (one
+    providing the shared libs and xmllint, and the other one, the -devel
+    package, providing includes, static libraries and scripts needed to build
+    applications with libxml(2)) that you can install locally.</p>
+  </li>
+</ol>
+
+<h3><a name="Compilatio">Compilation</a></h3>
+<ol>
+  <li><em>What is the process to compile libxml ?</em>
+    <p>Like most UNIX libraries, libxml follows the "standard" procedure:</p>
+    <p><code>gunzip -c xxx.tar.gz | tar xvf -</code></p>
+    <p><code>cd libxml-xxxx</code></p>
+    <p><code>./configure --help</code></p>
+    <p>to see the options, then the compilation/installation proper</p>
+    <p><code>./configure [possible options]</code></p>
+    <p><code>make</code></p>
+    <p><code>make install</code></p>
+    <p>At that point you may have to rerun ldconfig or a similar utility to
+    update your list of installed shared libs.</p>
+  </li>
+  <li><em>What other libraries are needed to compile/install libxml ?</em>
+    <p>Libxml does not require any other library; the normal ANSI C API
+    should be sufficient (please report any violation of this rule you may
+    find).</p>
+    <p>However, if they are found at configuration time, libxml will detect
+    and use the following libs:</p>
+    <ul>
+      <li><a href="http://www.info-zip.org/pub/infozip/zlib/">libz</a>
+         : a highly portable and widely available compression library</li>
+      <li>iconv: a powerful character encoding conversion library. It's
+        included by default in recent glibc libraries, so it doesn't need to
+        be installed specifically on Linux. It now seems to be <a
+        href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
+        of the official UNIX</a> specification. Here is one <a
+        href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
+        of the library</a> whose source can be found <a
+        href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li>
+    </ul>
+  </li>
+  <li><em>libxml does not compile with HP-UX's optional ANSI-C compiler</em>
+    <p>This is due to macro limitations. Try to add " -Wp,-H16800 -Ae" to the
+    CFLAGS.</p>
+    <p>You can also install and use gcc instead or use a precompiled version
+    of libxml, both available from the <a
+    href="http://hpux.cae.wisc.edu/hppd/auto/summary_all.html">HP-UX Porting
+    and Archive Centre</a>.</p>
+  </li>
+  <li><em>make check fails on some platforms</em>
+    <p>Sometimes the regression test results don't completely match the
+    values produced by the parser, and the makefile uses diff to print the
+    delta. On some platforms the diff return value breaks the compilation
+    process; if the diff is small this is probably not a serious problem.</p>
+  </li>
+  <li><em>I use the CVS version and there is no configure script</em>
+    <p>The configure script and the Makefiles are generated. Use the
+    autogen.sh script to regenerate them, like:</p>
+    <p><code>./autogen.sh --prefix=/usr --disable-shared</code></p>
+  </li>
+  <li><em>I have trouble when running make tests with gcc-3.0</em>
+    <p>It seems the initial release of gcc-3.0 has a problem with the
+    optimizer which miscompiles the URI module. Please use another
+    compiler.</p>
+  </li>
+</ol>
+
+<h3><a name="Developer">Developer</a> corner</h3>
+<ol>
+  <li><em>xmlDocDump() generates output on one line</em>
+    <p>libxml will not <strong>invent</strong> spaces in the content of a
+    document since <strong>all spaces in the content of a document are
+    significant</strong>. If you build a tree from the API and want
+    indentation:</p>
+    <ol>
+      <li>the correct way is to generate those yourself too</li>
+      <li>the dangerous way is to ask libxml to add those blanks to your
+        content <strong>modifying the content of your document in the
+        process</strong>. The result may not be what you expect. There is
+        <strong>NO</strong> way to guarantee that such a modification won't
+        impact other parts of the content of your document. See <a
+        href="http://xmlsoft.org/html/libxml-parser.html#XMLKEEPBLANKSDEFAULT">xmlKeepBlanksDefault
+        ()</a> and <a
+        href="http://xmlsoft.org/html/libxml-tree.html#XMLSAVEFORMATFILE">xmlSaveFormatFile
+        ()</a></li>
+    </ol>
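+    <p>As a minimal sketch of the second option, assuming a tree built
+    through the API and a hypothetical output file name
+    <code>out.xml</code>, the two routines mentioned above can be combined
+    as follows (element names are illustrative only):</p>
+    <pre>#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/tree.h&gt;
+
+int main(void) {
+    xmlDocPtr doc;
+    xmlNodePtr root;
+
+    /* tell libxml that blanks may be treated as non significant */
+    xmlKeepBlanksDefault(0);
+
+    /* build a tiny tree from the API: a root with one child holding text */
+    doc = xmlNewDoc((const xmlChar *) "1.0");
+    root = xmlNewNode(NULL, (const xmlChar *) "root");
+    xmlDocSetRootElement(doc, root);
+    xmlNewChild(root, NULL, (const xmlChar *) "item", (const xmlChar *) "text");
+
+    /* last argument set to 1: ask libxml to add formatting blanks */
+    xmlSaveFormatFile("out.xml", doc, 1);
+
+    xmlFreeDoc(doc);
+    return(0);
+}</pre>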
+  </li>
+  <li>Extra nodes in the document:
+    <p><em>For an XML file as below:</em></p>
+    <pre>&lt;?xml version="1.0"?&gt;
+&lt;PLAN xmlns="http://www.argus.ca/autotest/1.0/"&gt;
+&lt;NODE CommFlag="0"/&gt;
+&lt;NODE CommFlag="1"/&gt;
+&lt;/PLAN&gt;</pre>
+    <p><em>after parsing it with the function
+    pxmlDoc=xmlParseFile(...);</em></p>
+    <p><em>I want to get the content of the first node (node with the
+    CommFlag="0")</em></p>
+    <p><em>so I did the following:</em></p>
+    <pre>xmlNodePtr pnode;
+pnode=pxmlDoc-&gt;children-&gt;children;</pre>
+    <p><em>but it does not work. If I change it to</em></p>
+    <pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre>
+    <p><em>then it works.  Can someone explain it to me.</em></p>
+    <p></p>
+    <p>In XML all characters in the content of the document are significant
+    <strong>including blanks and formatting line breaks</strong>.</p>
+    <p>The extra nodes you are wondering about are just that: text nodes with
+    the formatting spaces, which are part of the document but which people
+    tend to forget. There is a function <a
+    href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
+    ()</a> to remove those at parse time, but that's a heuristic, and its
+    use should be limited to cases where you are sure there is no
+    mixed content in the document.</p>
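+    <p>A safer approach than relying on a fixed position in the list of
+    children is to skip the nodes which are not elements. A minimal sketch,
+    where the helper name <code>firstElementChild</code> is hypothetical and
+    not part of libxml:</p>
+    <pre>#include &lt;libxml/tree.h&gt;
+
+/* return the first child of parent which is an element, or NULL */
+static xmlNodePtr firstElementChild(xmlNodePtr parent) {
+    xmlNodePtr cur;
+
+    if (parent == NULL) return(NULL);
+    for (cur = parent-&gt;children; cur != NULL; cur = cur-&gt;next) {
+        /* text, comment and other non-element nodes are skipped */
+        if (cur-&gt;type == XML_ELEMENT_NODE)
+            return(cur);
+    }
+    return(NULL);
+}
+
+/* usage: pnode = firstElementChild(xmlDocGetRootElement(pxmlDoc)); */</pre>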
+  </li>
+  <li><em>I get compilation errors in existing code, for example when
+    accessing the <strong>root</strong> or <strong>childs</strong> fields of
+    nodes</em>
+    <p>You are compiling code developed for libxml version 1 while using a
+    libxml2 development environment. Either switch back to libxml v1 devel
+    or, even better, fix the code to compile with libxml2 (or both) by <a
+    href="upgrade.html">following the instructions</a>.</p>
+  </li>
+  <li><em>I get compilation errors about non-existent
+    <strong>xmlRootNode</strong> or <strong>xmlChildrenNode</strong>
+    fields</em>
+    <p>The source code you are using has been <a
+    href="upgrade.html">upgraded</a> to be able to compile with both libxml
+    and libxml2, but you need to install a more recent version:
+    libxml(-devel) &gt;= 1.8.8 or libxml2(-devel) &gt;= 2.1.0</p>
+  </li>
+  <li><em>XPath implementation looks seriously broken</em>
+    <p>The XPath implementation prior to 2.3.0 was really incomplete; upgrade
+    to a recent version. Implementing and debugging libxslt generated fixes
+    for the most obvious problems.</p>
+  </li>
+  <li><em>The example provided in the web page does not compile</em>
+    <p>It's hard to maintain the documentation in sync with the code
+    &lt;grin/&gt; ...</p>
+    <p>Check the previous points 1/ and 2/ raised before, and send
+    patches.</p>
+  </li>
+  <li><em>Where can I get more examples and information than on the web
+    page ?</em>
+    <p>Ideally a libxml book would be nice. I have no such plan ... But you
+    can:</p>
+    <ul>
+      <li>check the <a href="html/libxml-lib.html">existing generated
+        doc</a> more closely</li>
+      <li>look for examples of use of libxml functions in the Gnome code.
+        For example, the following will query the full Gnome CVS base for
+        the use of the <strong>xmlAddChild()</strong> function:
+        <p><a
+        href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
+        <p>This may be slow; a large hardware donation to the gnome project
+        could cure this :-)</p>
+      </li>
+      <li><a
+        href="http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&amp;dir=gnome-xml">Browse
+        the libxml source</a>; I try to write code as clean and documented
+        as possible, so looking at it may be helpful</li>
+    </ul>
+  </li>
+  <li>What about C++ ?
+    <p>libxml is written in pure C in order to allow easy reuse on a number
+    of platforms, including embedded systems. I don't intend to convert to
+    C++.</p>
+    <p>There is however a C++ wrapper provided by Ari Johnson
+    &lt;ari@btigate.com&gt; which may fulfill your needs:</p>
+    <p>Website: <a
+    href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
+    <p>Download: <a
+    href="http://lusis.org/~ari/xml++/libxml++.tar.gz">http://lusis.org/~ari/xml++/libxml++.tar.gz</a></p>
+  </li>
+  <li>How to validate a document a posteriori ?
+    <p>It is possible to validate documents which were not validated at
+    initial parsing time or documents which have been built from scratch
+    using the API. Use the <a
+    href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
+    function. It is also possible to simply add a DTD to an existing
+    document:</p>
+    <pre>xmlDocPtr doc; /* your existing document */
+xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */
+
+dtd-&gt;name = xmlStrdup((xmlChar*)"root_name"); /* use the given root */
+
+/* attach the DTD as the internal subset, ahead of any existing child */
+doc-&gt;intSubset = dtd;
+if (doc-&gt;children == NULL) xmlAddChild((xmlNodePtr)doc, (xmlNodePtr)dtd);
+else xmlAddPrevSibling(doc-&gt;children, (xmlNodePtr)dtd);</pre>
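+    <p>Calling <code>xmlValidateDtd()</code> directly could look like the
+    following sketch. It assumes the validation context is a plain structure
+    filled in by the caller with <code>fprintf</code>-style callbacks; the
+    function name <code>validateAgainst</code> and its arguments are
+    hypothetical:</p>
+    <pre>#include &lt;stdio.h&gt;
+#include &lt;string.h&gt;
+#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/valid.h&gt;
+
+/* returns 1 if doc is valid against the DTD file, 0 otherwise */
+static int validateAgainst(xmlDocPtr doc, const char *filename_of_dtd) {
+    xmlValidCtxt cvp;
+    xmlDtdPtr dtd;
+    int ret;
+
+    dtd = xmlParseDTD(NULL, (const xmlChar *) filename_of_dtd);
+    if (dtd == NULL) return(0);
+
+    memset(&amp;cvp, 0, sizeof(cvp));
+    cvp.userData = (void *) stderr;   /* first argument of the callbacks */
+    cvp.error = (xmlValidityErrorFunc) fprintf;
+    cvp.warning = (xmlValidityWarningFunc) fprintf;
+
+    ret = xmlValidateDtd(&amp;cvp, doc, dtd);
+    xmlFreeDtd(dtd);                  /* the DTD was not attached to doc */
+    return(ret);
+}</pre>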
+  </li>
+  <li>etc ...</li>
+</ol>
+
+<p></p>
+
 <h2><a name="Documentat">Documentation</a></h2>
 
 <p>There are some on-line resources about using libxml:</p>
@@ -909,7 +1172,7 @@
 supported and the progresses on the <a
 href="http://cvs.gnome.org/lxr/source/libxslt/ChangeLog">Changelog</a></p>
 
-<h2><a name="architecture">An overview of libxml architecture</a></h2>
+<h2><a name="architecture">libxml architecture</a></h2>
 
 <p>Libxml is made of multiple components; some of them are optional, and most
 of the block interfaces are public. The main components are:</p>
@@ -1051,7 +1314,1170 @@
 a set of registered default callbacks, without internal specific
 interface.</p>
 
-<h2><a name="library">The XML library interfaces</a></h2>
+<h2><a name="Validation">Validation &amp; DTDs</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+  <li><a href="#General5">General overview</a></li>
+  <li><a href="#definition1">The definition</a></li>
+  <li><a href="#Simple1">Simple rules</a>
+    <ol>
+      <li><a href="#reference1">How to reference a DTD from a
+        document</a></li>
+      <li><a href="#Declaring2">Declaring elements</a></li>
+      <li><a href="#Declaring1">Declaring attributes</a></li>
+    </ol>
+  </li>
+  <li><a href="#Some1">Some examples</a></li>
+  <li><a href="#validate1">How to validate</a></li>
+  <li><a href="#Other1">Other resources</a></li>
+</ol>
+
+<h3><a name="General5">General overview</a></h3>
+
+<p>Well what is validation and what is a DTD ?</p>
+
+<p>DTD is the acronym for Document Type Definition. This is a description of
+the content for a family of XML files. This is part of the XML 1.0
+specification, and allows one to describe and check that a given document
+instance conforms to a set of rules detailing its structure and content.</p>
+
+<p>Validation is the process of checking a document against a DTD (more
+generally against a set of construction rules).</p>
+
+<p>The validation process and building DTDs are the two most difficult parts
+of the XML life cycle. Briefly, a DTD defines all the possible elements to be
+found within your document and the formal shape of your document tree (by
+defining the allowed content of an element: either text, a regular expression
+for the allowed list of children, or mixed content, i.e. both text and
+children). The DTD also defines the allowed attributes for all elements and
+the types of the attributes.</p>
+
+<h3><a name="definition1">The definition</a></h3>
+
+<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a
+href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
+Rev1</a>):</p>
+<ul>
+  <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
+  elements</a></li>
+  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
+  attributes</a></li>
+</ul>
+
+<p>Unfortunately, all this is inherited from the SGML world; the syntax is
+ancient...</p>
+
+<h3><a name="Simple1">Simple rules</a></h3>
+
+<p>Writing DTDs can be done in multiple ways; the rules to build them can be
+radically different depending on whether you need something fixed or
+something which can evolve over time. Really complex DTDs like the DocBook
+ones are flexible but much harder to design. I will just focus on DTDs for
+formats with a fixed, simple structure. It is just a set of basic rules, and
+definitely neither exhaustive nor usable for complex DTD design.</p>
+
+<h4><a name="reference1">How to reference a DTD from a document</a>:</h4>
+
+<p>Assuming the top element of the document is <code>spec</code> and the DTD
+is placed in the file <code>mydtd</code> in the subdirectory
+<code>dtds</code> of the directory from which the document was loaded:</p>
+
+<p><code>&lt;!DOCTYPE spec SYSTEM "dtds/mydtd"&gt;</code></p>
+
+<p>Notes:</p>
+<ul>
+  <li>the system string is actually a URI-Reference (as defined in <a
+    href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>) so you can use a
+    full URL string indicating the location of your DTD on the Web; this is a
+    really good thing to do if you want others to validate your document</li>
+  <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
+    magic string) so that the DTD is looked up in catalogs on the client side
+    without having to locate it on the web</li>
+  <li>a DTD contains a set of element and attribute declarations, but it
+    doesn't define what the root of the document should be. This is
+    explicitly told to the parser/validator as the first element of the
+    <code>DOCTYPE</code> declaration.</li>
+</ul>
+
+<h4><a name="Declaring2">Declaring elements</a>:</h4>
+
+<p>The following declares an element <code>spec</code>:</p>
+
+<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>
+
+<p>It also expresses that the spec element contains one <code>front</code>,
+one <code>body</code> and one optional <code>back</code> child element, in
+this order. The declaration of one element of the structure and its content
+are done in a single declaration. Similarly the following declares
+<code>div1</code> elements:</p>
+
+<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>
+
+<p>This means div1 contains one <code>head</code>, then any number of
+<code>p</code>, <code>list</code> and <code>note</code> elements in any
+order, and then any number of <code>div2</code> elements. And last but not
+least an element can contain text:</p>
+
+<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>
+
+<p><code>b</code> contains only text. An element can also be of mixed content
+(text and elements in no particular order):</p>
+
+<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>
+
+<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
+<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
+order.</p>
+
+<h4><a name="Declaring1">Declaring attributes</a>:</h4>
+
+<p>Again, the attribute declaration includes the definition of its content:</p>
+
+<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>
+
+<p>This means that the element <code>termdef</code> can have a
+<code>name</code> attribute containing text (<code>CDATA</code>) and which is
+optional (<code>#IMPLIED</code>). The attribute value can also be defined
+within a set:</p>
+
+<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
+"ordered"&gt;</code></p>
+
+<p>This means the <code>list</code> element has a <code>type</code> attribute
+with 3 allowed values "bullets", "ordered" or "glossary", which defaults to
+"ordered" if the attribute is not explicitly specified.</p>
+
+<p>The content type of an attribute can be text (<code>CDATA</code>),
+anchor/reference/references
+(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
+(<code>ENTITY</code>/<code>ENTITIES</code>) or name(s)
+(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
+<code>chapter</code> element can have an optional <code>id</code> attribute
+of type <code>ID</code>, usable as a target for references from attributes
+of type <code>IDREF</code>:</p>
+
+<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>
+
+<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
+meaning that the attribute has to be given, <code>#IMPLIED</code>,
+meaning that it is optional, or the default value (possibly prefixed by
+<code>#FIXED</code> if it is the only value allowed).</p>
+
+<p>Notes:</p>
+<ul>
+  <li>usually the attributes pertaining to a given element are declared in a
+    single expression, but it is just a convention adopted by a lot of DTD
+    writers:
+    <pre>&lt;!ATTLIST termdef
+          id      ID      #REQUIRED
+          name    CDATA   #IMPLIED&gt;</pre>
+    <p>The previous construct defines both <code>id</code> and
+    <code>name</code> attributes for the element <code>termdef</code></p>
+  </li>
+</ul>
+
+<h3><a name="Some1">Some examples</a></h3>
+
+<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
+contains some complex DTD examples. The  <code>test/valid/dia.xml</code>
+example shows an XML file where the simple DTD is directly included within
+the document.</p>
+
+<h3><a name="validate1">How to validate</a></h3>
+
+<p>The simplest way is to use the xmllint program that comes with libxml. The
+<code>--valid</code> option turns on validation of the files given as input;
+for example the following validates a copy of the first revision of the XML
+1.0 specification:</p>
+
+<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>
+
+<p>The <code>--noout</code> option is used to suppress output of the resulting tree.</p>
+
+<p>The <code>--dtdvalid dtd</code> option validates the document(s) against
+a given DTD.</p>
+
+<p>Libxml exports an API to handle DTDs and validation, check the <a
+href="http://xmlsoft.org/html/libxml-valid.html">associated
+description</a>.</p>
+
+<h3><a name="Other1">Other resources</a></h3>
+
+<p>DTDs are as old as SGML, so there may be a number of examples on-line. I
+will just list one for now; other pointers are welcome:</p>
+<ul>
+  <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
+</ul>
+
+<p>I suggest looking at the examples found under test/valid/dtds and any of
+the large number of books available on XML. The dia example in test/valid
+should be both simple and complete enough to allow you to build your own.</p>
+
+<p></p>
+
+<h2><a name="Memory">Memory Management</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+  <li><a href="#General3">General overview</a></li>
+  <li><a href="#setting">Setting libxml set of memory
+  routines</a></li>
+  <li><a href="#cleanup">Cleaning up after parsing</a></li>
+  <li><a href="#Debugging">Debugging routines</a></li>
+  <li><a href="#General4">General memory requirements</a></li>
+</ol>
+
+<h3><a name="General3">General overview</a></h3>
+
+<p>The module <code><a
+href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlmemory.h</a></code>
+provides the interfaces to the libxml memory system:</p>
+<ul>
+  <li>libxml does not use the libc memory allocator directly but goes
+    through xmlFree(), xmlMalloc() and xmlRealloc()</li>
+  <li>those routines can be reassigned to a specific set of routines, by
+    default the libc ones i.e. free(), malloc() and realloc()</li>
+  <li>the xmlmemory.c module includes a set of debugging routines</li>
+</ul>
+
+<h3><a name="setting">Setting libxml set of memory routines</a></h3>
+
+<p>It is sometimes useful to not use the default memory allocator, either for
+debugging, analysis or to implement a specific behaviour on memory management
+(like on embedded systems). Two function calls are available to do so:</p>
+<ul>
+  <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemGet ()</a>
+     which returns the current set of functions in use by the parser</li>
+  <li><a
+    href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemSetup()</a>
+     which sets up a new set of memory allocation functions</li>
+</ul>
+
+<p>Of course a call to xmlMemSetup() should probably be done before calling
+any other libxml routines (unless you are sure your allocation routines are
+compatible).</p>
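+
+<p>As an illustration, the following sketch installs wrappers which count the
+allocations made by libxml. The wrapper names and the counter are
+hypothetical; only <code>xmlMemSetup()</code> comes from libxml:</p>
+<pre>#include &lt;stdlib.h&gt;
+#include &lt;string.h&gt;
+#include &lt;libxml/xmlmemory.h&gt;
+
+static unsigned long nbAllocs = 0;    /* number of malloc requests seen */
+
+static void *countMalloc(size_t size) { nbAllocs++; return(malloc(size)); }
+static void *countRealloc(void *ptr, size_t size) { return(realloc(ptr, size)); }
+static void countFree(void *ptr) { free(ptr); }
+static char *countStrdup(const char *str) {
+    char *ret = (char *) countMalloc(strlen(str) + 1);
+    if (ret != NULL) strcpy(ret, str);
+    return(ret);
+}
+
+int main(void) {
+    /* install the routines before any other libxml call */
+    if (xmlMemSetup(countFree, countMalloc, countRealloc, countStrdup) != 0)
+        return(1);
+    /* ... use libxml here, then inspect nbAllocs ... */
+    return(0);
+}</pre>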
+
+<h3><a name="cleanup">Cleaning up after parsing</a></h3>
+
+<p>Libxml is not stateless: a small set of memory structures needs to be
+allocated before the parser is fully functional (some encoding structures
+for example). This also means that once parsing is finished there is a tiny
+amount of memory (a few hundred bytes) which can be reclaimed if you don't
+reuse the parser immediately:</p>
+<ul>
+  <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlCleanupParser
+    ()</a>
+     is a centralized routine to free the parsing state. Note that it won't
+    deallocate any produced tree (use xmlFreeDoc() and related routines for
+    this).</li>
+  <li><a href="http://xmlsoft.org/html/libxml-parser.html">xmlInitParser
+    ()</a>
+     is the counterpart routine, which preallocates the parsing state; this
+    can be useful for example to avoid initialization reentrancy problems
+    when using libxml in multithreaded applications</li>
+</ul>
+
+<p>Generally xmlCleanupParser() is safe: if needed the state will be rebuilt
+at the next invocation of parser routines, but be careful of the consequences
+in multithreaded applications.</p>
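+
+<p>A typical sequence, sketched here for a single-threaded program reading a
+hypothetical file <code>doc.xml</code>, could be:</p>
+<pre>#include &lt;libxml/parser.h&gt;
+
+int main(void) {
+    xmlDocPtr doc;
+
+    xmlInitParser();                 /* preallocate the parsing state */
+    doc = xmlParseFile("doc.xml");
+    if (doc != NULL) {
+        /* ... work on the tree ... */
+        xmlFreeDoc(doc);             /* the tree has to be freed separately */
+    }
+    xmlCleanupParser();              /* release the parsing state */
+    return(0);
+}</pre>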
+
+<h3><a name="Debugging">Debugging routines</a></h3>
+
+<p>When configured using the --with-mem-debug flag (off by default), libxml
+uses a set of memory allocation debugging routines keeping track of all
+allocated blocks and the location in the code where the routine was called. A
+couple of other debugging routines can dump the memory allocation
+information to a file or call a specific routine when a given block number is
+allocated:</p>
+<ul>
+  <li><a
+    href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMallocLoc()</a>,
+     <a
+    href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlReallocLoc()</a>
+    and <a
+    href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemStrdupLoc()</a>
+    are the memory debugging replacement allocation routines</li>
+  <li><a href="http://xmlsoft.org/html/libxml-xmlmemory.html">xmlMemoryDump
+    ()</a>
+     dumps all the information about the allocated memory blocks left in the
+    <code>.memdump</code> file</li>
+</ul>
+
+<p>When developing libxml, memory debug is enabled; the test programs call
+xmlMemoryDump () and the "make test" regression tests check for any memory
+leak during the full regression test sequence. This helps a lot in ensuring
+that libxml does not leak memory and in bullet-proofing the use of memory
+allocations (some libc implementations are known to be far too permissive,
+resulting in major portability problems!).</p>
+
+<p>If the .memdump reports a leak, it displays the allocation function and
+also tries to give some information about the content and structure of the
+allocated blocks left. This is sufficient in most cases to find the culprit,
+but not always. Assuming the allocation problem is reproducible, it is
+possible to track it down more easily:</p>
+<ol>
+  <li>write down the block number xxxx which was not deallocated</li>
+  <li>export the environment variable XML_MEM_BREAKPOINT=xxxx</li>
+  <li>run the program under a debugger and set a breakpoint on
+    xmlMallocBreakpoint(), a specific function called when this precise block
+    is allocated</li>
+  <li>when the breakpoint is reached you can then do a fine analysis of the
+    allocation and step to see the condition resulting in the missing
+    deallocation.</li>
+</ol>
+
+<p>I used to use a commercial tool to debug libxml memory problems, but after
+noticing that it was not detecting memory leaks I switched to this simple
+mechanism, which has proved extremely efficient until now.</p>
+
+<h3><a name="General4">General memory requirements</a></h3>
+
+<p>How much memory does libxml require ? It's hard to tell on average; it
+depends on a number of things:</p>
+<ul>
+  <li>the parser itself should work in a fixed amount of memory, except for
+    information maintained about the stacks of names and entity locations.
+    The I/O and encoding handlers will probably account for a few KBytes.
+    This is true for both the XML and HTML parsers (though the HTML parser
+    needs more state).</li>
+  <li>If you are generating the DOM tree then memory requirements will grow
+    nearly linearly with the size of the data. In general for a balanced
+    textual document the internal memory requirement is about 4 times the
+    size of the UTF8 serialization of this document (for example the XML-1.0
+    recommendation is a bit more than 150KBytes and takes 650KBytes of main
+    memory when parsed). Validation will add an amount of memory required
+    for maintaining the external DTD state which should be linear with the
+    complexity of the content model defined by the DTD</li>
+  <li>If you don't care about the advanced features of libxml like
+    validation, DOM, XPath or XPointer, but really need to work within fixed
+    memory requirements, then the SAX interface should be used.</li>
+</ul>
+
+<p></p>
+
+<h2><a name="Encodings">Encodings support</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+  <li><a href="encoding.html#What">What does internationalization support
+    mean ?</a></li>
+  <li><a href="encoding.html#internal">The internal encoding, how and
+  why</a></li>
+  <li><a href="encoding.html#implemente">How is it implemented ?</a></li>
+  <li><a href="encoding.html#Default">Default supported encodings</a></li>
+  <li><a href="encoding.html#extend">How to extend the existing
+  support</a></li>
+</ol>
+
+<h3><a name="What">What does internationalization support mean ?</a></h3>
+
+<p>XML was designed from the start to allow the support of any character set
+by using Unicode. Any conformant XML parser has to support the UTF-8 and
+UTF-16 default encodings which can both express the full Unicode range. UTF-8
+is a variable length encoding whose greatest strengths are to reuse the same
+encoding for ASCII and to save space for Western encodings, but it is a bit
+more complex to handle in practice. UTF-16 uses 2 bytes per character (and
+sometimes combines two such units into a pair); it makes implementation
+easier, but looks a bit overkill for encoding Western languages. Moreover the
+XML specification allows documents to be encoded in other encodings on the
+condition that they are clearly labelled as such. For example the following
+is a well-formed XML document encoded in ISO-8859-1 and using the accented
+letters that we French like, for both markup and content:</p>
+<pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
+&lt;très&gt;là&lt;/très&gt;</pre>
+
+<p>Having internationalization support in libxml means the following:</p>
+<ul>
+  <li>the document is properly parsed</li>
+  <li>information about its encoding is saved</li>
+  <li>it can be modified</li>
+  <li>it can be saved in its original encoding</li>
+  <li>it can also be saved in another encoding supported by libxml (for
+    example straight UTF8 or even an ASCII form)</li>
+</ul>
+
+<p>Another very important point is that the whole libxml API, with the
+exception of a few routines to read with a specific encoding or save to a
+specific encoding, is completely agnostic about the original encoding of the
+document.</p>
+
+<p>It should be noted too that the HTML parser embedded in libxml now obeys
+the same rules; the following document will be (as of 2.2.2) handled in
+an internationalized fashion by libxml too:</p>
+<pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
+                      "http://www.w3.org/TR/REC-html40/loose.dtd"&gt;
+&lt;html lang="fr"&gt;
+&lt;head&gt;
+  &lt;META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"&gt;
+&lt;/head&gt;
+&lt;body&gt;
+&lt;p&gt;W3C crée des standards pour le Web.&lt;/body&gt;
+&lt;/html&gt;</pre>
+
+<h3><a name="internal">The internal encoding, how and why</a></h3>
+
+<p>One of the core decisions was to force all documents to be converted to a
+default internal encoding, and that encoding to be UTF-8. Here is the
+rationale for those choices:</p>
+<ul>
+  <li>keeping the native encoding in the internal form would force the libxml
+    users (or the code associated) to be fully aware of the encoding of the
+    original document. For example, when adding a text node to a document,
+    the content would have to be provided in the document encoding, i.e. the
+    client code would have to check it beforehand, make sure it conforms
+    to the encoding, etc. This is very hard in practice, though in some
+    specific cases it may make sense.</li>
+  <li>the second decision was which encoding. From the XML spec only UTF-8 and
+    UTF-16 really make sense, being the only two encodings for which there
+    is mandatory support. UCS-4 (32 bits fixed size encoding) could be
+    considered an intelligent choice too since it's a direct Unicode
+    mapping. I selected UTF-8 on the basis of efficiency and compatibility
+    with surrounding software:
+    <ul>
+      <li>UTF-8, while a bit more complex to convert from/to (i.e. slightly
+        more costly to import and export CPU wise), is also far more compact
+        than UTF-16 (and UCS-4) for a majority of the documents I see it used
+        for right now (RPM RDF catalogs, advogato data, various configuration
+        file formats, etc.) and the key point for today's computer
+        architecture is efficient use of caches. If one nearly doubles the
+        memory requirement to store the same amount of data, this will thrash
+        caches (main memory/external caches/internal caches) and my take is
+        that this harms the system far more than the CPU requirements needed
+        for the conversion to UTF-8</li>
+      <li>Most of libxml version 1 users were using it with straight ASCII
+        most of the time; switching to an internal encoding
+        requiring all their code to be rewritten was a serious show-stopper
+        for using UTF-16 or UCS-4.</li>
+      <li>UTF-8 is being used as the de-facto internal encoding standard for
+        related code like <a href="http://www.pango.org/">pango</a>, the
+        upcoming Gnome text widget, and a lot of Unix code (yep, another place
+        where the Unix programmer base takes a different approach from
+        Microsoft - they are using UTF-16)</li>
+    </ul>
+  </li>
+</ul>
+
+<p>What does this mean in practice for the libxml user?</p>
+<ul>
+  <li>xmlChar, the libxml data type, is a byte; those bytes must be assembled
+    as valid UTF-8 strings. The proper way to terminate an xmlChar * string
+    is simply to append a 0 byte, as usual.</li>
+  <li>One just needs to make sure that when using chars outside the ASCII set,
+    the values have been properly converted to UTF-8</li>
+</ul>
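+
+<p>As a minimal sketch of the conversion mentioned in the last point, assume
+the application holds ISO-8859-1 text. The helper
+<code>latin1_to_utf8()</code> below is a hypothetical name and is not part of
+the libxml API; it only illustrates how to produce a 0-terminated UTF-8 byte
+string of the kind expected wherever an xmlChar * is required.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libxml function): convert a 0-terminated
 * ISO-8859-1 string into a 0-terminated UTF-8 string.
 * Returns the number of bytes written (terminator excluded),
 * or -1 if the output buffer is too small.
 */
int latin1_to_utf8(const unsigned char *in, unsigned char *out, size_t outsize) {
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (; *in != 0; in++) {
        if (*in < 0x80) {
            /* ASCII range: the same single byte in UTF-8 */
            if (o + 1 >= outsize) return -1;
            out[o++] = *in;
        } else {
            /* 0x80-0xFF: two bytes, 110xxxxx 10xxxxxx */
            if (o + 2 >= outsize) return -1;
            out[o++] = (unsigned char) (0xC0 | (*in >> 6));
            out[o++] = (unsigned char) (0x80 | (*in & 0x3F));
        }
    }
    if (o >= outsize) return -1;
    out[o] = 0;   /* terminate with a 0 byte, as usual */
    return (int) o;
}
```

+<p>The resulting buffer could then be cast to xmlChar * before being handed
+to the tree building functions.</p>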
+
+<h3><a name="implemente">How is it implemented ?</a></h3>
+
+<p>Let's describe how all this works within libxml. Basically the I18N
+(internationalization) support gets triggered only during I/O operations, i.e.
+when reading a document or saving one. Let's look first at the reading
+sequence:</p>
+<ol>
+  <li>when a document is processed, we usually don't know the encoding; a
+    simple heuristic allows detecting UTF-16 and UCS-4 from those where the
+    ASCII range (0-0x7F) maps with ASCII</li>
+  <li>the xml declaration, if available, is parsed, including the encoding
+    declaration. At that point, if the autodetected encoding is different
+    from the one declared, a call to xmlSwitchEncoding() is issued.</li>
+  <li>If there is no encoding declaration, then the input has to be in either
+    UTF-8 or UTF-16. If it is not, then at some point when processing the
+    input, the converter/checker of UTF-8 form will raise an encoding error.
+    You may end up with a garbled document, or no document at all! Example:
+    <pre>~/XML -&gt; ./xmllint err.xml 
+err.xml:1: error: Input is not proper UTF-8, indicate encoding !
+&lt;très&gt;là&lt;/très&gt;
+   ^
+err.xml:1: error: Bytes: 0xE8 0x73 0x3E 0x6C
+&lt;très&gt;là&lt;/très&gt;
+   ^</pre>
+  </li>
+  <li>xmlSwitchEncoding() does an encoding name lookup, canonicalizes it, and
+    then searches the default registered encoding converters for that encoding.
+    If it's not within the default set and iconv() support has been compiled
+    in, it will ask iconv for such an encoder. If this fails then the parser
+    will report an error and stop processing:
+    <pre>~/XML -&gt; ./xmllint err2.xml 
+err2.xml:1: error: Unsupported encoding UnsupportedEnc
+&lt;?xml version="1.0" encoding="UnsupportedEnc"?&gt;
+                                             ^</pre>
+  </li>
+  <li>From that point the encoder progressively processes the input (it is
+    plugged as a front-end to the I/O module) for that entity. It captures
+    and converts on-the-fly the document to be parsed to UTF-8. The parser
+    itself just does UTF-8 checking of this input and processes it
+    transparently. The only difference is that the encoding information has
+    been added to the parsing context (more precisely to the input
+    corresponding to this entity).</li>
+  <li>The result (when using DOM) is an internal form completely in UTF-8
+    with just the encoding information recorded on the document node.</li>
+</ol>
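+
+<p>The first step of the sequence above can be sketched as follows. This is
+not libxml's own code: the enum and the
+<code>sketch_detect_encoding()</code> function are hypothetical names, and
+the byte patterns are the usual ones for a document starting with
+"&lt;?" or with a byte order mark. Anything else is assumed to be
+ASCII-compatible and left to the encoding declaration.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SKETCH_ENC_NONE,     /* ASCII-compatible or unknown: trust the declaration */
    SKETCH_ENC_UTF8,     /* UTF-8 byte order mark seen */
    SKETCH_ENC_UTF16LE,
    SKETCH_ENC_UTF16BE,
    SKETCH_ENC_UCS4LE,
    SKETCH_ENC_UCS4BE
} sketch_encoding;

/* Guess the encoding family from the first 4 bytes of the input. */
sketch_encoding sketch_detect_encoding(const unsigned char *in, size_t len) {
    assert(in != NULL);
    if (len < 4) return SKETCH_ENC_NONE;

    /* UCS-4 without byte order mark: '<' stored as one 32 bit unit */
    if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x3C)
        return SKETCH_ENC_UCS4BE;
    if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x00)
        return SKETCH_ENC_UCS4LE;
    /* UTF-16 without byte order mark: "<?" stored as two 16 bit units */
    if (in[0] == 0x00 && in[1] == 0x3C && in[2] == 0x00 && in[3] == 0x3F)
        return SKETCH_ENC_UTF16BE;
    if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x3F && in[3] == 0x00)
        return SKETCH_ENC_UTF16LE;
    /* Byte order marks; the UCS-4 ones must be tested before UTF-16 */
    if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0xFE && in[3] == 0xFF)
        return SKETCH_ENC_UCS4BE;
    if (in[0] == 0xFF && in[1] == 0xFE && in[2] == 0x00 && in[3] == 0x00)
        return SKETCH_ENC_UCS4LE;
    if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return SKETCH_ENC_UTF8;
    if (in[0] == 0xFE && in[1] == 0xFF)
        return SKETCH_ENC_UTF16BE;
    if (in[0] == 0xFF && in[1] == 0xFE)
        return SKETCH_ENC_UTF16LE;
    return SKETCH_ENC_NONE;
}
```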
+
+<p>Ok, then what happens when saving the document (assuming you
+collected/built an xmlDoc DOM like structure)? It depends on the function
+called: xmlSaveFile() will just try to save in the original encoding, while
+xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given
+encoding:</p>
+<ol>
+  <li>if no encoding is given, libxml will look for an encoding value
+    associated to the document and if it exists will try to save to that
+    encoding,
+    <p>otherwise everything is written in the internal form, i.e. UTF-8</p>
+  </li>
+  <li>so if an encoding was specified, either at the API level or on the
+    document, libxml will again canonicalize the encoding name and look up a
+    converter in the registered set or through iconv. If none is found the
+    function will return an error code</li>
+  <li>the converter is placed before the I/O buffer layer, as another kind of
+    buffer, then libxml will simply push the UTF-8 serialization through
+    that buffer, which will then progressively be converted and pushed onto
+    the I/O layer.</li>
+  <li>It is possible that the converter code fails on some input, for example
+    trying to push a UTF-8 encoded Chinese character through the UTF-8 to
+    ISO-8859-1 converter won't work. Since the encoders are progressive they
+    will just report the error and the number of bytes converted; at that
+    point libxml will decode the offending character, remove it from the
+    buffer and replace it with the associated charRef encoding &amp;#123; and
+    resume the conversion. This guarantees that any document will be saved
+    without losses (except for markup names where this is not legal; this is
+    a problem in the current version, in practice avoid using non-ASCII
+    characters for tag or attribute names). A special "ascii" encoding
+    name, which saves documents in a pure ASCII form, can be used when
+    portability is really crucial</li>
+</ol>
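+
+<p>The charRef fallback described in the last step can be illustrated with a
+small self-contained sketch. It is a simplification and not libxml's
+implementation: <code>sketch_utf8_to_latin1()</code> is a hypothetical name,
+it works on a whole 0-terminated string instead of progressive buffers, and
+its decoder does not reject overlong UTF-8 forms.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decode one UTF-8 sequence; returns the bytes consumed, 0 if invalid. */
static size_t sketch_utf8_decode(const unsigned char *in, unsigned long *cp) {
    if (in[0] < 0x80) { *cp = in[0]; return 1; }
    if ((in[0] & 0xE0) == 0xC0 && (in[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long) (in[0] & 0x1F) << 6) | (in[1] & 0x3F);
        return 2;
    }
    if ((in[0] & 0xF0) == 0xE0 && (in[1] & 0xC0) == 0x80 &&
        (in[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long) (in[0] & 0x0F) << 12) |
              ((unsigned long) (in[1] & 0x3F) << 6) | (in[2] & 0x3F);
        return 3;
    }
    if ((in[0] & 0xF8) == 0xF0 && (in[1] & 0xC0) == 0x80 &&
        (in[2] & 0xC0) == 0x80 && (in[3] & 0xC0) == 0x80) {
        *cp = ((unsigned long) (in[0] & 0x07) << 18) |
              ((unsigned long) (in[1] & 0x3F) << 12) |
              ((unsigned long) (in[2] & 0x3F) << 6) | (in[3] & 0x3F);
        return 4;
    }
    return 0;
}

/*
 * Convert 0-terminated UTF-8 to ISO-8859-1, writing a decimal character
 * reference for every character the target encoding cannot represent.
 * Returns the output length, -1 if out is too small, -2 on invalid UTF-8.
 */
int sketch_utf8_to_latin1(const unsigned char *in, char *out, size_t outsize) {
    size_t o = 0;

    assert(in != NULL && out != NULL);
    while (*in != 0) {
        unsigned long cp;
        size_t used = sketch_utf8_decode(in, &cp);

        if (used == 0) return -2;
        if (cp < 0x100) {
            /* representable: one ISO-8859-1 byte */
            if (o + 1 >= outsize) return -1;
            out[o++] = (char) cp;
        } else {
            /* not representable: fall back to a character reference */
            int n = snprintf(out + o, outsize - o, "&#%lu;", cp);
            if (n < 0 || (size_t) n >= outsize - o) return -1;
            o += (size_t) n;
        }
        in += used;
    }
    if (o >= outsize) return -1;
    out[o] = 0;
    return (int) o;
}
```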
+
+<p>Here are a few examples based on the same test document:</p>
+<pre>~/XML -&gt; ./xmllint isolat1 
+&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
+&lt;très&gt;là&lt;/très&gt;
+~/XML -&gt; ./xmllint --encode UTF-8 isolat1 
+&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;trÃ¨s&gt;lÃ  &lt;/trÃ¨s&gt;
+~/XML -&gt; </pre>
+
+<p>The same processing is applied (and reuses most of the code) for HTML I18N
+processing. Looking up and modifying the content encoding is a bit more
+difficult since it is located in a &lt;meta&gt; tag under the &lt;head&gt;,
+so a couple of functions htmlGetMetaEncoding() and htmlSetMetaEncoding() have
+been provided. The parser also attempts to switch encoding on the fly when
+detecting such a tag on input. Except for that the processing is the same
+(and again reuses the same code).</p>
+
+<h3><a name="Default">Default supported encodings</a></h3>
+
+<p>libxml has a set of default converters for the following encodings
+(located in encoding.c):</p>
+<ol>
+  <li>UTF-8 is supported by default (null handlers)</li>
+  <li>UTF-16, both little and big endian</li>
+  <li>ISO-Latin-1 (ISO-8859-1) covering most western languages</li>
+  <li>ASCII, useful mostly for saving</li>
+  <li>HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML
+    predefined entities like &amp;copy; for the Copyright sign.</li>
+</ol>
+
+<p>Moreover, when compiled on a Unix platform with iconv support, the full set
+of encodings supported by iconv can instantly be used by libxml. On a
+Linux machine with glibc-2.1 the list of supported encodings and aliases fills
+3 full pages, and includes UCS-4, the full set of ISO-Latin encodings, and the
+various Japanese ones.</p>
+
+<h4>Encoding aliases</h4>
+
+<p>From 2.2.3, libxml has support for registering encoding name aliases. The
+goal is to be able to parse documents whose encoding is supported but where
+the name differs (for example from the default set of names accepted by
+iconv). The following functions allow registering and handling new aliases for
+existing encodings. Once registered, libxml will automatically look up the
+aliases when handling a document:</p>
+<ul>
+  <li>int xmlAddEncodingAlias(const char *name, const char *alias);</li>
+  <li>int xmlDelEncodingAlias(const char *alias);</li>
+  <li>const char * xmlGetEncodingAlias(const char *alias);</li>
+  <li>void xmlCleanupEncodingAliases(void);</li>
+</ul>
+
+<h3><a name="extend">How to extend the existing support</a></h3>
+
+<p>Adding support for a new encoding, or overriding one of the encoders
+(assuming it is buggy), should not be hard: just write input and output
+conversion routines to/from UTF-8, and register them using
+xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they will be
+called automatically if the parser(s) encounter such an encoding name
+(register it uppercase, this will help). The encoders, their arguments and
+expected return values are described in the encoding.h
+header.</p>
+
+<p>A quick note on the topic of subverting the parser to use a different
+internal encoding than UTF-8: in some cases people will absolutely want to
+keep the internal encoding different. I think it's still possible (but the
+encoding must be compliant with ASCII on the same subrange) though I haven't
+tried it. The key is to override the default conversion routines (by
+registering null encoders/decoders for your charsets), and bypass the UTF-8
+checking of the parser by setting the parser context charset
+(ctxt-&gt;charset) to something different than XML_CHAR_ENCODING_UTF8, but
+there is no guarantee that this will work. You may also have some trouble
+saving back.</p>
+
+<p>Basically proper I18N support is important, this requires at least
+libxml-2.0.0, but a lot of features and corrections are really available only
+starting 2.2.</p>
+
+<h2><a name="IO">I/O Interfaces</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+  <li><a href="#General1">General overview</a></li>
+  <li><a href="#basic">The basic buffer type</a></li>
+  <li><a href="#Input">Input I/O handlers</a></li>
+  <li><a href="#Output">Output I/O handlers</a></li>
+  <li><a href="#entities">The entities loader</a></li>
+  <li><a href="#Example2">Example of customized I/O</a></li>
+</ol>
+
+<h3><a name="General1">General overview</a></h3>
+
+<p>The module <code><a
+href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
+the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
+<ul>
+  <li>Entities loader, this is a routine which tries to fetch the entities
+    (files) based on their PUBLIC and SYSTEM identifiers. The default loader
+    doesn't look at the public identifier since libxml does not maintain a
+    catalog. You can redefine your own entity loader by using
+    <code>xmlGetExternalEntityLoader()</code> and
+    <code>xmlSetExternalEntityLoader()</code>. <a
+    href="#entities">Check the example</a>.</li>
+  <li>Input I/O buffers, which are a convenience structure used by the
+    parser(s) input layer to handle fetching the information to feed the
+    parser. This provides buffering and is also a placeholder where the
+    encoding converters to UTF8 are piggy-backed.</li>
+  <li>Output I/O buffers are similar to the Input ones and fulfill a similar
+    task but when generating a serialization from a tree.</li>
+  <li>A mechanism to register sets of I/O callbacks and associate them with
+    specific naming schemes like the protocol part of the URIs.
+    <p>This affects the default I/O operations and allows using specific I/O
+    handlers for certain names.</p>
+  </li>
+</ul>
+
+<p>The general mechanism used when loading http://rpmfind.net/xml.html for
+example in the HTML parser is the following:</p>
+<ol>
+  <li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
+    the parsing context and the URI string.</li>
+  <li>the URI string is checked against the existing registered handlers
+    using their match() callback function, if the HTTP module was compiled
+    in, it is registered and its match() function will succeed</li>
+  <li>the open() function of the handler is called and if successful will
+    return an I/O Input buffer</li>
+  <li>the parser will then start reading from this buffer and progressively
+    fetch information from the resource, calling the read() function of the
+    handler until the resource is exhausted</li>
+  <li>if an encoding change is detected it will be installed on the input
+    buffer, providing buffering and efficient use of the conversion
+  routines</li>
+  <li>once the parser has finished, the close() function of the handler is
+    called once and the Input buffer and associated resources are
+  deallocated.</li>
+</ol>
+
+<p>The user defined callbacks are checked first to allow overriding of the
+default libxml I/O routines.</p>
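+
+<p>The handler lookup can be pictured with the following self-contained
+sketch. All the names (<code>sketch_io_handler</code>,
+<code>sketch_register_handler()</code>, <code>sketch_find_handler()</code>)
+are hypothetical and are not the libxml API; the sketch only assumes that
+handlers registered later are tried first, which is one simple way to let
+user defined callbacks override the default ones.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One set of I/O callbacks, selected through its match() function. */
typedef struct {
    int   (*match)(const char *uri);
    void *(*open)(const char *uri);
    int   (*read)(void *context, char *buffer, int len);
    int   (*close)(void *context);
} sketch_io_handler;

#define SKETCH_MAX_HANDLERS 8
static sketch_io_handler sketch_handlers[SKETCH_MAX_HANDLERS];
static int sketch_nb_handlers = 0;

/* Register a handler; returns its index or -1 if the table is full. */
int sketch_register_handler(sketch_io_handler handler) {
    if (sketch_nb_handlers >= SKETCH_MAX_HANDLERS) return -1;
    sketch_handlers[sketch_nb_handlers] = handler;
    return sketch_nb_handlers++;
}

/* Scan from the most recently registered handler down to the defaults. */
const sketch_io_handler *sketch_find_handler(const char *uri) {
    int i;

    assert(uri != NULL);
    for (i = sketch_nb_handlers - 1; i >= 0; i--) {
        if (sketch_handlers[i].match != NULL && sketch_handlers[i].match(uri))
            return &sketch_handlers[i];
    }
    return NULL;
}

/* Example match callbacks: a catch-all file handler and an HTTP one. */
int sketch_file_match(const char *uri) {
    (void) uri;   /* the default handler accepts any name */
    return 1;
}

int sketch_http_match(const char *uri) {
    return strncmp(uri, "http://", 7) == 0;
}
```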
+
+<h3><a name="basic">The basic buffer type</a></h3>
+
+<p>All the buffer manipulation handling is done using the
+<code>xmlBuffer</code> type defined in <code><a
+href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a> </code>which is a
+resizable memory buffer. The buffer allocation strategy can be selected to be
+either best-fit or use an exponential doubling one (CPU vs. memory use
+tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
+<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
+system wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
+of functions allow manipulating buffers, with names starting with the
+<code>xmlBuffer...</code> prefix.</p>
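+
+<p>To make the tradeoff concrete, here is a small sketch of the two growth
+policies. It is an illustration only: <code>sketch_grow()</code> and the
+<code>SKETCH_ALLOC_*</code> values are hypothetical names, and the exact
+sizes computed by libxml itself may differ.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_ALLOC_EXACT,     /* best-fit: less memory, more reallocations */
    SKETCH_ALLOC_DOUBLEIT   /* doubling: more memory, fewer reallocations */
} sketch_alloc_scheme;

/* Compute the new capacity of a buffer which must hold `needed` bytes. */
size_t sketch_grow(size_t capacity, size_t needed, sketch_alloc_scheme scheme) {
    assert(scheme == SKETCH_ALLOC_EXACT || scheme == SKETCH_ALLOC_DOUBLEIT);
    if (needed <= capacity) return capacity;     /* already large enough */
    if (scheme == SKETCH_ALLOC_EXACT) return needed;
    if (capacity == 0) capacity = 1;
    while (capacity < needed) {
        if (capacity > SIZE_MAX / 2) return needed;   /* avoid overflow */
        capacity *= 2;
    }
    return capacity;
}
```

+<p>Appending one byte at a time to a 16 byte buffer would reallocate on every
+call with the exact policy, but only when the size crosses 16, 32, 64, ...
+with the doubling one.</p>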
+
+<h3><a name="Input">Input I/O handlers</a></h3>
+
+<p>An Input I/O handler is a simple structure
+<code>xmlParserInputBuffer</code> containing a context associated to the
+resource (file descriptor, or pointer to a protocol handler), the read() and
+close() callbacks to use and an xmlBuffer. An extra xmlBuffer and a charset
+encoding handler are also present to support charset conversion when
+needed.</p>
+
+<h3><a name="Output">Output I/O handlers</a></h3>
+
+<p>An Output handler <code>xmlOutputBuffer</code> is completely similar to an
+Input one except the callbacks are write() and close().</p>
+
+<h3><a name="entities">The entities loader</a></h3>
+
+<p>The entity loader resolves requests for new entities and creates inputs for
+the parser. Creating an input from a filename or a URI string is done
+through the xmlNewInputFromFile() routine.  The default entity loader does not
+handle the PUBLIC identifier associated with an entity (if any). So it just
+calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
+XML).</p>
+
+<p>If you want to hook up a catalog mechanism then you simply need to
+override the default entity loader, here is an example:</p>
+<pre>#include &lt;libxml/xmlIO.h&gt;
+
+xmlExternalEntityLoader defaultLoader = NULL;
+
+xmlParserInputPtr
+xmlMyExternalEntityLoader(const char *URL, const char *ID,
+                               xmlParserCtxtPtr ctxt) {
+    xmlParserInputPtr ret = NULL;
+    const char *fileID = NULL;
+    /* lookup for the fileID depending on ID */
+
+    /* only try the local file when the lookup found one */
+    if (fileID != NULL) {
+        ret = xmlNewInputFromFile(ctxt, fileID);
+        if (ret != NULL)
+            return(ret);
+    }
+    if (defaultLoader != NULL)
+        ret = defaultLoader(URL, ID, ctxt);
+    return(ret);
+}
+
+int main(..) {
+    ...
+
+    /*
+     * Install our own entity loader
+     */
+    defaultLoader = xmlGetExternalEntityLoader();
+    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);
+
+    ...
+}</pre>
+
+<h3><a name="Example2">Example of customized I/O</a></h3>
+
+<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
+real use case</a>:  xmlDocDump() closes the FILE * passed by the application
+and this was a problem. The <a
+href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
+new output handler with the closing call deactivated:</p>
+<ol>
+  <li>First define a new I/O output allocator where the output doesn't close
+    the file:
+    <pre>xmlOutputBufferPtr
+xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
+    xmlOutputBufferPtr ret;
+    
+    if (xmlOutputCallbackInitialized == 0)
+        xmlRegisterDefaultOutputCallbacks();
+
+    if (file == NULL) return(NULL);
+    ret = xmlAllocOutputBuffer(encoder);
+    if (ret != NULL) {
+        ret-&gt;context = file;
+        ret-&gt;writecallback = xmlFileWrite;
+        ret-&gt;closecallback = NULL;  /* No close callback */
+    }
+    return(ret);
+} </pre>
+  </li>
+  <li>And then use it to save the document:
+    <pre>FILE *f;
+xmlOutputBufferPtr output;
+xmlDocPtr doc;
+int res;
+
+f = ...
+doc = ....
+
+output = xmlOutputBufferCreateOwn(f, NULL);
+res = xmlSaveFileTo(output, doc, NULL);
+    </pre>
+  </li>
+</ol>
+
+<h2><a name="Catalog">Catalog support</a></h2>
+
+<p>Table of Contents:</p>
+<ol>
+  <li><a href="#General2">General overview</a></li>
+  <li><a href="#definition">The definition</a></li>
+  <li><a href="#Simple">Using catalogs</a></li>
+  <li><a href="#Some">Some examples</a></li>
+  <li><a href="#reference">How to tune  catalog usage</a></li>
+  <li><a href="#validate">How to debug catalog processing</a></li>
+  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
+  <li><a href="#implemento">The implementor corner quick review of the
+  API</a></li>
+  <li><a href="#Other">Other resources</a></li>
+</ol>
+
+<h3><a name="General2">General overview</a></h3>
+
+<p>What is a catalog? Basically it's a lookup mechanism used when an entity
+(a file or a remote resource) references another entity. The catalog lookup
+is inserted between the moment the reference is recognized by the software
+(XML parser, stylesheet processing, or even images referenced for inclusion
+in a rendering) and the time where loading that resource is actually
+started.</p>
+
+<p>It is basically used for 3 things:</p>
+<ul>
+  <li>mapping between "logical" names, the public identifiers, and a more
+    concrete name usable for download (a URI). For example it can associate
+    the logical name
+    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
+    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
+    downloaded</p>
+    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
+  </li>
+  <li>remapping from a given URL to another one, like an HTTP indirection
+    saying that
+    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
+    <p>should really be looked up at</p>
+    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
+  </li>
+  <li>providing a local cache mechanism allowing to load the entities
+    associated to public identifiers or remote resources; this is a really
+    important feature for any significant deployment of XML or SGML since it
+    avoids the hazards and delays associated with fetching remote
+    resources.</li>
+</ul>
+
+<h3><a name="definition">The definitions</a></h3>
+
+<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
+<ul>
+  <li>the older SGML catalogs, the official spec is  SGML Open Technical
+    Resolution TR9401:1997, but is better understood by reading <a
+    href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
+    James Clark. This is relatively old and not the preferred mode of
+    operation of libxml.</li>
+  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
+    Catalogs</a>
+     are far more flexible, more recent, use an XML syntax and should scale
+    much better. This is the default option of libxml.</li>
+</ul>
+
+<p></p>
+
+<h3><a name="Simple">Using catalog</a></h3>
+
+<p>In a normal environment libxml will by default check for the presence of a
+catalog in /etc/xml/catalog, and assuming it has been correctly populated,
+the processing is completely transparent to the document user. To take a
+concrete example, suppose you are authoring a DocBook document which
+starts with the following DOCTYPE definition:</p>
+<pre>&lt;?xml version='1.0'?&gt;
+&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
+          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;</pre>
+
+<p>When validating the document with libxml, the catalog will be
+automatically consulted to look up the public identifier "-//Norman Walsh//DTD
+DocBk XML V3.1.4//EN" and the system identifier
+"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd", and if these entities have
+been installed on your system and the catalogs actually point to them, libxml
+will fetch them from the local disk.</p>
+
+<p style="font-size: 10pt"><strong>Note</strong>: Do not use this DOCTYPE in
+real documents; it is a really old version, but is fine as an example.</p>
+
+<p>Libxml will check the catalog each time that it is requested to load an
+entity; this includes DTDs, external parsed entities, stylesheets, etc. If
+your system is correctly configured, all the authoring and processing phases
+should use only local files, while your document stays portable because it
+uses the canonical public and system IDs, referencing the remote document.</p>
+
+<h3><a name="Some">Some examples:</a></h3>
+
+<p>Here are a couple of fragments from XML Catalogs used in libxml's early
+regression tests in <code>test/catalogs</code> :</p>
+<pre>&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC 
+   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
+  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
+...</pre>
+
+<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
+written in XML, and there is a specific namespace for catalog elements:
+"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
+catalog is a <code>public</code> mapping; it associates a Public
+Identifier with a URI.</p>
+<pre>...
+    &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+                   rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
+...</pre>
+
+<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
+any URI starting with a given prefix should be looked up at another URI
+constructed by replacing the prefix with a new one. In effect this acts like
+a cache system for a full area of the Web. In practice it is extremely useful
+with a file prefix if you have installed a copy of those resources on your
+local system.</p>
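+
+<p>The prefix replacement itself is simple enough to sketch in a few lines.
+The function below, <code>sketch_rewrite_system()</code>, is a hypothetical
+name and not the libxml catalog code; it only handles a single
+<code>rewriteSystem</code> entry and ignores the normalization a real
+resolver would apply to the identifiers.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Apply one rewriteSystem entry to a system identifier.
 * Returns 1 if the URI was rewritten into out, 0 if the prefix does not
 * match (out is left untouched), -1 if out is too small.
 */
int sketch_rewrite_system(const char *uri, const char *start_string,
                          const char *rewrite_prefix,
                          char *out, size_t outsize) {
    size_t plen;
    int n;

    assert(uri != NULL && start_string != NULL);
    assert(rewrite_prefix != NULL && out != NULL);
    plen = strlen(start_string);
    if (strncmp(uri, start_string, plen) != 0)
        return 0;
    /* keep everything after the matched prefix, swap the prefix itself */
    n = snprintf(out, outsize, "%s%s", rewrite_prefix, uri + plen);
    if (n < 0 || (size_t) n >= outsize)
        return -1;
    return 1;
}
```

+<p>With the entry shown above, the DocBook 4.1.2 DTD URL would be resolved
+to a path below <code>file:///usr/share/xml/docbook/</code>.</p>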
+<pre>...
+&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
+                catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
+                catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
+                catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+                catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
+                catalog="file:///usr/share/xml/docbook.xml"/&gt;
+...</pre>
+
+<p>Delegation is the core feature which allows building a tree of catalogs,
+easier to maintain than a single catalog. Based on Public Identifier, System
+Identifier or URI prefixes, it instructs the catalog software to look up
+entries in another resource. This feature allows building hierarchies of
+catalogs. The set of entries presented should be sufficient to redirect the
+resolution of all DocBook references to the specific catalog in
+<code>/usr/share/xml/docbook.xml</code>; this one in turn could delegate all
+references for DocBook 4.1.2 to a specific catalog installed at the same time
+as the DocBook resources on the local machine.</p>
+
+<h3><a name="reference">How to tune catalog usage:</a></h3>
+
+<p>The user can change the default catalog behaviour by redirecting queries
+to their own set of catalogs. This can be done by setting the
+<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
+empty one should deactivate loading the default <code>/etc/xml/catalog</code>
+catalog.</p>
+
+<h3><a name="validate">How to debug catalog processing:</a></h3>
+
+<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
+make libxml output debugging information for each catalog operation, for
+example:</p>
+<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
+warning: failed to load external entity "title.xml"
+orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
+orchis:~/XML -&gt; xmllint --memory --noout test/ent2
+Failed to parse catalog /etc/xml/catalog
+Failed to parse catalog /etc/xml/catalog
+warning: failed to load external entity "title.xml"
+Catalogs cleanup
+orchis:~/XML -&gt; </pre>
+
+<p>The test/ent2 document references an entity; running the parser from
+memory makes the base URI unavailable and the "title.xml" entity cannot be
+loaded. Setting the debug environment variable allows detecting that an
+attempt is made to load <code>/etc/xml/catalog</code>, but since it's not
+present the resolution fails.</p>
+
+<p>But the most advanced way to debug XML catalog processing is to use the
+<strong>xmlcatalog</strong> command shipped with libxml2; it allows loading
+catalogs and making resolution queries to see what is going on. This is also
+used for the regression tests:</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+orchis:~/XML -&gt; </pre>
+
+<p>For debugging what is going on, adding one -v flag increases the verbosity
+level to indicate the processing done (adding a second flag also indicates
+what elements are recognized at parsing):</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+Parsing catalog test/catalogs/docbook.xml's content
+Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+Catalogs cleanup
+orchis:~/XML -&gt; </pre>
+
+<p>A shell interface is also available to debug and process multiple queries
+(and for regression tests):</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml \
+                   "-//OASIS//DTD DocBook XML V4.1.2//EN"
+&gt; help   
+Commands available:
+public PublicID: make a PUBLIC identifier lookup
+system SystemID: make a SYSTEM identifier lookup
+resolve PublicID SystemID: do a full resolver lookup
+add 'type' 'orig' 'replace' : add an entry
+del 'values' : remove values
+dump: print the current catalog state
+debug: increase the verbosity level
+quiet: decrease the verbosity level
+exit:  quit the shell
+&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+&gt; quit
+orchis:~/XML -&gt; </pre>
+
+<p>This should be sufficient for most debugging purposes; this was actually
+used heavily to debug the XML Catalog implementation itself.</p>
+
+<h3><a name="Declaring">How to create and maintain</a> catalogs:</h3>
+
+<p>Basically XML Catalogs are XML files, so you can either use XML tools to
+manage them or use  <strong>xmlcatalog</strong> for this. The basic step is
+to create a catalog; the -create option provides this facility:</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
+orchis:~/XML -&gt; </pre>
+
+<p>By default xmlcatalog does not overwrite the original catalog and saves the
+result on the standard output; this can be overridden using the -noout
+option. The <code>-add</code> command allows adding entries to the
+catalog:</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" \
+  "-//OASIS//DTD DocBook XML V4.1.2//EN" \
+  http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
+orchis:~/XML -&gt; cat tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" \
+  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
+&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
+&lt;/catalog&gt;
+orchis:~/XML -&gt; </pre>
+
+<p>The <code>-add</code> option will always take 3 parameters even if some of
+the XML Catalog constructs (like nextCatalog) have only a single
+argument; in that case just pass an empty string as the third one, it will be
+ignored.</p>
+
+<p>Similarly the <code>-del</code> option removes matching entries from the
+catalog:</p>
+<pre>orchis:~/XML -&gt; ./xmlcatalog --del \
+  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+    "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
+orchis:~/XML -&gt; </pre>
+
+<p>The catalog is now empty. Note that <code>--del</code> matches exactly,
+and that passing the Public ID string instead would have worked the same
+way.</p>
+
+<p>This is rudimentary but should be sufficient to manage a not too complex
+catalog tree of resources.</p>
+
+<h3><a name="implemento">The implementor corner: a quick review of the
+API:</a></h3>
+
+<p>First, as for every other module of libxml, there is an
+automatically generated <a href="html/libxml-catalog.html">API page for
+catalog support</a>.</p>
+
+<p>The header for the catalog interfaces should be included as:</p>
+<pre>#include &lt;libxml/catalog.h&gt;</pre>
+
+<p>The API is deliberately kept very simple. First, it is not obvious that
+applications really need access to it, since catalog resolution is the
+default behaviour of libxml (note: it is possible to completely override the
+libxml default catalog handling by using
+<a href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to
+plug in an application-specific resolver).</p>
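+
+<p>As a rough illustration of plugging in a resolver, the sketch below
+installs a loader that logs each request and then defers to the loader that
+was previously installed. The names <code>myEntityLoader</code> and
+<code>defaultLoader</code> are hypothetical; a loader meant to bypass the
+catalogs entirely would build its own parser input instead of chaining.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/parser.h&gt;
+
+/* Loader installed before ours; kept so we can chain to it (hypothetical name). */
+static xmlExternalEntityLoader defaultLoader = NULL;
+
+/* Hypothetical application resolver: log the request, then defer. */
+static xmlParserInputPtr
+myEntityLoader(const char *URL, const char *ID, xmlParserCtxtPtr ctxt) {
+    fprintf(stderr, "loading entity: URL=%s ID=%s\n",
+            URL ? URL : "(none)", ID ? ID : "(none)");
+    if (defaultLoader != NULL)
+        return defaultLoader(URL, ID, ctxt);
+    return NULL;
+}
+
+int main(void) {
+    LIBXML_TEST_VERSION
+    defaultLoader = xmlGetExternalEntityLoader();
+    xmlSetExternalEntityLoader(myEntityLoader);
+    /* ... parse documents here ... */
+    xmlCleanupParser();
+    return 0;
+}</pre>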
+
+<p>libxml supports 2 catalog lists:</p>
+<ul>
+  <li>the default one, which is global and shared by the whole
+    application</li>
+  <li>a per-document catalog, which is built if the document uses the
+    <code>oasis-xml-catalog</code> PIs to specify its own catalog list; it is
+    associated with the parser context and destroyed when the parsing context
+    is destroyed.</li>
+</ul>
+
+<p>The document one will be used first if it exists.</p>
+
+<h4>Initialization routines:</h4>
+
+<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
+used at startup to initialize the catalog. If the catalog should be
+initialized with specific values, call xmlLoadCatalog() or xmlLoadCatalogs()
+before xmlInitializeCatalog(), which would otherwise do a default
+initialization first.</p>
+
+<p>The xmlCatalogAddLocal() call is used by the parser to grow the
+document's own catalog list if needed.</p>
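+
+<p>A minimal startup sketch following that ordering might look like this.
+The file name <code>tst.xml</code> is only an assumption borrowed from the
+xmlcatalog examples above; substitute your own catalog.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/catalog.h&gt;
+
+int main(void) {
+    /* Load the application catalog first ("tst.xml" is an assumed name). */
+    if (xmlLoadCatalog("tst.xml") != 0)
+        fprintf(stderr, "could not load tst.xml, using defaults\n");
+
+    /* Then finish the initialization of the catalog module. */
+    xmlInitializeCatalog();
+
+    /* ... parse documents here ... */
+
+    xmlCatalogCleanup();
+    return 0;
+}</pre>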
+
+<h4>Preferences setup:</h4>
+
+<p>The XML Catalog spec requires the possibility to select the default
+preference between public and system delegation;
+xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
+xmlCatalogGetDefaults() control whether XML Catalog resolution is forbidden,
+allowed for the global catalog, for the document catalog, or for both; the
+default is to allow both.</p>
+
+<p>And of course xmlCatalogSetDebug() enables debug messages
+(through the xmlGenericError() mechanism).</p>
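+
+<p>The following sketch shows how these calls could be combined. The
+particular choices (prefer system identifiers, allow only the global
+catalog, debug level 1) are arbitrary and only serve as an example.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/catalog.h&gt;
+
+int main(void) {
+    xmlCatalogPrefer oldPrefer;
+    int oldDebug;
+
+    /* Prefer system identifiers; the previous preference is returned. */
+    oldPrefer = xmlCatalogSetDefaultPrefer(XML_CATA_PREFER_SYSTEM);
+
+    /* Consult only the global catalog, ignoring per-document catalogs. */
+    xmlCatalogSetDefaults(XML_CATA_ALLOW_GLOBAL);
+
+    /* Turn on debug messages; the previous level is returned. */
+    oldDebug = xmlCatalogSetDebug(1);
+
+    printf("previous prefer=%d, previous debug level=%d\n",
+           (int) oldPrefer, oldDebug);
+    if (xmlCatalogGetDefaults() == XML_CATA_ALLOW_GLOBAL)
+        printf("only the global catalog is consulted\n");
+
+    xmlCatalogCleanup();
+    return 0;
+}</pre>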
+
+<h4>Querying routines:</h4>
+
+<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
+and xmlCatalogResolveURI() are relatively self-explanatory if you have read
+the XML Catalog specification: they correspond to the section 7 algorithms.
+They should also work if you have loaded an SGML catalog, with simplified
+semantics.</p>
+
+<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
+operate on the document catalog list.</p>
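+
+<p>As an example of a query against the global catalog, the sketch below
+looks up the DocBook public identifier used earlier on this page. It assumes
+a catalog named <code>tst.xml</code> containing that entry; the returned
+string belongs to the caller and has to be freed.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/xmlmemory.h&gt;
+#include &lt;libxml/catalog.h&gt;
+
+int main(void) {
+    xmlChar *uri;
+
+    /* "tst.xml" is an assumed catalog file name. */
+    if (xmlLoadCatalog("tst.xml") != 0) {
+        fprintf(stderr, "could not load tst.xml\n");
+        return 1;
+    }
+
+    uri = xmlCatalogResolvePublic(
+              BAD_CAST "-//OASIS//DTD DocBook XML V4.1.2//EN");
+    if (uri != NULL) {
+        printf("resolved to %s\n", (const char *) uri);
+        xmlFree(uri);               /* the result must be freed by the caller */
+    } else {
+        printf("no match in the catalog\n");
+    }
+
+    xmlCatalogCleanup();
+    return 0;
+}</pre>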
+
+<h4>Cleanup and Miscellaneous:</h4>
+
+<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
+the per-document equivalent.</p>
+
+<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
+first catalog in the global list, and xmlCatalogDump() dumps the state of a
+catalog. Those routines are primarily designed for xmlcatalog; I'm not
+sure that exposing more complex interfaces (like navigation ones) would be
+really useful.</p>
+
+<p>xmlParseCatalogFile() is a function used to load XML Catalog files. It is
+similar to xmlParseFile() except that it bypasses all catalog lookups; it is
+provided because this functionality may be useful for client tools.</p>
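+
+<p>For illustration, here is a sketch of the add, dump and remove calls,
+mirroring what the xmlcatalog session above does from the command line. The
+catalog name <code>tst.xml</code> is again an assumption, and the sketch only
+dumps the in-memory state to the standard output.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/catalog.h&gt;
+
+int main(void) {
+    /* "tst.xml" is an assumed catalog file name. */
+    if (xmlLoadCatalog("tst.xml") != 0) {
+        fprintf(stderr, "could not load tst.xml\n");
+        return 1;
+    }
+
+    /* Add a public entry to the first catalog of the global list. */
+    if (xmlCatalogAdd(BAD_CAST "public",
+            BAD_CAST "-//OASIS//DTD DocBook XML V4.1.2//EN",
+            BAD_CAST "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd") != 0)
+        fprintf(stderr, "adding the entry failed\n");
+
+    /* Show the current state of the catalog. */
+    xmlCatalogDump(stdout);
+
+    /* Remove the entry again, matching on the Public ID string. */
+    xmlCatalogRemove(BAD_CAST "-//OASIS//DTD DocBook XML V4.1.2//EN");
+
+    xmlCatalogCleanup();
+    return 0;
+}</pre>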
+
+<h4>Threaded environments:</h4>
+
+<p>Since the catalog tree is built progressively, some care has been taken to
+try to avoid trouble in multithreaded environments. The code is now
+thread-safe, assuming that the libxml library has been compiled with thread
+support.</p>
+
+<p></p>
+
+<h3><a name="Other">Other resources</a></h3>
+
+<p>The XML Catalog specification is relatively recent so there isn't much
+literature to point at:</p>
+<ul>
+  <li>You can find a good rant from Norm Walsh about <a
+    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
+    need for catalogs</a>; it provides a lot of context information even if
+    I don't agree with everything presented.</li>
+  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
+    catalog proposal</a> from John Cowan</li>
+  <li>The <a href="http://www.rddl.org/">Resource Directory Description
+    Language</a> (RDDL), another catalog system but more oriented toward
+    providing metadata for XML namespaces.</li>
+  <li>The page from the OASIS Technical <a
+    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
+    Resolution</a>, which maintains XML Catalog; you will find pointers to the
+    specification update, some background and pointers to other tools
+    providing XML Catalog support</li>
+  <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
+    small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems to
+    work fine for me</li>
+  <li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog
+    manual page</a></li>
+</ul>
+
+<p>If you have suggestions for corrections or additions, simply contact
+me:</p>
+
+<h2><a name="library">The parser interfaces</a></h2>
 
 <p>This section is directly intended to help programmers getting bootstrapped
 using the XML library from the C language. It is not intended to be
@@ -1400,50 +2826,130 @@
 try to provide ways to do this, but this may not be portable or
 standardized.</p>
 
-<h2><a name="Validation">Validation, or are you afraid of DTDs ?</a></h2>
+<h2><a name="Upgrading">Upgrading 1.x code</a></h2>
 
-<p>Well what is validation and what is a DTD ?</p>
+<p>Incompatible changes:</p>
 
-<p>Validation is the process of checking a document against a set of
-construction rules; a <strong>DTD</strong> (Document Type Definition) is such
-a set of rules.</p>
+<p>Version 2 of libxml is the first version introducing serious
+backward-incompatible changes. The main goals were:</p>
+<ul>
+  <li>a general cleanup. A number of mistakes inherited from the very early
+    versions couldn't be changed due to compatibility constraints, for
+    example the "childs" element in the nodes.</li>
+  <li>Uniformization of the various nodes, at least for their header and link
+    parts (doc, parent, children, prev, next); the goal is a simpler
+    programming model and an easier task for DOM implementors.</li>
+  <li>better conformance to the XML specification. For example, version 1.x
+    had a heuristic to try to detect ignorable white space. As a result the
+    SAX events generated were ignorableWhitespace() while the spec requires
+    character() in that case. This also means that a number of DOM nodes
+    containing blank text, which were not present before, may now populate
+    the DOM tree.</li>
+</ul>
 
-<p>The validation process and building DTDs are the two most difficult parts
-of the XML life cycle. Briefly a DTD defines all the possibles element to be
-found within your document, what is the formal shape of your document tree
-(by defining the allowed content of an element, either text, a regular
-expression for the allowed list of children, or mixed content i.e. both text
-and children). The DTD also defines the allowed attributes for all elements
-and the types of the attributes. For more detailed information, I suggest
-that you read the related parts of the XML specification, the examples found
-under gnome-xml/test/valid/dtd and any of the large number of books available
-on XML. The dia example in gnome-xml/test/valid should be both simple and
-complete enough to allow you to build your own.</p>
+<h3>How to fix libxml-1.x code:</h3>
 
-<p>A word of warning, building a good DTD which will fit the needs of your
-application in the long-term is far from trivial; however, the extra level of
-quality it can ensure is well worth the price for some sets of applications
-or if you already have already a DTD defined for your application field.</p>
+<p>So client code of libxml designed to run with version 1.x may have to be
+changed to compile against version 2.x of libxml. Here is a list of changes
+that I have collected; they may not be sufficient, so in case you find other
+changes which are required, <a href="mailto:Daniel.Veillard@w3.org">drop me a
+mail</a>:</p>
+<ol>
+  <li>The package name has changed from libxml to libxml2, and the library
+    name is now -lxml2. There is a new xml2-config script which should be
+    used to select the right parameters for libxml2.</li>
+  <li>The node <strong>childs</strong> field has been renamed
+    <strong>children</strong>, so s/childs/children/g should be applied
+    (the probability of having "childs" anywhere else is close to 0).</li>
+  <li>The document no longer has a <strong>root</strong> element; it has
+    been replaced by <strong>children</strong>, and usually you will get a
+    list of elements here. For example, a Dtd element for the internal subset
+    and its declaration may be found in that list, as well as processing
+    instructions or comments found before or after the document root element.
+    Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
+    a document. Alternatively, if you are sure not to reference Dtds nor have
+    PIs or comments before or after the root element,
+    s/-&gt;root/-&gt;children/g will probably do it.</li>
+  <li>The white space issue. This one is more complex: except in the special
+    case of validating parsing, the line breaks and spaces usually used for
+    indenting and formatting the document content become significant. So they
+    are reported by SAX and, if you are using the DOM tree, corresponding
+    nodes are generated. Two approaches can be taken:
+    <ol>
+      <li>the lazy one: use the compatibility call
+        <strong>xmlKeepBlanksDefault(0)</strong>, but be aware that you are
+        relying on a special (and possibly broken) set of heuristics of
+        libxml to detect ignorable blanks. Don't complain if it breaks or
+        makes your application not 100% clean w.r.t. its input.</li>
+      <li>the Right Way: change your code to accept possibly insignificant
+        blank characters, or have your tree populated with weird blank text
+        nodes. You can spot them using the convenience function
+        <strong>xmlIsBlankNode(node)</strong>, which returns 1 for such blank
+        nodes.</li>
+    </ol>
+    <p>Note also that with the new default the output functions don't add any
+    extra indentation when saving a tree, in order to be able to round-trip
+    (read and save) without inflating the document with extra formatting
+    chars.</p>
+  </li>
+  <li>The include path has changed to $prefix/libxml/ and the includes
+    themselves use this new prefix in their include instructions. If you are
+    using (as expected) the
+    <pre>xml2-config --cflags</pre>
+    <p>output to generate your compile commands, this will probably work out
+    of the box.</p>
+  </li>
+  <li>xmlDetectCharEncoding takes an extra argument indicating the length in
+    bytes of the head of the document available for character detection.</li>
+</ol>
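+
+<p>To make the first few items concrete, here is a small sketch of
+libxml2-style tree access. It takes the file to parse from the command line
+(an assumption made for the example) and prints the names of the element
+children of the root, skipping the blank text nodes discussed above.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/tree.h&gt;
+
+int main(int argc, char **argv) {
+    xmlDocPtr doc;
+    xmlNodePtr root, cur;
+
+    if (argc != 2) {
+        fprintf(stderr, "usage: %s file.xml\n", argv[0]);
+        return 1;
+    }
+    doc = xmlParseFile(argv[1]);
+    if (doc == NULL)
+        return 1;
+
+    /* libxml2: ask for the root element instead of reading doc-&gt;root. */
+    root = xmlDocGetRootElement(doc);
+    if (root != NULL) {
+        /* libxml2: the field is named children, not childs. */
+        for (cur = root-&gt;children; cur != NULL; cur = cur-&gt;next) {
+            if (xmlIsBlankNode(cur))
+                continue;           /* formatting-only text node */
+            if (cur-&gt;type == XML_ELEMENT_NODE)
+                printf("%s\n", (const char *) cur-&gt;name);
+        }
+    }
+
+    xmlFreeDoc(doc);
+    xmlCleanupParser();
+    return 0;
+}</pre>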
 
-<p>The validation is not completely finished but in a (very IMHO) usable
-state. Until a real validation interface is defined the way to do it is to
-define and set the <strong>xmlDoValidityCheckingDefaultValue</strong>
-external variable to 1, this will of course be changed at some point:</p>
+<h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3>
 
-<p>extern int xmlDoValidityCheckingDefaultValue;</p>
+<p>Two new versions, libxml (1.8.11) and libxml2 (2.3.4), have been released
+to allow a smooth upgrade of existing libxml v1 code while retaining
+compatibility. They offer the following:</p>
+<ol>
+  <li>similar include naming, one should use
+    <strong>#include&lt;libxml/...&gt;</strong> in both cases.</li>
+  <li>similar identifiers defined via macros for the child and root fields:
+    respectively <strong>xmlChildrenNode</strong> and
+    <strong>xmlRootNode</strong></li>
+  <li>a new macro <strong>LIBXML_TEST_VERSION</strong> which should be
+    inserted once in the client code</li>
+</ol>
 
-<p>...</p>
+<p>So the roadmap to upgrade your existing libxml applications is the
+following:</p>
+<ol>
+  <li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li>
+  <li>find all occurrences where the xmlDoc <strong>root</strong> field is
+    used and change it to <strong>xmlRootNode</strong></li>
+  <li>similarly find all occurrences where the xmlNode <strong>childs</strong>
+    field is used and change it to <strong>xmlChildrenNode</strong></li>
+  <li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your
+    <strong>main()</strong> or in the library init entry point</li>
+  <li>Recompile and check compatibility; it should still work</li>
+  <li>Change your configure script to look first for xml2-config and fall
+    back to xml-config. Use the --cflags and --libs output of the command as
+    the include and linking parameters needed to use libxml.</li>
+  <li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and
+    libxml-devel-1.8.y can be kept simultaneously)</li>
+  <li>remove your config.cache, relaunch your configuration mechanism, and
+    recompile; if steps 2 and 3 were done right it should compile as-is</li>
+  <li>Test that your application is still running correctly. If not, this may
+    be due to extra empty nodes caused by formatting spaces being kept in
+    libxml2, contrary to libxml1; in that case insert
+    xmlKeepBlanksDefault(0) in your code before calling the parser (next to
+    <strong>LIBXML_TEST_VERSION</strong> is a fine place).</li>
+</ol>
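+
+<p>After steps 2 to 4, client code might look roughly like the sketch below.
+It is written against the compatibility macros only; the helper name
+<code>countChildren</code> is hypothetical. Since with libxml2 the first
+node below the document may be a Dtd, PI or comment, the sketch skips ahead
+to the first element node before using it as the root.</p>
+<pre>#include &lt;stdio.h&gt;
+#include &lt;libxml/parser.h&gt;
+#include &lt;libxml/tree.h&gt;
+
+/* Hypothetical helper: count the direct children of the root element. */
+static int countChildren(xmlDocPtr doc) {
+    xmlNodePtr root, cur;
+    int n = 0;
+
+    root = doc-&gt;xmlRootNode;      /* was doc-&gt;root in 1.x code */
+    while (root != NULL &amp;&amp; root-&gt;type != XML_ELEMENT_NODE)
+        root = root-&gt;next;        /* skip Dtd, PIs and comments */
+    if (root == NULL)
+        return 0;
+    /* was root-&gt;childs in 1.x code */
+    for (cur = root-&gt;xmlChildrenNode; cur != NULL; cur = cur-&gt;next)
+        n++;
+    return n;
+}
+
+int main(int argc, char **argv) {
+    xmlDocPtr doc;
+
+    LIBXML_TEST_VERSION
+    xmlKeepBlanksDefault(0);        /* lazy approach: drop ignorable blanks */
+
+    if (argc != 2)
+        return 1;
+    doc = xmlParseFile(argv[1]);
+    if (doc == NULL)
+        return 1;
+    printf("%d children\n", countChildren(doc));
+    xmlFreeDoc(doc);
+    return 0;
+}</pre>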
 
-<p>xmlDoValidityCheckingDefaultValue = 1;</p>
+<p>Following those steps should work. It worked for some of my own code.</p>
 
-<p></p>
-
-<p>To handle external entities, use the function
-<strong>xmlSetExternalEntityLoader</strong>(xmlExternalEntityLoader f); to
-link in you HTTP/FTP/Entities database library to the standard libxml
-core.</p>
-
-<p>@@interfaces@@</p>
+<p>Let me put some emphasis on the fact that there are far more changes from
+libxml 1.x to 2.x than the ones you may have to patch for. The overall code
+has been considerably cleaned up and the conformance to the XML specification
+has been drastically improved too. Don't take those changes as an excuse to
+not upgrade; it may cost a lot in the long term ...</p>
 
 <h2><a name="DOM"></a><a name="Principles">DOM Principles</a></h2>
 
@@ -1659,7 +3165,11 @@
 
 <h2><a name="Contributi">Contributions</a></h2>
 <ul>
-  <li><a href="mailto:ari@lusis.org">Ari Johnson</a>
+  <li>Bjorn Reese, William Brack and Thomas Broyer have provided a number of
+    patches; Gary Pennington worked on the validation API, threading support
+    and the Solaris port.</li>
+  <li>John Fleck helps maintain the documentation and man pages.</li>
+  <li><p><a href="mailto:ari@lusis.org">Ari Johnson</a></p>
      provides a  C++ wrapper for libxml:
     <p>Website: <a
     href="http://lusis.org/~ari/xml++/">http://lusis.org/~ari/xml++/</a></p>
@@ -1698,6 +3208,6 @@
 
 <p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
 
-<p>$Id: xml.html,v 1.113 2001/10/19 14:50:57 veillard Exp $</p>
+<p>$Id: xml.html,v 1.114 2001/10/24 12:35:52 veillard Exp $</p>
 </body>
 </html>