Added documentation about Catalog support; the API description is still missing
* doc/catalog.html doc/xml.html: added documentation about
Catalog support; the API description is still missing
* doc/html/*: re-extracted the API pages
Daniel
diff --git a/doc/catalog.html b/doc/catalog.html
new file mode 100644
index 0000000..a93d2f2
--- /dev/null
+++ b/doc/catalog.html
@@ -0,0 +1,315 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+ "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+ <title>Libxml Catalog support</title>
+ <meta name="GENERATOR" content="amaya V5.0">
+ <meta http-equiv="Content-Type" content="text/html">
+</head>
+
+<body bgcolor="#ffffff">
+<h1 align="center">Libxml Catalog support</h1>
+
+<p>Location: <a
+href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>
+
+<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>
+
+<p>Mailing-list archive: <a
+href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>
+
+<p>Version: $Revision:$</p>
+
+<p>Table of Contents:</p>
+<ol>
+ <li><a href="#General">General overview</a></li>
+ <li><a href="#definition">The definitions</a></li>
+ <li><a href="#Simple">Using catalogs</a></li>
+ <li><a href="#Some">Some examples</a></li>
+ <li><a href="#reference">How to tune catalog usage</a></li>
+ <li><a href="#validate">How to debug catalog processing</a></li>
+ <li><a href="#Declaring">How to create and maintain catalogs</a></li>
+ <li><a href="#implemento">The implementor's corner: a quick review of the
+ API</a></li>
+ <li><a href="#Other">Other resources</a></li>
+</ol>
+
+<h2><a name="General">General overview</a></h2>
+
+<p>What is a catalog? Basically it is a lookup mechanism used when an entity
+(a file or a remote resource) references another entity. The catalog lookup
+is inserted between the moment the reference is recognized by the software
+(XML parser, stylesheet processing, or even images referenced for inclusion
+in a rendering) and the time when loading that resource actually
+starts.</p>
+
+<p>It is basically used for 3 things:</p>
+<ul>
+ <li>mapping between "logical" names (the public identifiers) and a more
+ concrete name usable for download (a URI). For example it can associate
+ the logical name
+ <p>"-//OASIS//DTD DocBook XML V4.1.2//EN" </p>
+ <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
+ downloaded</p>
+ <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd </p>
+ </li>
+ <li>remapping from a given URL to another one, like an HTTP indirection
+ saying that
+ <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
+ <p>should really be looked up at</p>
+ <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"
+ </p>
+ </li>
+ <li>providing a local cache mechanism that allows loading the entities
+ associated with public identifiers or remote resources. This is a really
+ important feature for any significant deployment of XML or SGML since it
+ avoids the unpredictability and delays associated with fetching remote
+ resources.</li>
+</ul>
+
+<h2><a name="definition">The definitions</a></h2>
+
+<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
+<ul>
+ <li>the older SGML catalogs. The official spec is SGML Open Technical
+ Resolution TR9401:1997, but it is better understood by reading <a
+ href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
+ James Clark. This format is relatively old and is not the preferred mode
+ of operation of libxml.</li>
+ <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
+ Catalogs</a>, which are far more flexible, more recent, use an XML syntax
+ and should scale much better. This is the default option of libxml.</li>
+</ul>
+
+<p></p>
+
+<h2><a name="Simple">Using catalogs</a></h2>
+
+<p>In a normal environment libxml will by default check for the presence of
+a catalog in /etc/xml/catalog. Assuming it has been correctly populated, the
+processing is completely transparent to the document user. To take a
+concrete example, suppose you are authoring a DocBook document which starts
+with the following DOCTYPE definition:</p>
+<pre>&lt;?xml version='1.0'?&gt;
+&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
+ "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;
+
+</pre>
+
+<p>When validating the document with libxml, the catalog will be
+automatically consulted to look up the public identifier "-//Norman Walsh//DTD
+DocBk XML V3.1.4//EN" and the system identifier
+"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
+been installed on your system and the catalogs actually point to them, libxml
+will fetch them from the local disk.</p>
+
+<p style="font-size: 10pt"><strong>Note</strong>: Do not use this DOCTYPE in
+real documents; it is a really old version, but it is fine as an example.</p>
+
+<p>Libxml will check the catalog each time it is requested to load an
+entity; this includes DTDs, external parsed entities, stylesheets, etc. If
+your system is correctly configured, all the authoring and processing phases
+should use only local files, while your document stays portable because it
+uses the canonical public and system IDs, which reference the remote document.</p>
+
+<h2><a name="Some">Some examples:</a></h2>
+
+<p>Here are a couple of fragments from XML Catalogs used in libxml's early
+regression tests in <code>test/catalogs</code>:</p>
+<pre>&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
+ &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+ uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
+...</pre>
+
+<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
+written in XML, and there is a specific namespace for catalog elements:
+"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
+catalog is a <code>public</code> mapping, which associates a Public
+Identifier with a URI.</p>
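+<p>As a rough illustration of what a <code>public</code> entry does, here is
+a minimal Python sketch that reads a catalog and returns the URI mapped to a
+Public Identifier. It is not how libxml implements the lookup: the function
+name <code>resolve_public</code> and the inline catalog string are made up for
+this example, and the sketch compares identifiers literally, skipping the
+normalization a real resolver performs.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of XML Catalog elements, as quoted in the text above.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Illustrative catalog holding the single public entry shown above.
CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
"""


def resolve_public(catalog_xml, public_id):
    """Return the uri of the first <public> entry matching public_id, or None.

    Hypothetical helper: the comparison is a plain string equality test.
    """
    root = ET.fromstring(catalog_xml)
    for entry in root.iter("{%s}public" % CATALOG_NS):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


if __name__ == "__main__":
    print(resolve_public(CATALOG, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
```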
+<pre>...
+ &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+ rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
+...</pre>
+
+<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
+any URI starting with a given prefix should be looked up at another URI
+constructed by replacing the prefix with a new one. In effect this acts like
+a cache system for a full area of the Web. In practice it is extremely useful
+with a file prefix if you have installed a copy of those resources on your
+local system.</p>
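+<p>The prefix replacement can be sketched in a few lines of Python. This is
+only an illustration under stated assumptions: <code>rewrite_system</code> and
+<code>RULES</code> are hypothetical names, the rules are given as plain
+(start string, rewrite prefix) pairs rather than parsed from a catalog, and
+the longest matching prefix is preferred, as the XML Catalogs specification
+requires when several entries match.</p>

```python
# Hypothetical rule list mirroring the rewriteSystem entry shown above:
# (systemIdStartString, rewritePrefix) pairs.
RULES = [
    ("http://www.oasis-open.org/docbook/", "file:///usr/share/xml/docbook/"),
]


def rewrite_system(system_id, rules):
    """Rewrite system_id using the longest matching start string, or return None."""
    best = None
    for start, prefix in rules:
        if system_id.startswith(start):
            # Keep the rule with the longest matching start string.
            if best is None or len(start) > len(best[0]):
                best = (start, prefix)
    if best is None:
        return None
    start, prefix = best
    # Replace the matched prefix and keep the remainder of the identifier.
    return prefix + system_id[len(start):]


if __name__ == "__main__":
    print(rewrite_system(
        "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd", RULES))
```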
+<pre>...
+&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
+ catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
+ catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
+ catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
+ catalog="file:///usr/share/xml/docbook.xml"/&gt;
+&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
+ catalog="file:///usr/share/xml/docbook.xml"/&gt;
+...</pre>
+
+<p>Delegation is the core feature which allows building a tree of catalogs,
+which is easier to maintain than a single catalog. Based on Public
+Identifier, System Identifier or URI prefixes, it instructs the catalog
+software to look up entries in another resource. The set of entries
+presented should be sufficient to redirect the resolution of all DocBook
+references to the specific catalog in
+<code>/usr/share/xml/docbook.xml</code>. This one in turn could delegate all
+references for DocBook 4.1.2 to a specific catalog installed at the same time
+as the DocBook resources on the local machine.</p>
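+<p>To make the delegation step more concrete, the following Python sketch
+selects which catalogs a resolver would consult for a given identifier. It is
+an assumption-laden illustration, not libxml code: <code>delegates_for</code>
+and <code>DELEGATE_PUBLIC</code> are hypothetical names, the entries are
+hard-coded pairs copied from the fragment above, and matching catalogs are
+ordered by longest prefix first, following the XML Catalogs
+specification.</p>

```python
# Hypothetical table mirroring the delegatePublic entries shown above:
# (publicIdStartString, catalog) pairs.
DELEGATE_PUBLIC = [
    ("-//OASIS//DTD XML Catalog //", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//ENTITIES DocBook XML", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//DTD DocBook XML", "file:///usr/share/xml/docbook.xml"),
]


def delegates_for(identifier, entries):
    """Return the catalogs to consult for identifier, longest matching prefix first."""
    matches = [(start, cat) for start, cat in entries if identifier.startswith(start)]
    # Longest prefixes are tried first.
    matches.sort(key=lambda match: len(match[0]), reverse=True)
    ordered = []
    for _, cat in matches:
        # Each delegated catalog is listed only once.
        if cat not in ordered:
            ordered.append(cat)
    return ordered


if __name__ == "__main__":
    print(delegates_for("-//OASIS//DTD DocBook XML V4.1.2//EN", DELEGATE_PUBLIC))
```

+<p>An empty result means no delegation applies, and resolution would continue
+with the remaining entries of the current catalog.</p>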
+
+<h2><a name="reference">How to tune catalog usage:</a></h2>
+
+<p>The user can change the default catalog behaviour by redirecting queries
+to their own set of catalogs. This can be done by setting the
+<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
+empty one should deactivate loading the default
+<code>/etc/xml/catalog</code> catalog.</p>
+
+<p>@@More options are likely to be provided in the future@@</p>
+
+<h2><a name="validate">How to debug catalog processing:</a></h2>
+
+<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
+make libxml output debugging information for each catalog operation, for
+example:</p>
+<pre>orchis:~/XML -> xmllint --memory --noout test/ent2
+warning: failed to load external entity "title.xml"
+orchis:~/XML -> export XML_DEBUG_CATALOG=
+orchis:~/XML -> xmllint --memory --noout test/ent2
+Failed to parse catalog /etc/xml/catalog
+Failed to parse catalog /etc/xml/catalog
+warning: failed to load external entity "title.xml"
+Catalogs cleanup
+orchis:~/XML -> </pre>
+
+<p>The test/ent2 document references an entity. Running the parser from
+memory makes the base URI unavailable, so the "title.xml" entity cannot be
+loaded. Setting the debug environment variable shows that an attempt is
+made to load <code>/etc/xml/catalog</code>, but since it's not present the
+resolution fails.</p>
+
+<p>But the most advanced way to debug XML catalog processing is to use the
+<strong>xmlcatalog</strong> command shipped with libxml2. It can load
+catalogs and make resolution queries to see what is going on. This is also
+used for the regression tests:</p>
+<pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+orchis:~/XML -> </pre>
+
+<p>For debugging what is going on, adding one -v flag increases the verbosity
+level to indicate the processing done (adding a second flag also indicates
+which elements are recognized at parsing):</p>
+<pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
+Parsing catalog test/catalogs/docbook.xml's content
+Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+Catalogs cleanup
+orchis:~/XML -> </pre>
+
+<p>A shell interface is also available to debug and process multiple queries
+(and for regression tests):</p>
+<pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
+> help
+Commands available:
+public PublicID: make a PUBLIC identifier lookup
+system SystemID: make a SYSTEM identifier lookup
+resolve PublicID SystemID: do a full resolver lookup
+add 'type' 'orig' 'replace' : add an entry
+del 'values' : remove values
+dump: print the current catalog state
+debug: increase the verbosity level
+quiet: decrease the verbosity level
+exit: quit the shell
+> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
+http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
+> quit
+orchis:~/XML -> </pre>
+
+<p>This should be sufficient for most debugging purposes; it was actually
+used heavily to debug the XML Catalog implementation itself.</p>
+
+<h2><a name="Declaring">How to create and maintain catalogs:</a></h2>
+
+<p>Basically XML Catalogs are XML files, so you can either use XML tools to
+manage them or use <strong>xmlcatalog</strong> for this. The basic step is
+to create a catalog; the -create option provides this facility:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --create tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
+ "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
+orchis:~/XML -> </pre>
+
+<p>By default xmlcatalog does not overwrite the original catalog and saves the
+result on the standard output; this can be overridden using the -noout
+option. The <code>-add</code> command adds entries to the
+catalog:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
+orchis:~/XML -> cat tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
+&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
+ uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
+&lt;/catalog&gt;
+orchis:~/XML -> </pre>
+
+<p>The <code>-add</code> option always takes 3 parameters, even though some
+of the XML Catalog constructs (like nextCatalog) have only a single
+argument; in that case just pass an empty third string, it will be ignored.</p>
+
+<p>Similarly the <code>-del</code> option removes matching entries from the
+catalog:</p>
+<pre>orchis:~/XML -> ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
+&lt;?xml version="1.0"?&gt;
+&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
+&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
+orchis:~/XML -> </pre>
+
+<p>The catalog is now empty. Note that the matching of <code>-del</code> is
+exact and would have worked in a similar fashion with the Public ID
+string.</p>
+
+<p>This is rudimentary but should be sufficient to manage a catalog tree of
+resources that is not too complex.</p>
+
+<h2><a name="implemento">The implementor's corner: a quick review of the
+API:</a></h2>
+
+<p>@@TODO@@</p>
+
+<h2><a name="Other">Other resources</a></h2>
+
+<p>The XML Catalog specification is relatively recent so there isn't much
+literature to point at:</p>
+<ul>
+ <li>You can find a good rant from Norm Walsh about <a
+ href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
+ need for catalogs</a>; it provides a lot of context information even if
+ I don't agree with everything presented.</li>
+ <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
+ catalog proposal</a> from John Cowan</li>
+ <li>The <a href="http://www.rddl.org/">Resource Directory Description
+ Language</a> (RDDL), another catalog system but more oriented toward
+ providing metadata for XML namespaces.</li>
+ <li>The page of the OASIS Technical <a
+ href="http://www.oasis-open.org/committees/entity/">Committee on Entity
+ Resolution</a>, which maintains XML Catalog. There you will find pointers
+ to the specification updates, some background and pointers to other tools
+ providing XML Catalog support.</li>
+</ul>
+
+<p>If you have suggestions for corrections or additions, simply contact
+me:</p>
+
+<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
+
+<p>$Id:$</p>
+</body>
+</html>