<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                      "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Libxml Catalog support</title>
  <meta name="GENERATOR" content="amaya V5.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml Catalog support</h1>

<p>Location: <a
href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>

<p>Version: $Revision: 1.2 $</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definitions</a></li>
  <li><a href="#Simple">Using catalogs</a></li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#reference">How to tune catalog usage</a></li>
  <li><a href="#validate">How to debug catalog processing</a></li>
  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
  <li><a href="#implemento">The implementor corner: a quick review of the
    API</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>What is a catalog? Basically it's a lookup mechanism used when an entity
(a file or a remote resource) references another entity. The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processor, or even images referenced for inclusion
in a rendering) and the time when loading that resource actually starts.</p>

<p>It is basically used for three things:</p>
<ul>
  <li>mapping from "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI). For example it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
  </li>
  <li>remapping from a given URL to another one, like an HTTP indirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
    <p>should really be looked up at</p>
    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
  </li>
  <li>providing a local cache mechanism that allows the entities associated
    with public identifiers or remote resources to be loaded locally; this is
    a really important feature for any significant deployment of XML or SGML
    since it avoids the hazards and delays associated with fetching remote
    resources.</li>
</ul>

<h2><a name="definition">The definitions</a></h2>

<p>Libxml, as of 2.4.3, implements two kinds of catalogs:</p>
<ul>
  <li>the older SGML catalogs: the official spec is SGML Open Technical
    Resolution TR9401:1997, but it is better understood by reading <a
    href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
    James Clark. This format is relatively old and is not libxml's preferred
    mode of operation.</li>
  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a> are far more flexible and more recent, use an XML syntax,
    and should scale much better. This is libxml's default option.</li>
</ul>

<h2><a name="Simple">Using catalogs</a></h2>

<p>In a normal environment libxml will by default check for the presence of
a catalog in /etc/xml/catalog and, assuming it has been correctly populated,
the processing is completely transparent to the document user. To take a
concrete example, suppose you are authoring a DocBook document that starts
with the following DOCTYPE declaration:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                      "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;</pre>

<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman
Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"; if these entities have
been installed on your system and the catalogs actually point to them,
libxml will fetch them from the local disk.</p>

<p style="font-size: 10pt"><strong>Note</strong>: don't actually use this
DOCTYPE; it refers to a really old version of DocBook, but it is fine as an
example.</p>

<p>Libxml will check the catalog each time it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc. If
your system is correctly configured, all the authoring and processing steps
should use only local files, even though your documents stay portable
because they use the canonical public and system IDs that reference the
remote resources.</p>

<h2><a name="Some">Some examples:</a></h2>

<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>

<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements,
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping: it associates a Public Identifier
with a URI.</p>
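<p>To make the <code>public</code> mapping concrete, here is a minimal C sketch of what such a lookup amounts to. It is not libxml's implementation: the <code>public_entry</code> struct and the <code>normalize_public_id()</code> and <code>lookup_public()</code> helpers are hypothetical. It does follow one rule of the XML Catalogs specification: whitespace in public identifiers is normalized (each run collapsed to a single space, leading and trailing whitespace removed) on both sides before comparing.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of a <public publicId="..." uri="..."/> entry. */
struct public_entry {
    const char *public_id;
    const char *uri;
};

/* XML whitespace: space, tab, carriage return, line feed. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Normalize a public identifier: each run of whitespace becomes a single
 * space and leading/trailing whitespace is dropped. The result is
 * truncated to fit out_size bytes, including the terminating NUL. */
void normalize_public_id(const char *in, char *out, size_t out_size)
{
    size_t n = 0;
    int pending_space = 0;

    if (out_size == 0)
        return;
    for (; *in != '\0'; in++) {
        if (is_xml_space(*in)) {
            pending_space = (n > 0);   /* never emit leading whitespace */
            continue;
        }
        if (pending_space && n + 1 < out_size)
            out[n++] = ' ';
        pending_space = 0;
        if (n + 1 < out_size)
            out[n++] = *in;
    }
    out[n] = '\0';                     /* a trailing run is never emitted */
}

/* Return the URI mapped to public_id, or NULL if no entry matches. */
const char *lookup_public(const struct public_entry *entries, size_t count,
                          const char *public_id)
{
    char wanted[256], candidate[256];

    normalize_public_id(public_id, wanted, sizeof wanted);
    for (size_t i = 0; i < count; i++) {
        normalize_public_id(entries[i].public_id, candidate, sizeof candidate);
        if (strcmp(wanted, candidate) == 0)
            return entries[i].uri;
    }
    return NULL;
}
```

<p>Because both sides are normalized, an identifier written with extra spaces or a line break in a document still matches the catalog entry.</p>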
<pre>...
  &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                 rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>

<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI,
constructed by replacing that prefix with a new one. In effect this acts
like a cache system for a whole area of the Web. In practice it is
extremely useful with a file prefix if you have installed a copy of those
resources on your local system.</p>
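<p>The prefix replacement can be sketched in a few lines of C. This is an illustration, not libxml's code: <code>rewrite_entry</code> and <code>rewrite_system()</code> are hypothetical names. It applies the XML Catalogs rule that when several <code>rewriteSystem</code> entries match, the one with the longest <code>systemIdStartString</code> wins.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of a <rewriteSystem> entry. */
struct rewrite_entry {
    const char *start_string;    /* systemIdStartString */
    const char *rewrite_prefix;  /* rewritePrefix */
};

/* If some entry's start string is a prefix of system_id, write the
 * rewritten URI (rewrite prefix + remainder of system_id) into out and
 * return 1. When several entries match, the longest start string wins.
 * Returns 0 if nothing matches or the result does not fit in out. */
int rewrite_system(const struct rewrite_entry *entries, size_t count,
                   const char *system_id, char *out, size_t out_size)
{
    const struct rewrite_entry *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(entries[i].start_string);
        if (len > best_len &&
            strncmp(system_id, entries[i].start_string, len) == 0) {
            best = &entries[i];
            best_len = len;
        }
    }
    if (best == NULL)
        return 0;

    int written = snprintf(out, out_size, "%s%s", best->rewrite_prefix,
                           system_id + best_len);
    return written >= 0 && (size_t)written < out_size;
}
```

<p>With the entry from the fragment above, <code>http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd</code> becomes <code>file:///usr/share/xml/docbook/xml/4.0/docbookx.dtd</code>; a second, more specific entry (for example one pointing at a hypothetical <code>file:///opt/docbook-4.1.2/</code>) would take precedence for the URIs it covers.</p>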
<pre>...
&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>

<p>Delegation is the core feature that allows building a tree of catalogs,
which is easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource. This feature allows
building hierarchies of catalogs: the set of entries presented should be
sufficient to redirect the resolution of all DocBook references to the
specific catalog in <code>/usr/share/xml/docbook.xml</code>, which in turn
could delegate all references for DocBook 4.2.1 to a specific catalog
installed at the same time as the DocBook resources on the local
machine.</p>
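<p>Under the XML Catalogs specification, delegation collects the catalogs of <em>all</em> matching delegate entries, ordered by the length of their start string, longest first, and continues resolution in that list only. A minimal C sketch of that selection step follows; <code>delegate_entry</code> and <code>collect_delegates()</code> are hypothetical, and the <code>docbook-4.1.2.xml</code> catalog in the usage check is an invented example.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of a delegatePublic / delegateSystem /
 * delegateURI entry. */
struct delegate_entry {
    const char *start_string;  /* publicIdStartString, systemIdStartString... */
    const char *catalog;       /* catalog to consult instead */
};

/* qsort comparator: longer start strings sort first. */
static int by_start_length_desc(const void *a, const void *b)
{
    const struct delegate_entry *x = *(const struct delegate_entry *const *)a;
    const struct delegate_entry *y = *(const struct delegate_entry *const *)b;
    size_t lx = strlen(x->start_string), ly = strlen(y->start_string);

    return (lx < ly) - (lx > ly);
}

/* Store pointers to every entry whose start string prefixes id into out[]
 * (at most max_out of them), longest prefix first, and return how many
 * were stored. Order among equally long prefixes is unspecified. */
size_t collect_delegates(const struct delegate_entry *entries, size_t count,
                         const char *id, const struct delegate_entry **out,
                         size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < count && n < max_out; i++) {
        size_t len = strlen(entries[i].start_string);
        if (strncmp(id, entries[i].start_string, len) == 0)
            out[n++] = &entries[i];
    }
    qsort(out, n, sizeof *out, by_start_length_desc);
    return n;
}
```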

<h2><a name="reference">How to tune catalog usage:</a></h2>

<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs;
an empty list should deactivate loading of the default
<code>/etc/xml/catalog</code> catalog.</p>
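<p>The three cases (variable unset, set to a list, set but empty) can be sketched in C as below. This is a model, not libxml's code: <code>catalog_files()</code> is hypothetical, and it assumes the list is separated by whitespace only; check the documentation of your libxml2 version for the exact separators it accepts.</p>

```c
#include <stddef.h>
#include <string.h>

#define DEFAULT_CATALOG "/etc/xml/catalog"
#define MAX_PATH_LEN 256

static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* env_value is the value of XML_CATALOG_FILES, or NULL when it is unset.
 * Stores up to max_paths catalog paths in paths[] and returns their count:
 * unset -> the default catalog, set but empty -> no catalog at all. */
size_t catalog_files(const char *env_value, char paths[][MAX_PATH_LEN],
                     size_t max_paths)
{
    size_t n = 0;
    const char *cur = env_value;

    if (env_value == NULL) {
        if (max_paths == 0)
            return 0;
        strcpy(paths[0], DEFAULT_CATALOG);
        return 1;
    }
    while (*cur != '\0' && n < max_paths) {
        while (is_blank(*cur))
            cur++;
        if (*cur == '\0')
            break;
        const char *start = cur;
        while (*cur != '\0' && !is_blank(*cur))
            cur++;
        size_t len = (size_t)(cur - start);
        if (len >= MAX_PATH_LEN)
            len = MAX_PATH_LEN - 1;          /* truncate over-long paths */
        memcpy(paths[n], start, len);
        paths[n][len] = '\0';
        n++;
    }
    return n;
}
```

<p>A caller would typically pass <code>getenv("XML_CATALOG_FILES")</code> (from <code>&lt;stdlib.h&gt;</code>), which returns NULL when the variable is not set.</p>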

<p>More options are likely to be provided in the future.</p>

<h2><a name="validate">How to debug catalog processing:</a></h2>

<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
make libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>The <code>test/ent2</code> document references an entity; running the
parser from memory makes the base URI unavailable, so the "title.xml" entity
cannot be loaded. Setting the debug environment variable shows that an
attempt is made to load <code>/etc/xml/catalog</code>, but since it's not
present the resolution fails.</p>
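<p>Why the base URI matters can be shown with a deliberately simplified C sketch: a relative reference such as "title.xml" is resolved by replacing the last path segment of the base, and with no base there is nothing to resolve against. The <code>resolve_relative()</code> helper is hypothetical and ignores "../", queries and fragments; full RFC 3986 resolution is what libxml2's own <code>xmlBuildURI()</code> is for.</p>

```c
#include <stdio.h>
#include <string.h>

/* Simplified resolution of a plain relative path against a base URI:
 * keep everything in base up to and including the last '/', then append
 * ref. Returns 1 on success, 0 when there is no base (e.g. the document
 * was parsed from memory) or the result does not fit in out. */
int resolve_relative(const char *base, const char *ref,
                     char *out, size_t out_size)
{
    if (base == NULL)
        return 0;

    const char *slash = strrchr(base, '/');
    size_t dir_len = slash != NULL ? (size_t)(slash - base) + 1 : 0;
    int written = snprintf(out, out_size, "%.*s%s", (int)dir_len, base, ref);

    return written >= 0 && (size_t)written < out_size;
}
```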

<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2: it allows loading
catalogs and making resolution queries to see what is going on. This is
also used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>

<p>For debugging what is going on, adding one <code>-v</code> flag increases
the verbosity level to show the processing done (adding a second flag also
shows which elements are recognized during parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>A shell interface is also available to debug and process multiple
queries (and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>

<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>

<h2><a name="Declaring">How to create and maintain</a> catalogs:</h2>

<p>XML Catalogs are basically XML files, so you can either use XML tools to
manage them or use <strong>xmlcatalog</strong> for this. The first step is
to create a catalog; the <code>--create</code> option provides this
facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>By default xmlcatalog does not overwrite the original catalog; it saves
the result on the standard output. This can be overridden using the
<code>--noout</code> option. The <code>--add</code> command allows adding
entries to the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>

<p>The <code>--add</code> option always takes three parameters, even though
some of the XML Catalog constructs (like nextCatalog) have only a single
argument: just pass an empty string as the third parameter and it will be
ignored.</p>
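<p>The three-argument convention can be illustrated with a small C sketch that maps the <code>--add</code> arguments onto an entry element. The element and attribute names come from the XML Catalogs specification; the <code>kinds</code> table and <code>format_entry()</code> are hypothetical and are not how xmlcatalog is implemented. Values are assumed to need no XML escaping.</p>

```c
#include <stdio.h>
#include <string.h>

/* Which attributes the second and third --add arguments end up in. */
struct entry_kind {
    const char *type;       /* first --add argument, also the element name */
    const char *orig_attr;  /* receives the second argument */
    const char *repl_attr;  /* receives the third one, or NULL if unused */
};

static const struct entry_kind kinds[] = {
    { "public",        "publicId",            "uri" },
    { "system",        "systemId",            "uri" },
    { "rewriteSystem", "systemIdStartString", "rewritePrefix" },
    { "nextCatalog",   "catalog",             NULL },  /* single argument */
};

/* Build the element for one entry into out. Returns 1 on success, 0 for
 * an unknown type or when the result does not fit in out_size bytes. */
int format_entry(const char *type, const char *orig, const char *replace,
                 char *out, size_t out_size)
{
    for (size_t i = 0; i < sizeof kinds / sizeof kinds[0]; i++) {
        const struct entry_kind *k = &kinds[i];
        int written;

        if (strcmp(type, k->type) != 0)
            continue;
        if (k->repl_attr == NULL)  /* third argument accepted but ignored */
            written = snprintf(out, out_size, "<%s %s=\"%s\"/>",
                               k->type, k->orig_attr, orig);
        else
            written = snprintf(out, out_size, "<%s %s=\"%s\" %s=\"%s\"/>",
                               k->type, k->orig_attr, orig,
                               k->repl_attr, replace);
        return written >= 0 && (size_t)written < out_size;
    }
    return 0;
}
```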

<p>Similarly, the <code>--del</code> option removes matching entries from
the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>The catalog is now empty. Note that the matching done by
<code>--del</code> is exact, and it would have worked in a similar fashion
with the Public ID string.</p>
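<p>Exact matching means a value must equal one of an entry's strings completely; a prefix is not enough. A minimal C sketch of that rule, using a hypothetical <code>cat_entry</code> struct and <code>delete_matching()</code> helper (not xmlcatalog's code), compares the target against both the identifier and the replacement value of each entry:</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog entry. */
struct cat_entry {
    const char *type;   /* e.g. "public" */
    const char *orig;   /* publicId, systemId, ... */
    const char *value;  /* uri, rewritePrefix, ... */
};

/* Remove, in place, every entry whose orig or value equals target exactly
 * (no prefix matching), preserving the order of the others. Returns the
 * new number of entries. */
size_t delete_matching(struct cat_entry *entries, size_t count,
                       const char *target)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].orig, target) == 0 ||
            strcmp(entries[i].value, target) == 0)
            continue;                  /* drop this entry */
        entries[kept++] = entries[i];  /* keep it */
    }
    return kept;
}
```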

<p>This is rudimentary but should be sufficient to manage a moderately
complex catalog tree of resources.</p>

<h2><a name="implemento">The implementor corner: a quick review of the
API</a></h2>

<p>First, as for every other module of libxml, there is an automatically
generated <a href="html/libxml-catalog.html">API page for catalog
support</a>.</p>

<p>The header for the catalog interfaces should be included as:</p>
<pre>#include &lt;libxml/catalog.h&gt;</pre>

<p>The API is deliberately kept very simple. First, it is not obvious that
applications really need access to it, since catalog resolution is
libxml's default behaviour. (Note: it is possible to completely override
libxml's default catalog handling by using <a
href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to plug in an
application-specific resolver.)</p>

<p>Basically libxml supports two catalog lists:</p>
<ul>
  <li>the default one, global and shared by the whole application;</li>
  <li>a per-document catalog list, built if the document uses
    <code>oasis-xml-catalog</code> processing instructions (PIs) to specify
    its own catalog list; it is associated with the parser context and
    destroyed when the parsing context is destroyed.</li>
</ul>

<p>The document list will be used first if it exists.</p>
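<p>The lookup order can be modelled in C as "document list first, global list as fallback". This sketch is an assumption-laden simplification: <code>mapping</code>, <code>catalog_list</code> and <code>resolve_two_level()</code> are hypothetical, and each list is reduced to exact identifier matches.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened catalog list: exact identifier -> URI pairs. */
struct mapping {
    const char *id;
    const char *uri;
};

struct catalog_list {
    const struct mapping *entries;
    size_t count;
};

static const char *lookup_in(const struct catalog_list *list, const char *id)
{
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < list->count; i++)
        if (strcmp(list->entries[i].id, id) == 0)
            return list->entries[i].uri;
    return NULL;
}

/* document is NULL when the document declared no oasis-xml-catalog PI.
 * The per-document list is tried first; the global one is the fallback. */
const char *resolve_two_level(const struct catalog_list *document,
                              const struct catalog_list *global,
                              const char *id)
{
    const char *uri = lookup_in(document, id);

    return uri != NULL ? uri : lookup_in(global, id);
}
```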

<h3>Initialization routines:</h3>

<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
used at startup to initialize the catalog. If the catalog should be
initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs()
should be called before xmlInitializeCatalog(), which would otherwise do a
default initialization first.</p>
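<p>The ordering rule can be made concrete with a small C model. It does not call libxml2: <code>catalog_state</code>, <code>state_load()</code> (playing the role of xmlLoadCatalog()) and <code>state_initialize()</code> (playing the role of xmlInitializeCatalog()) are hypothetical stand-ins that only capture the behaviour described in the paragraph above.</p>

```c
#include <stddef.h>

#define MAX_CATALOGS 8

/* Hypothetical model of the global catalog's startup state. */
struct catalog_state {
    const char *files[MAX_CATALOGS];
    size_t count;
    int initialized;
};

/* Stand-in for xmlLoadCatalog(): appends a catalog file and counts as an
 * initialization in its own right. */
void state_load(struct catalog_state *s, const char *path)
{
    if (s->count < MAX_CATALOGS)
        s->files[s->count++] = path;
    s->initialized = 1;
}

/* Stand-in for xmlInitializeCatalog(): performs the default
 * initialization only when nothing has been loaded yet. */
void state_initialize(struct catalog_state *s)
{
    if (!s->initialized)
        state_load(s, "/etc/xml/catalog");
}
```

<p>Loading first and initializing second leaves only the application's catalog; the reverse order puts the default catalog in front of it.</p>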

<p>The xmlCatalogAddLocal() call is used by the parser to grow the
document's own catalog list if needed.</p>

<h3>Preferences setup:</h3>

<p>The XML Catalog spec requires the ability to select the default
preference between public and system identifiers;
xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
xmlCatalogGetDefaults() control whether XML Catalog resolution is
forbidden, or allowed for the global catalog, for the document catalog, or
both; the default is to allow both.</p>
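<p>Both settings boil down to simple predicates. The C sketch below uses hypothetical enums standing in for libxml2's <code>xmlCatalogPrefer</code> and <code>xmlCatalogAllow</code> types (their real constants are listed on the API page). The prefer rule is the one from the XML Catalogs specification: with prefer="system", public entries are only considered when the document supplied no system identifier.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml2's xmlCatalogPrefer / xmlCatalogAllow. */
enum prefer { PREFER_PUBLIC, PREFER_SYSTEM };
enum allow {
    ALLOW_NONE     = 0,
    ALLOW_GLOBAL   = 1 << 0,
    ALLOW_DOCUMENT = 1 << 1,
    ALLOW_ALL      = ALLOW_GLOBAL | ALLOW_DOCUMENT
};

/* prefer="system": public entries apply only without a system identifier. */
bool public_entries_apply(enum prefer prefer, const char *system_id)
{
    return prefer == PREFER_PUBLIC || system_id == NULL;
}

/* Which catalog lists resolution may consult under a given allow setting. */
bool may_use_global(enum allow allow)   { return (allow & ALLOW_GLOBAL) != 0; }
bool may_use_document(enum allow allow) { return (allow & ALLOW_DOCUMENT) != 0; }
```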

<p>And of course xmlCatalogSetDebug() allows generating debug messages
(through the xmlGenericError() mechanism).</p>

<h3>Querying routines:</h3>

<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
and xmlCatalogResolveURI() are relatively self-explanatory if you read the
XML Catalog specification: they correspond to the section 7 algorithms.
They should also work, with simplified semantics, if you have loaded an
SGML catalog.</p>
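<p>The core ordering of the section 7 algorithm for an external identifier can be sketched in C: a system identifier match is tried first, then the public identifier. This is a heavily simplified model, not libxml's code: <code>id_map</code>, <code>simple_catalog</code> and <code>resolve_external()</code> are hypothetical, it behaves as if prefer="public", and rewriteSystem, delegation and nextCatalog are left out.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened catalog: exact <system> and <public> entries. */
struct id_map {
    const char *key;
    const char *uri;
};

struct simple_catalog {
    const struct id_map *sys;
    size_t n_sys;
    const struct id_map *pub;
    size_t n_pub;
};

static const char *find(const struct id_map *map, size_t n, const char *key)
{
    if (key == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].key, key) == 0)
            return map[i].uri;
    return NULL;
}

/* A system entry match wins; otherwise the public identifier is tried.
 * Either identifier may be NULL. Returns NULL if nothing matches. */
const char *resolve_external(const struct simple_catalog *cat,
                             const char *public_id, const char *system_id)
{
    const char *uri = find(cat->sys, cat->n_sys, system_id);

    if (uri != NULL)
        return uri;
    return find(cat->pub, cat->n_pub, public_id);
}
```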

<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() do the same but
operate on the document catalog list.</p>

<h3>Cleanup and Miscellaneous:</h3>

<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
the per-document equivalent.</p>

<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
first catalog in the global list, and xmlCatalogDump() allows dumping a
catalog's state. Those routines are primarily designed for xmlcatalog; I'm
not sure that exposing more complex interfaces (like navigation ones) would
be really useful.</p>

<p>xmlParseCatalogFile() is a function used to load XML Catalog files. It's
similar to xmlParseFile() except that it bypasses all catalog lookups; it's
provided because this functionality may be useful for client tools.</p>

<h3>Threaded environments:</h3>

<p>Since the catalog tree is built progressively, some care has been taken
to try to avoid trouble in multithreaded environments, but without a
test-and-set routine accessible from C this can't be fully guaranteed. The
best approach is to use xmlGetExternalEntityLoader() to retrieve the current
entity loader and to replace it with a routine of your own that does the
synchronization.</p>
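<p>The wrapping approach can be sketched with a POSIX mutex. To keep the example self-contained it does not include libxml2: <code>loader_fn</code> is a hypothetical type with the same shape as libxml2's <code>xmlExternalEntityLoader</code> (URL, ID, parser context), using <code>void *</code> in place of <code>xmlParserInputPtr</code> and <code>xmlParserCtxtPtr</code>, and <code>wrap_loader()</code>, <code>locked_loader()</code> and <code>passthrough_loader()</code> are hypothetical names. In real code the previous loader would come from xmlGetExternalEntityLoader() and the wrapper would be installed with xmlSetExternalEntityLoader().</p>

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical loader type shaped like xmlExternalEntityLoader. */
typedef void *(*loader_fn)(const char *url, const char *id, void *ctxt);

static loader_fn previous_loader;
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serializes every call to the previously installed loader, so catalog
 * lookups triggered by entity loading never run concurrently. */
void *locked_loader(const char *url, const char *id, void *ctxt)
{
    void *result = NULL;

    pthread_mutex_lock(&loader_lock);
    if (previous_loader != NULL)
        result = previous_loader(url, id, ctxt);
    pthread_mutex_unlock(&loader_lock);
    return result;
}

/* Remember the current loader and return the wrapper to install instead.
 * Call once at startup, before any parsing threads are started. */
loader_fn wrap_loader(loader_fn current)
{
    previous_loader = current;
    return locked_loader;
}

/* Stand-in for the original loader in this self-contained sketch. */
void *passthrough_loader(const char *url, const char *id, void *ctxt)
{
    (void)id;
    (void)ctxt;
    return (void *)url;
}
```

<p>A single global lock is the simplest correct choice, at the cost of serializing all entity loading across threads.</p>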
<h2><a name="Other">Other resources</a></h2>

<p>The XML Catalog specification is relatively recent, so there isn't much
literature to point to:</p>
<ul>
  <li>You can find a good rant from Norm Walsh about <a
    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
    need for catalogs</a>; it provides a lot of context information even if
    I don't agree with everything presented.</li>
  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
    catalog proposal</a> from John Cowan.</li>
  <li>The <a href="http://www.rddl.org/">Resource Directory Description
    Language</a> (RDDL), another catalog system, but more oriented toward
    providing metadata for XML namespaces.</li>
  <li>The page from the OASIS Technical <a
    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
    Resolution</a>, which maintains XML Catalogs; you will find pointers to
    the specification update, some background, and pointers to other tools
    providing XML Catalog support.</li>
</ul>

<p>If you have suggestions for corrections or additions, simply contact
me:</p>

<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>

<p>$Id: catalog.html,v 1.2 2001/08/23 00:52:23 veillard Exp $</p>
</body>
</html>