<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Libxml Catalog support</title>
  <meta name="GENERATOR" content="amaya V5.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml Catalog support</h1>

<p>Location: <a
href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>

<p>Version: $Revision: 1.1 $</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definitions</a></li>
  <li><a href="#Simple">Using catalogs</a></li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#reference">How to tune catalog usage</a></li>
  <li><a href="#validate">How to debug catalog processing</a></li>
  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
  <li><a href="#implemento">The implementor corner, quick review of the
    API</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>What is a catalog? Basically it is a lookup mechanism used when an entity
(a file or a remote resource) references another entity. The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processor, or even images referenced for inclusion in
a rendering) and the time when loading of that resource actually starts.</p>

<p>It is basically used for 3 things:</p>
<ul>
  <li>mapping from "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI). For example it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
  </li>
  <li>remapping from a given URL to another one, like an HTTP redirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
    <p>should really be looked up at</p>
    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
  </li>
  <li>providing a local cache mechanism for loading the entities
    associated with public identifiers or remote resources. This is a really
    important feature for any significant deployment of XML or SGML since it
    avoids the hazards and delays associated with fetching remote
    resources.</li>
</ul>

<h2><a name="definition">The definitions</a></h2>

<p>As of version 2.4.3, libxml implements 2 kinds of catalogs:</p>
<ul>
  <li>the older SGML catalogs. The official specification is SGML Open
    Technical Resolution TR9401:1997, but it is better understood by reading
    <a href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a>
    from James Clark. This format is relatively old and is not the preferred
    mode of operation of libxml.</li>
  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a> are far more flexible and more recent, use an XML syntax
    and should scale considerably better. This is the default option of
    libxml.</li>
</ul>

<p></p>

<h2><a name="Simple">Using catalogs</a></h2>

<p>In a normal environment libxml will by default check for the presence of
a catalog in /etc/xml/catalog, and assuming it has been correctly populated,
the processing is completely transparent to the document user. To take a
concrete example, suppose you are authoring a DocBook document which starts
with the following DOCTYPE definition:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;
</pre>

<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman
Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>

<p style="font-size: 10pt"><strong>Note</strong>: do not really use this
DOCTYPE, it is a really old version, but it is fine as an example.</p>

<p>Libxml will check the catalog each time it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc. If
your system is correctly configured, all the authoring and processing phases
should use only local files, while your document stays portable because it
uses the canonical public and system IDs, which reference the remote
document.</p>

<h2><a name="Some">Some examples:</a></h2>

<p>Here are a couple of fragments from XML Catalogs used in libxml early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>

<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements:
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping, which associates a Public
Identifier with a URI.</p>
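<p>To make the idea of a <code>public</code> mapping concrete, here is a
minimal sketch in plain C of an exact-match lookup table. It is not libxml's
implementation: the <code>public_entry</code> structure and the
<code>lookup_public()</code> function are names invented for this
illustration, and the normalization a real resolver applies to public
identifiers is left out.</p>

```c
#include <stddef.h>
#include <string.h>

/* One catalog entry: a public identifier and the URI it maps to.
 * (Hypothetical structure, for illustration only.) */
struct public_entry {
    const char *public_id;
    const char *uri;
};

/* Illustrative table holding the single entry of the catalog fragment. */
static const struct public_entry public_entries[] = {
    { "-//OASIS//DTD DocBook XML V4.1.2//EN",
      "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" },
};

/* Return the URI mapped to public_id, or NULL when no entry matches.
 * Identifiers are compared exactly, byte for byte. */
const char *lookup_public(const char *public_id)
{
    size_t n = sizeof(public_entries) / sizeof(public_entries[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(public_entries[i].public_id, public_id) == 0)
            return public_entries[i].uri;
    }
    return NULL;
}
```

<p>A caller would fall back to the system identifier of the DOCTYPE when
<code>lookup_public()</code> returns NULL.</p>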
<pre>...
    &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                   rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>

<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI
constructed by replacing the prefix with a new one. In effect this acts like
a cache system for a full area of the Web. In practice it is extremely useful
with a file prefix if you have installed a copy of those resources on your
local system.</p>
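<p>The prefix replacement performed by a <code>rewriteSystem</code> entry can
be sketched in a few lines of plain C. This is only an illustration of the
technique, assuming a single entry; the <code>rewrite_system()</code>
function is a hypothetical helper and not part of the libxml API.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Rewrite system_id when it starts with start_string: the matched prefix
 * is replaced by rewrite_prefix and the rest of the identifier is kept.
 * Returns a newly allocated string the caller must free(), or NULL when
 * the prefix does not match or the allocation fails. */
char *rewrite_system(const char *system_id, const char *start_string,
                     const char *rewrite_prefix)
{
    size_t start_len = strlen(start_string);
    size_t prefix_len = strlen(rewrite_prefix);
    size_t tail_len;
    char *result;

    if (strncmp(system_id, start_string, start_len) != 0)
        return NULL;

    tail_len = strlen(system_id) - start_len;
    result = malloc(prefix_len + tail_len + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, rewrite_prefix, prefix_len);
    /* Copy the remainder including its terminating NUL byte. */
    memcpy(result + prefix_len, system_id + start_len, tail_len + 1);
    return result;
}
```

<p>With the entry shown above, the system identifier
<code>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</code> would
be rewritten to
<code>file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd</code>.</p>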
<pre>...
&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>

<p>Delegation is the core feature which allows building a tree of catalogs,
which is easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource. The set of entries presented
should be sufficient to redirect the resolution of all DocBook references to
the specific catalog in <code>/usr/share/xml/docbook.xml</code>. This one in
turn could delegate all references for DocBook 4.1.2 to a specific catalog
installed at the same time as the DocBook resources on the local machine.</p>
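<p>The selection of delegate catalogs can be sketched as follows. The XML
Catalogs specification orders the matching delegate entries so that the one
with the longest matching prefix is consulted first; the plain C code below
illustrates that ordering only. The <code>delegate_entry</code> structure and
the <code>select_delegates()</code> function are hypothetical names, not
libxml interfaces.</p>

```c
#include <stddef.h>
#include <string.h>

/* One delegate entry: an identifier prefix and the catalog to consult.
 * (Hypothetical structure, for illustration only.) */
struct delegate_entry {
    const char *start_string;
    const char *catalog;
};

/* Store in out[] pointers to all entries whose start_string is a prefix
 * of id, longest prefix first. At most max_out entries are stored and
 * matches found after the array is full are ignored. Returns the number
 * of entries stored. */
size_t select_delegates(const struct delegate_entry *entries, size_t count,
                        const char *id,
                        const struct delegate_entry **out, size_t max_out)
{
    size_t found = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(entries[i].start_string);
        size_t pos;

        if (strncmp(id, entries[i].start_string, len) != 0)
            continue;               /* prefix does not match */
        if (found == max_out)
            break;                  /* output array is full */

        /* Insertion sort: move entries with shorter prefixes down. */
        pos = found;
        while (pos > 0 && strlen(out[pos - 1]->start_string) < len) {
            out[pos] = out[pos - 1];
            pos--;
        }
        out[pos] = &entries[i];
        found++;
    }
    return found;
}
```

<p>A resolver would then load each selected catalog in turn and stop at the
first one which provides a match.</p>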

<h2><a name="reference">How to tune catalog usage:</a></h2>

<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
empty value should deactivate loading the default
<code>/etc/xml/catalog</code> catalog.</p>
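<p>The decision described above can be sketched in plain C as follows. This
is an illustration under the assumption that an unset variable means "use
the default" while a variable set to the empty string means "load nothing";
<code>catalog_list()</code> and <code>DEFAULT_CATALOG</code> are names
invented for the sketch and do not show libxml's own code.</p>

```c
#include <stdlib.h>

/* Default catalog location named in the text. */
#define DEFAULT_CATALOG "/etc/xml/catalog"

/* Return the catalog list to load: the value of XML_CATALOG_FILES when
 * the variable is set (even to an empty string, which then names no
 * catalog at all), and the default catalog otherwise. */
const char *catalog_list(void)
{
    const char *list = getenv("XML_CATALOG_FILES");

    return (list != NULL) ? list : DEFAULT_CATALOG;
}
```

<p>The distinction between an unset and an empty variable is the important
point: only the former falls back to <code>/etc/xml/catalog</code>.</p>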

<p>More options are likely to be provided in the future.</p>

<h2><a name="validate">How to debug catalog processing:</a></h2>

<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
make libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>The test/ent2 document references an entity; running the parser from
memory makes the base URI unavailable, so the "title.xml" entity cannot be
loaded. Setting the debug environment variable shows that an attempt is
made to load <code>/etc/xml/catalog</code>, but since it is not present the
resolution fails.</p>

<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2. It allows loading
catalogs and making resolution queries to see what is going on. It is also
used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>

<p>To see what is going on, adding a -v flag increases the verbosity level
to show the processing done (adding a second flag also shows which elements
are recognized at parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>

<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>

<h2><a name="Declaring">How to create and maintain</a> catalogs:</h2>

<p>Basically XML Catalogs are XML files, so you can either use XML tools to
manage them or use <strong>xmlcatalog</strong>. The first step is to create
a catalog; the <code>--create</code> option provides this facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>By default xmlcatalog does not overwrite the original catalog and saves
the result on the standard output; this can be overridden using the
<code>--noout</code> option. The <code>--add</code> command adds entries to
the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>

<p>The <code>--add</code> option always takes 3 parameters, even though some
of the XML Catalog constructs (like nextCatalog) have only a single argument;
in that case just pass an empty string as the third one, it will be
ignored.</p>

<p>Similarly, the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>The catalog is now empty. Note that the matching of <code>--del</code> is
exact, and it would have worked in a similar fashion with the Public ID
string.</p>

<p>This is rudimentary but should be sufficient to manage a not too complex
catalog tree of resources.</p>

<h2><a name="implemento">The implementor corner, quick review of the
API:</a></h2>

<p>First, as for every other module of libxml, there is an automatically
generated <a href="html/libxml-catalog.html">API page for catalog
support</a>.</p>

<p>The header for the catalog interfaces should be included as:</p>
<pre>#include &lt;libxml/catalog.h&gt;</pre>

<p>The API is deliberately kept very simple. First, it is not obvious that
applications really need access to it, since catalog lookup is the default
behaviour of libxml (Note: it is possible to completely override the libxml
default catalog by using <a
href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to plug in an
application-specific resolver).</p>

<p>Basically libxml supports 2 catalog lists:</p>
<ul>
  <li>the default one, global and shared by the whole application</li>
  <li>a per-document catalog, which is built if the document uses the
    <code>oasis-xml-catalog</code> PIs to specify its own catalog list. It is
    associated with the parser context and destroyed when the parsing context
    is destroyed.</li>
</ul>

<p>The document one will be used first if it exists.</p>

<h3>Initialization routines:</h3>

<p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
used at startup to initialize the catalog. If the catalog should be
initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs()
should be called before xmlInitializeCatalog(), which would otherwise do a
default initialization first.</p>

<p>The xmlCatalogAddLocal() call is used by the parser to grow the document's
own catalog list if needed.</p>

<h3>Preferences setup:</h3>

<p>The XML Catalog spec requires the possibility to select default
preferences between public and system delegation;
xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
xmlCatalogGetDefaults() control whether XML Catalog resolution should be
forbidden, allowed for the global catalog, for the document catalog or for
both; the default is to allow both.</p>

<p>And of course xmlCatalogSetDebug() allows generating debug messages
(through the xmlGenericError() mechanism).</p>

<h3>Querying routines:</h3>

<p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
and xmlCatalogResolveURI() are relatively self-explanatory if you have read
the XML Catalog specification: they correspond to the section 7 algorithms.
They should also work, with simplified semantics, if you have loaded an SGML
catalog.</p>

<p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
operate on the document catalog list.</p>

<h3>Cleanup and Miscellaneous:</h3>

<p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
the per-document equivalent.</p>

<p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the
first catalog in the global list, and xmlCatalogDump() dumps the catalog
state. Those routines are primarily designed for xmlcatalog; I'm not sure
that exposing more complex interfaces (like navigation ones) would be really
useful.</p>

<h3>Threaded environments:</h3>

<p>Since the catalog tree is built progressively, some care has been taken to
try to avoid trouble in multithreaded environments, but without a
test-and-set routine accessible from C this cannot be fully guaranteed. The
best approach is therefore to use xmlGetExternalEntityLoader() to obtain the
current entity loader and set the entity loader routine to one of your own
which does the synchronization.</p>

<p></p>

<h2><a name="Other">Other resources</a></h2>

<p>The XML Catalog specification is relatively recent so there isn't much
literature to point at:</p>
<ul>
  <li>You can find a good rant from Norm Walsh about <a
    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
    need for catalogs</a>; it provides a lot of context information even if
    I don't agree with everything presented.</li>
  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
    catalog proposal</a> from John Cowan</li>
  <li>The <a href="http://www.rddl.org/">Resource Directory Description
    Language</a> (RDDL), another catalog system but more oriented toward
    providing metadata for XML namespaces.</li>
  <li>The page of the OASIS Technical <a
    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
    Resolution</a>, which maintains XML Catalog; you will find pointers to
    the specification updates, some background and pointers to other tools
    providing XML Catalog support</li>
</ul>

<p>If you have suggestions for corrections or additions, simply contact
me:</p>

<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>

<p>$Id: catalog.html,v 1.1 2001/08/22 23:44:08 veillard Exp $</p>
</body>
</html>