<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Libxml Catalog support</title>
  <meta name="GENERATOR" content="amaya V5.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml Catalog support</h1>

<p>Location: <a
href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>

<p>Version: $Revision:$</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definitions</a></li>
  <li><a href="#Simple">Using catalogs</a></li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#reference">How to tune catalog usage</a></li>
  <li><a href="#validate">How to debug catalog processing</a></li>
  <li><a href="#Declaring">How to create and maintain catalogs</a></li>
  <li><a href="#implemento">The implementor's corner: quick review of the
    API</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>What is a catalog? Basically it is a lookup mechanism used when an entity
(a file or a remote resource) references another entity. The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processor, or even images referenced for inclusion in
a rendering) and the moment loading of that resource actually starts.</p>

<p>It is basically used for 3 things:</p>
<ul>
  <li>mapping from "logical" names, the public identifiers, to a more
    concrete name usable for download (a URI). For example it can associate
    the logical name
    <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p>
    <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
    downloaded</p>
    <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p>
  </li>
  <li>remapping from a given URL to another one, like an HTTP redirection
    saying that
    <p>"http://www.oasis-open.org/committes/tr.xsl"</p>
    <p>should really be looked up at</p>
    <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p>
  </li>
  <li>providing a local cache mechanism for loading the entities associated
    with public identifiers or remote resources. This is a really important
    feature for any significant deployment of XML or SGML since it avoids
    the hazards and delays associated with fetching remote resources.</li>
</ul>

<h2><a name="definition">The definitions</a></h2>

<p>Libxml, as of 2.4.3, implements 2 kinds of catalogs:</p>
<ul>
  <li>the older SGML catalogs: the official spec is SGML Open Technical
    Resolution TR9401:1997, but it is better understood by reading <a
    href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
    James Clark. This format is relatively old and is not the preferred mode
    of operation of libxml.</li>
  <li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
    Catalogs</a> are far more flexible, more recent, use an XML syntax and
    should scale much better. This is the default option of libxml.</li>
</ul>

<h2><a name="Simple">Using catalogs</a></h2>

<p>In a normal environment libxml will by default check for the presence of
a catalog in /etc/xml/catalog and, assuming it has been correctly populated,
the processing is completely transparent to the document user. To take a
concrete example, suppose you are authoring a DocBook document that starts
with the following DOCTYPE definition:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;
</pre>

<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman
Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>
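
<p>The lookup described above can be pictured with a small Python sketch.
This is only an illustration and not libxml's implementation: the
<code>resolve</code> function, the two tables and the local paths under
<code>/usr/share/xml</code> are hypothetical, and the order used (system
identifier first, then public identifier) is an assumption.</p>

```python
# Illustrative sketch only: a catalog reduced to two lookup tables.
# The local file paths are hypothetical.
SYSTEM_MAP = {
    "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd":
        "file:///usr/share/xml/docbook/3.1.4/db3xml.dtd",
}
PUBLIC_MAP = {
    "-//Norman Walsh//DTD DocBk XML V3.1.4//EN":
        "file:///usr/share/xml/docbook/3.1.4/db3xml.dtd",
}


def resolve(public_id, system_id):
    """Return the URI to load for an external identifier."""
    # Assumed order: system identifier first, then public identifier.
    if system_id in SYSTEM_MAP:
        return SYSTEM_MAP[system_id]
    if public_id in PUBLIC_MAP:
        return PUBLIC_MAP[public_id]
    # No catalog entry: fall back to the system identifier as written.
    return system_id


print(resolve("-//Norman Walsh//DTD DocBk XML V3.1.4//EN",
              "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"))
```

<p>When neither identifier is known to the catalog, the sketch simply returns
the system identifier, which corresponds to fetching the remote resource.</p>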

<p style="font-size: 10pt"><strong>Note</strong>: Don't actually use this
DOCTYPE in your documents; it refers to a really old version, but it is fine
as an example.</p>

<p>Libxml will check the catalog each time it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc. If
your system is correctly configured, all the authoring and processing phases
should use only local files, while your document stays portable because it
uses the canonical public and system IDs referencing the remote document.</p>

<h2><a name="Some">Some examples:</a></h2>

<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>

<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, and there is a specific namespace for catalog elements,
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping; it associates a Public Identifier
with a URI.</p>
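
<p>To make the role of the namespace and of the <code>public</code> entry
more concrete, here is a minimal Python sketch that reads such a catalog with
the standard <code>xml.etree.ElementTree</code> module. It is not how libxml
works internally; the <code>lookup_public</code> helper is hypothetical, and
the catalog text is a complete one-entry catalog written for this example
rather than the truncated file above.</p>

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A complete, minimal catalog holding the single entry discussed above.
CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
"""


def lookup_public(catalog_text, public_id):
    """Return the uri of the first matching <public> entry, or None."""
    root = ET.fromstring(catalog_text)
    # Catalog elements live in the catalog namespace, so the tag name
    # has to be qualified with it.
    for entry in root.iter("{%s}public" % CATALOG_NS):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


print(lookup_public(CATALOG, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
```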
<pre>...
  &lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                 rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>

<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any URI starting with a given prefix should be looked up at another URI
constructed by replacing the prefix with a new one. In effect this acts like
a cache system for a full area of the Web. In practice it is extremely useful
with a file prefix if you have installed a copy of those resources on your
local system.</p>
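
<p>The prefix replacement can be sketched in a few lines of Python. The
<code>rewrite_system</code> function is hypothetical, and choosing the
longest matching prefix when several rules apply is an assumption of this
sketch, not something stated on this page.</p>

```python
# Each rule is (systemIdStartString, rewritePrefix).
REWRITE_RULES = [
    ("http://www.oasis-open.org/docbook/", "file:///usr/share/xml/docbook/"),
]


def rewrite_system(system_id, rules=REWRITE_RULES):
    """Apply the rule with the longest matching prefix, if any."""
    matches = [(start, prefix) for start, prefix in rules
               if system_id.startswith(start)]
    if not matches:
        return None
    # Assumption: the most specific (longest) prefix wins.
    start, prefix = max(matches, key=lambda rule: len(rule[0]))
    # Keep everything after the matched prefix unchanged.
    return prefix + system_id[len(start):]


print(rewrite_system(
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"))
```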
<pre>...
&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
             catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>

<p>Delegation is the core feature that allows building a tree of catalogs,
which is easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource. The set of entries presented
should be sufficient to redirect the resolution of all DocBook references to
the specific catalog in <code>/usr/share/xml/docbook.xml</code>; this one in
turn could delegate all references for DocBook 4.1.2 to a specific catalog
installed at the same time as the DocBook resources on the local machine.</p>
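
<p>As a rough Python sketch of the selection step, the hypothetical
<code>delegated_catalogs</code> function below returns the catalogs that
would be consulted next for a given Public Identifier, using the
<code>delegatePublic</code> entries shown above. Ordering the candidates by
decreasing prefix length is an assumption of this sketch.</p>

```python
# (publicIdStartString, catalog) pairs from the delegatePublic entries above.
DELEGATE_PUBLIC = [
    ("-//OASIS//DTD XML Catalog //", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//ENTITIES DocBook XML", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//DTD DocBook XML", "file:///usr/share/xml/docbook.xml"),
]


def delegated_catalogs(public_id, rules=DELEGATE_PUBLIC):
    """List the catalogs to consult next, longest matching prefix first."""
    matches = sorted((rule for rule in rules if public_id.startswith(rule[0])),
                     key=lambda rule: len(rule[0]), reverse=True)
    catalogs = []
    for _, catalog in matches:
        if catalog not in catalogs:  # consult each catalog only once
            catalogs.append(catalog)
    return catalogs


print(delegated_catalogs("-//OASIS//DTD DocBook XML V4.1.2//EN"))
```

<p>An empty list means that no delegation applies and the lookup continues
with the other entries of the current catalog.</p>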

<h2><a name="reference">How to tune catalog usage:</a></h2>

<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
empty one should deactivate loading the default
<code>/etc/xml/catalog</code> catalog.</p>
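
<p>The effect of the variable can be summarized by the following Python
sketch. The <code>catalog_files</code> function is hypothetical, and the
sketch assumes that the entries of the list are separated by whitespace,
which this page does not specify.</p>

```python
import os

DEFAULT_CATALOG = "/etc/xml/catalog"


def catalog_files(environ=os.environ):
    """Return the list of catalogs to load for the given environment."""
    value = environ.get("XML_CATALOG_FILES")
    if value is None:
        # Variable not set: use the default catalog.
        return [DEFAULT_CATALOG]
    # Assumption: entries are separated by whitespace. An empty value
    # yields an empty list, so the default catalog is not loaded.
    return value.split()


print(catalog_files({"XML_CATALOG_FILES": "/home/me/catalog.xml /opt/xml/catalog"}))
```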

<p>@@More options are likely to be provided in the future@@</p>

<h2><a name="validate">How to debug catalog processing:</a></h2>

<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will make
libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>The test/ent2 document references an entity; running the parser from
memory makes the base URI unavailable and the "title.xml" entity cannot be
loaded. Setting the debug environment variable shows that an attempt is made
to load <code>/etc/xml/catalog</code>, but since it is not present the
resolution fails.</p>

<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2; it can load
catalogs and make resolution queries to see what is going on. It is also
used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>

<p>For debugging what is going on, adding a -v flag increases the verbosity
level to indicate the processing done (adding a second flag also indicates
which elements are recognized at parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>

<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>

<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>

<h2><a name="Declaring">How to create and maintain catalogs:</a></h2>

<p>Basically XML Catalogs are XML files; you can either use XML tools to
manage them or use <strong>xmlcatalog</strong> for this. The basic step is
to create a catalog; the <code>--create</code> option provides this
facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>By default xmlcatalog does not overwrite the original catalog and saves
the result on the standard output; this can be overridden using the
<code>--noout</code> option. The <code>--add</code> command adds entries to
the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
        uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>

<p>The <code>--add</code> option always takes 3 parameters, even though some
of the XML Catalog constructs (like nextCatalog) have only a single argument;
in that case just pass an empty string as the third one, it will be
ignored.</p>

<p>Similarly the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>

<p>The catalog is now empty. Note that the matching of <code>--del</code> is
exact and would have worked in a similar fashion with the Public ID
string.</p>

<p>This is rudimentary but should be sufficient to manage a not too complex
catalog tree of resources.</p>
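
<p>Since XML Catalogs are plain XML files, the same create, add and delete
steps can also be carried out with generic XML tools. The Python sketch below
uses the standard <code>xml.etree.ElementTree</code> module; the three helper
functions are hypothetical, the sketch only handles <code>public</code>
entries, and it does not write the DOCTYPE declaration that xmlcatalog
emits.</p>

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)  # serialize without a prefix


def create_catalog():
    """Return an empty <catalog> element."""
    return ET.Element("{%s}catalog" % CATALOG_NS)


def add_public(catalog, public_id, uri):
    """Append a <public> entry mapping public_id to uri."""
    ET.SubElement(catalog, "{%s}public" % CATALOG_NS,
                  {"publicId": public_id, "uri": uri})


def delete_matching(catalog, value):
    """Remove every entry with an attribute exactly equal to value."""
    for entry in list(catalog):
        if value in entry.attrib.values():
            catalog.remove(entry)


catalog = create_catalog()
add_public(catalog, "-//OASIS//DTD DocBook XML V4.1.2//EN",
           "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd")
print(ET.tostring(catalog, encoding="unicode"))

# Exact match on the Public ID string removes the entry again.
delete_matching(catalog, "-//OASIS//DTD DocBook XML V4.1.2//EN")
print(len(catalog))
```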

<h2><a name="implemento">The implementor's corner: quick review of the
API:</a></h2>

<p>@@TODO@@</p>

<h2><a name="Other">Other resources</a></h2>

<p>The XML Catalog specification is relatively recent so there isn't much
literature to point at:</p>
<ul>
  <li>You can find a good rant from Norm Walsh about <a
    href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
    need for catalogs</a>; it provides a lot of context information even if
    I don't agree with everything presented.</li>
  <li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
    catalog proposal</a> from John Cowan.</li>
  <li>The <a href="http://www.rddl.org/">Resource Directory Description
    Language</a> (RDDL), another catalog system but more oriented toward
    providing metadata for XML namespaces.</li>
  <li>The page of the OASIS Technical <a
    href="http://www.oasis-open.org/committees/entity/">Committee on Entity
    Resolution</a>, which maintains XML Catalog; you will find pointers to
    the specification updates, some background and pointers to other tools
    providing XML Catalog support.</li>
</ul>

<p>If you have suggestions for corrections or additions, simply contact
me:</p>

<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>

<p>$Id:$</p>
</body>
</html>