<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>XML resources publication guidelines</title>
</head>

<body bgcolor="#fffacd" text="#000000">
<h1 align="center">XML resources publication guidelines</h1>

<p></p>

<p>The goal of this document is to provide a set of guidelines and tips to
help with the publication and deployment of <a
href="http://www.w3.org/XML/">XML</a> resources for the <a
href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and might be helpful more generally. I welcome <a
href="mailto:veillard@redhat.com">feedback</a> on this document.</p>

<p>The intended audience is software developers who have started using XML
for some of the resources of their project, as a storage format, for data
exchange, checking or transformations. An increasing number of new XML
formats have been defined, but, possibly for lack of documentation, not all
the steps needed to truly gain the benefits of XML have been taken. These
guidelines aim to improve the matter and to provide a better overview of
the overall XML processing and of the associated steps needed to deploy it
successfully:</p>

<p>Table of contents:</p>
<ol>
<li><a href="#Design">Design guidelines</a></li>
<li><a href="#Canonical">Canonical URL</a></li>
<li><a href="#Catalog">Catalog setup</a></li>
<li><a href="#Package">Package integration</a></li>
</ol>

<h2><a name="Design">Design guidelines</a></h2>

<p>This part focuses on the XML format itself. This advice may arrive a bit
too late if the structure of the document is already cast in existing and
deployed code. Still, here are a few rules which might be helpful when
designing a new XML vocabulary or revising an existing format:</p>

<h3>Reuse existing formats:</h3>

<p>This may sound a bit simplistic, but before designing your own format,
try to look up existing XML vocabularies for similar data. Ideally this
allows you to reuse them, in which case a lot of the existing tools like
DTDs, schemas and stylesheets may already be available. If you are looking
for a documentation format, <a href="http://www.docbook.org/">DocBook</a>
should handle your needs. If reuse is not possible because some semantic or
use case aspects are too different, the survey will still help you avoid
design errors like targeting the vocabulary at the wrong abstraction level.
During this design phase, be concise: make sure you express the real content
of your data, and use the XML structure to express the semantics and context
of that data.</p>

<h3>DTD rules: </h3>

<p>Building a DTD (Document Type Definition) or a Schema describing the
structure allowed in instances is the core of the vocabulary design process.
Here are a few tips:</p>
<ul>
<li>use meaningful words for element and attribute names</li>
<li>do not use attributes for textual content: attribute values are
normalized by the parser before they reach the application (tabs and line
breaks become spaces, and for attributes declared with a type other than
CDATA, runs of spaces are also collapsed and leading and trailing spaces
removed)</li>
<li>use a separate element for every string which might be subject to
localization; the canonical way to localize XML content is to use sibling
elements carrying different xml:lang attributes, as in the following:
<pre>&lt;welcome&gt;
  &lt;msg xml:lang="en"&gt;hello&lt;/msg&gt;
  &lt;msg xml:lang="fr"&gt;bonjour&lt;/msg&gt;
&lt;/welcome&gt;</pre>
</li>
<li>use attributes to refine the content of an element, but avoid them for
more complex tasks: parsing an attribute is not cheaper than parsing an
element, and it is far easier to make the content of an element more complex
later, while an attribute value has to remain very simple.</li>
</ul>
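
<p>An application reading such a document still has to pick one of the
sibling <code>msg</code> elements at runtime. The POSIX shell sketch below
shows only that selection step: it assumes the <code>xml:lang</code> values
have already been extracted by an XML parser, <code>best_lang</code> is a
hypothetical helper name, and falling back to <code>en</code> is an assumed
default, not a rule of the XML specification.</p>
<pre># best_lang LOCALE TAG...
# Hypothetical helper: prints the xml:lang TAG that best matches a POSIX
# locale such as "fr_FR.UTF-8". Candidates are tried in order: the full
# tag (fr-FR), the primary language (fr), then "en" as an assumed default.
# Matching is case-sensitive here, a simplification of real tag matching.
best_lang() {
    loc=${1%%.*}          # drop the codeset: fr_FR.UTF-8 gives fr_FR
    loc=${loc%%@*}        # drop a modifier: de_DE@euro gives de_DE
    case $loc in
        *_*) full="${loc%%_*}-${loc#*_}" ;;   # fr_FR gives fr-FR
        *)   full=$loc ;;
    esac
    primary=${loc%%_*}    # fr_FR gives fr
    shift
    for want in "$full" "$primary" en; do
        for tag in "$@"; do
            if [ "$tag" = "$want" ]; then
                echo "$tag"
                return 0
            fi
        done
    done
    return 1
}</pre>

<p>With the <code>welcome</code> example above, a user running with
<code>LANG=fr_FR.UTF-8</code> gets <code>fr</code> from
<code>best_lang "$LANG" en fr</code>, so the application displays "bonjour",
while a German locale falls back to the English message.</p>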

<h3>Versioning:</h3>

<p>As part of the design, make sure the structure you define will be usable
for future extensions that you may not foresee for the current version.
There are two parts to this:</p>
<ul>
<li>make sure the instance contains a version number, which makes backward
compatibility easy; something as simple as a <code>version="1.0"</code>
attribute on the root element of the instance is sufficient</li>
<li>when designing the code that analyzes the data provided by the XML
parser, make sure it can work with unknown versions: generate a UI warning
and process only the tags recognized by your version. In particular, the
code should not break on unknown elements if the version attribute was not
in the recognized set.</li>
</ul>
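
<p>The second point can be made concrete with a small decision function.
This is a POSIX shell sketch under assumptions: the version attribute and
the element names are taken to be already extracted by the parser, and
<code>KNOWN_VERSIONS</code>, <code>KNOWN_ELEMENTS</code>,
<code>element_action</code> and <code>version_warning</code> are
hypothetical names for a vocabulary whose only released version is 1.0.</p>
<pre># Hypothetical vocabulary: version 1.0 defines the structure and data elements.
KNOWN_VERSIONS="1.0"
KNOWN_ELEMENTS="structure data"

# in_list WORD ITEM...: succeeds if WORD is one of the ITEMs.
in_list() {
    word=$1
    shift
    for item in "$@"; do
        if [ "$item" = "$word" ]; then
            return 0
        fi
    done
    return 1
}

# element_action VERSION ELEMENT
# Prints what the reader should do with ELEMENT in an instance whose root
# carries version="VERSION":
#   process  the element belongs to the vocabulary this code knows
#   skip     unknown element in an unrecognized version: ignore it quietly
#   error    unknown element in a version this code claims to support
element_action() {
    if in_list "$2" $KNOWN_ELEMENTS; then
        echo process
    elif in_list "$1" $KNOWN_VERSIONS; then
        echo error
    else
        echo skip
    fi
}

# version_warning VERSION: prints the text of a UI warning for an
# unrecognized version, and nothing for a supported one.
version_warning() {
    if ! in_list "$1" $KNOWN_VERSIONS; then
        echo "Warning: format version $1 is not known to this program; unrecognized data will be ignored."
    fi
}</pre>

<p>The design choice is that an unrecognized version turns unknown elements
from an error into something to skip, so an older program keeps working on
files written by a newer one, while genuine mistakes are still caught in the
versions it does support.</p>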

<h3>Other design parts: </h3>

<p>While defining your vocabulary, think about other uses of your data, for
example how XSLT stylesheets could produce an HTML view of your data or
convert it into a different format. It is also worth looking at XML Schemas
and considering a Schema that provides more complete validation and
datatyping of your data structures; this helps avoid some mistakes in the
design phase.</p>

<h3>Namespace:</h3>

<p>If you expect your XML vocabulary to be used or recognized outside of your
application (for example to bind specific processing in a graphical shell
like Nautilus to instances of your data), then you should really define an <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (more precisely an absolute URI); it is
generally recommended to anchor it as an HTTP resource on a server associated
with the software project, see the next section about this. In practice this
means that XML parsers will not handle your element names as-is, but as a
pair made of the namespace name and the local element name. This makes it
possible to recognize the vocabulary and disambiguate processing. Uniqueness
of the namespace name can, for the most part, be guaranteed by the use of the
DNS registry. Namespaces can also be used to carry versioning information,
like:</p>

<p><code>"http://www.gnome.org/project/projectname/1.0/"</code></p>

<p>An easy way to use them is to make them the default namespace on the
root element of the XML instance, like:</p>
<pre>&lt;structure xmlns="http://www.gnome.org/project/projectname/1.0/"&gt;
  &lt;data&gt;
  ...
  &lt;/data&gt;
&lt;/structure&gt;</pre>

<p>In that document, structure and all descendant elements like data are in
the given namespace.</p>
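
<p>Because the parser reports each element as a (namespace name, local name)
pair, consuming code normally dispatches on the namespace name. When the
version is carried in the namespace name as above, it can be recovered with
plain string handling. The POSIX shell sketch below assumes the namespace
name has already been obtained from the parser; <code>NS_BASE</code> and
<code>ns_version</code> are hypothetical names.</p>
<pre># Hypothetical prefix shared by every version of the vocabulary.
NS_BASE="http://www.gnome.org/project/projectname/"

# ns_version URI
# Prints the version embedded in a namespace name such as
# http://www.gnome.org/project/projectname/1.0/ and fails for any URI
# that does not belong to this vocabulary.
ns_version() {
    case $1 in
        "$NS_BASE"*/) ;;
        *) return 1 ;;
    esac
    v=${1#"$NS_BASE"}   # strip the shared prefix, leaving "1.0/"
    v=${v%/}            # strip the trailing slash, leaving "1.0"
    case $v in
        ''|*/*) return 1 ;;   # empty or nested path: not a version segment
    esac
    echo "$v"
}</pre>

<p>Keep in mind that namespace names are compared as plain strings, so
<code>http://www.gnome.org/project/projectname/1.0/</code> and the same URL
without the trailing slash are two different namespaces; the sketch
accordingly rejects the latter.</p>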

<h2><a name="Canonical">Canonical URL</a></h2>

<p>As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between both: XML was designed to
be available on the Web, and keeping the infrastructure that way helps
deploy XML resources. The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource; the canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually when processing XML, a
copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt, or even in C:\projectname\
(horror!). The key point is that the way to name that resource should be
independent of the actual place where it resides on disk, if it is available
at all, and that the processing will still work if there is no local copy
(provided the machine doing the processing is connected to the Internet).</p>

<p>What this really means is that one should never use the local name of a
resource to reference it, but always use the canonical URL. For example, in a
DocBook instance the following should not be used:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "/usr/share/xml/docbook/4.2/docbookx.dtd"&gt;</pre>

<p>Instead, always reference the canonical URL for the DTD:</p>
<pre>&lt;!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"&gt;</pre>

<p>Similarly, the document instance may reference the <a
href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it to
generate HTML, and the canonical URL should be used:</p>
<pre>&lt;?xml-stylesheet
    href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
    type="text/xsl"?&gt;</pre>
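
<p>Existing instances can be audited for this mistake by searching for quoted
identifiers that look like local file names. The shell sketch below is a
heuristic, not a validator: it relies only on <code>grep -E</code>, assumes
each identifier fits on one line, and <code>find_local_refs</code> is a
hypothetical name. It is meant for document instances, since catalog files
legitimately contain <code>file:</code> URLs.</p>
<pre># find_local_refs FILE...
# Prints (with line numbers) the lines containing a quoted identifier that
# starts with an absolute local path (/usr/...), a file: URL, or a drive
# letter (C:\...), all of which should be replaced by the canonical URL.
# Exit status follows grep: 0 if something was found, 1 if the files are clean.
find_local_refs() {
    grep -n -E '"(/|file:|[A-Za-z]:\\)[^"]*"' "$@"
}</pre>

<p>Run on the first DocBook example above, it reports line 2, which holds the
<code>/usr/share/xml/docbook/4.2/docbookx.dtd</code> system identifier; on
the second example it prints nothing and returns 1.</p>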

<p>Defining the canonical URL for the resources needed should obey a few
simple rules similar to those used to design namespace names:</p>
<ul>
<li>use a DNS name you know is associated with the project and will be
available in the long term</li>
<li>within that server space, reserve the subtree where you intend to keep
those data</li>
<li>version the URL so that multiple concurrent versions of the resources
can be hosted simultaneously</li>
</ul>

<h2><a name="Catalog">Catalog setup</a></h2>

<h3>How catalogs work:</h3>

<p>Catalogs are the technical mechanism which allows XML processing tools to
use a local copy of a resource, if one is available, even though the
instance document references the canonical URL. <a
href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually <code>/etc/xml/catalog</code>, or a
file chosen by the user, for instance through the
<code>XML_CATALOG_FILES</code> environment variable honored by libxml2).
They form a tree of XML documents defining the mappings between the
canonical naming space and the locally installed copies, which can be seen
as a static cache structure.</p>

<p>When the XML processor is asked to process a resource, it automatically
tests for a locally available version in the catalog, starting from the
root catalog and possibly fetching sub-catalog resources, until it finds out
whether the catalog has that resource or not. If not, the default processing
of fetching the resource from the Web is done, which in most cases recovers
from a catalog miss. The key point is that the document instances are
totally independent of the availability of a catalog and of the actual place
where the local resources they reference may be installed. This greatly
improves the management of the documents in the long run, making them
independent of the platform or toolchain used to process them.</p>
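
<p>The lookup-then-fallback behaviour can be sketched in shell. This only
illustrates the control flow and is not how libxml2 implements it: the whole
catalog is reduced to a hypothetical <code>catalog_lookup</code> function
with two hard-coded public identifiers, and the local paths and the
<code>resolve</code> helper are made up for the example.</p>
<pre># catalog_lookup PUBLIC_ID
# Stand-in for a catalog: prints the local copy registered for PUBLIC_ID,
# or fails on a catalog miss. Entries and paths are hypothetical.
catalog_lookup() {
    case $1 in
        "-//W3C//DTD XHTML 1.0 Strict//EN")
            echo /usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd ;;
        "-//W3C//DTD XHTML 1.0 Transitional//EN")
            echo /usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-transitional.dtd ;;
        *)
            return 1 ;;
    esac
}

# resolve PUBLIC_ID CANONICAL_URL
# Prints the URL the processor should load: the local copy when the catalog
# knows the identifier and the file is readable, otherwise the canonical
# URL, so that a catalog miss only costs a download from the Web.
resolve() {
    if local_file=$(catalog_lookup "$1"); then
        if [ -r "$local_file" ]; then
            echo "file://$local_file"
            return 0
        fi
    fi
    echo "$2"
}</pre>

<p>The document itself only ever contains the canonical URL; whether the
local copy or the Web copy is used depends entirely on what is installed on
the machine doing the processing.</p>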

<h3>Usual catalog setup:</h3>

<p>Usually catalogs for a project are set up as a two-level hierarchical
cache, the root catalog containing only "delegates" pointing to a separate
subcatalog dedicated to the project. The goal is to keep the root catalog
clean and to simplify the maintenance of the catalogs by using a separate
catalog per project. For example, when creating a catalog for the <a
href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added to
the root catalog:</p>
<pre>&lt;delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
&lt;delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;
&lt;delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
             catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/&gt;</pre>

<p>They are all "delegates", meaning that if the catalog system is asked to
resolve a reference corresponding to them, it has to look up a subcatalog.
Here the subcatalog was installed as
<code>/usr/share/sgml/xhtml1/xmlcatalog</code> in the local tree. That
decision is left to the sysadmin or the packager for that system and may
obey different rules, but the actual place on the filesystem (or in a
resource cache on the local network) will not influence the processing as
long as it is available. The first rule indicates that if the reference uses
a PUBLIC identifier beginning with the</p>

<p><code>"-//W3C//DTD XHTML 1.0"</code></p>

<p>substring, then the catalog lookup should be limited to the specific given
lookup catalog. Similarly, the second and third entries set up the same
delegation for SYSTEM identifiers (such as the one in a DOCTYPE declaration)
and for other URI references, when the URL starts with the
<code>"http://www.w3.org/TR/xhtml1/DTD"</code> substring. That string is the
location on the W3C server where the XHTML1 resources are stored, and the
beginning of all Canonical URLs for XHTML1 resources. Those 3 rules are
sufficient in practice to capture all references to XHTML1 resources and
direct the processing tools to the right subcatalog.</p>
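
<p>The delegation step amounts to a prefix test on the identifier. The POSIX
shell sketch below hard-codes the three delegate rules above; it only
illustrates the matching, <code>delegate_for</code> and
<code>SUBCATALOG</code> are hypothetical names, and it leaves out what a real
resolver does next (loading the subcatalog and searching it).</p>
<pre># Subcatalog named by the three XHTML1 delegate rules.
SUBCATALOG=file:///usr/share/sgml/xhtml1/xmlcatalog

# delegate_for KIND ID
# KIND is "public", "system" or "uri". Prints the subcatalog to consult
# when ID starts with the prefix of a matching delegate rule, and fails
# when no delegate rule applies.
delegate_for() {
    case $1 in
        public)
            case $2 in
                "-//W3C//DTD XHTML 1.0"*) echo "$SUBCATALOG"; return 0 ;;
            esac
            ;;
        system|uri)
            case $2 in
                "http://www.w3.org/TR/xhtml1/DTD"*) echo "$SUBCATALOG"; return 0 ;;
            esac
            ;;
    esac
    return 1
}</pre>

<p>For instance, <code>delegate_for public "-//W3C//DTD XHTML 1.0 Strict//EN"</code>
and <code>delegate_for system http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</code>
both print the subcatalog URL, while an HTML 4.01 identifier matches no
rule.</p>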

<h3>A subcatalog example:</h3>

<p>Here is the complete subcatalog used for XHTML1:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/&gt;
  &lt;public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/&gt;
  &lt;rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                 rewritePrefix="xhtml1-20020801/DTD"/&gt;
  &lt;rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
              rewritePrefix="xhtml1-20020801/DTD"/&gt;
&lt;/catalog&gt;
</pre>

<p>There are a few things to notice:</p>
<ul>
<li>this is an XML resource: it points to the DTD using Canonical URLs, and
its root element defines a namespace (but based on a URN, not an HTTP
URL).</li>
<li>it contains 5 rules: the first 3 are direct mappings for the 3 PUBLIC
identifiers defined by the XHTML1 specification, associating them with the
local resource containing the DTD; the last 2 are rewrite rules allowing the
local filename to be built for any URL based on
"http://www.w3.org/TR/xhtml1/DTD". The local cache simplifies the rules by
keeping the same structure as the online server at the Canonical URL.</li>
<li>the local resources are designated using URI references (the uri or
rewritePrefix attributes), the base being the URL of the containing
subcatalog. Following the usual rules for resolving relative references, the
last segment of that base (<code>xmlcatalog</code>) is dropped, which means
that in practice the copy of the XHTML1 strict DTD is stored locally in
<code>/usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code>,
next to the subcatalog file</li>
</ul>

<p>Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.</p>
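
<p>The rewrite rules combine a prefix substitution with ordinary relative URI
resolution against the subcatalog URL. The POSIX shell sketch below mirrors
the two rewrite entries above. It handles only this one prefix,
<code>rewrite_uri</code> and <code>SUBCATALOG_URL</code> are hypothetical
names, and resolution is simplified to "drop the last segment of the base,
then append", which is what RFC 3986 does for a relative path containing no
<code>.</code> or <code>..</code> segments.</p>
<pre># Where the XHTML1 subcatalog itself lives.
SUBCATALOG_URL=file:///usr/share/sgml/xhtml1/xmlcatalog

# rewrite_uri URI
# Applies the rewriteSystem/rewriteURI rule of the subcatalog: the prefix
# http://www.w3.org/TR/xhtml1/DTD is replaced by the relative reference
# xhtml1-20020801/DTD, which is then resolved against SUBCATALOG_URL.
# Fails when URI does not start with the rewritten prefix.
rewrite_uri() {
    start="http://www.w3.org/TR/xhtml1/DTD"
    prefix="xhtml1-20020801/DTD"
    case $1 in
        "$start"*) ;;
        *) return 1 ;;
    esac
    rest=${1#"$start"}              # e.g. /xhtml1-strict.dtd
    base_dir=${SUBCATALOG_URL%/*}   # base without its last segment (xmlcatalog)
    echo "$base_dir/$prefix$rest"
}</pre>

<p>Applied to <code>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</code>,
this yields
<code>file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd</code>.</p>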

<h2><a name="Package">Package integration</a></h2>

<p>Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. Since the catalog files are
XML resources, they should be processed with XML-based tools to avoid
problems with the generated files; the xmlcatalog command that comes with
libxml2 can create catalogs and add or remove rules at that time. Here is a
complete example, taken from the RPM post-install script for the XHTML1
DTDs:</p>
<pre>%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r "$ROOTCATALOG" ]
then
    /usr/bin/xmlcatalog --noout --create "$ROOTCATALOG"
fi

if [ -w "$ROOTCATALOG" ]
then
    /usr/bin/xmlcatalog --noout --add "delegatePublic" \
        "-//W3C//DTD XHTML 1.0" \
        "file://$CATALOG" "$ROOTCATALOG"
    /usr/bin/xmlcatalog --noout --add "delegateSystem" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" "$ROOTCATALOG"
    /usr/bin/xmlcatalog --noout --add "delegateURI" \
        "http://www.w3.org/TR/xhtml1/DTD" \
        "file://$CATALOG" "$ROOTCATALOG"
fi</pre>

<p>The XHTML1 subcatalog is not created on the fly in that case; it is
installed as part of the files of the package. So the only work needed is to
make sure the root catalog exists and to register the delegate rules.</p>

<p>Similarly, the post-uninstall script just removes the rules from the
catalog:</p>
<pre>%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w "$ROOTCATALOG" ]
    then
        /usr/bin/xmlcatalog --noout --del \
            "-//W3C//DTD XHTML 1.0" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
        /usr/bin/xmlcatalog --noout --del \
            "http://www.w3.org/TR/xhtml1/DTD" "$ROOTCATALOG"
    fi
fi</pre>

<p>Note the test against $1: in <code>%postun</code>, RPM sets $1 to the
number of instances of the package that remain installed, which is 0 only
when the package is erased. The test is needed so that the delegate rules
are not removed when the package is upgraded.</p>

<p>Following the set of guidelines and tips provided in this document should
help deploy XML resources in the GNOME framework without much pain and
ensure a smooth evolution of the resources and instances.</p>

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>

<p>$Id$</p>

<p> </p>
</body>
</html>