<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Validation &amp; DTDs</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Validation &amp; DTDs</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General5">General overview</a></li>
<li><a href="#definition1">The definition</a></li>
<li>
<a href="#Simple1">Simple rules</a><ol>
<li><a href="#reference1">How to reference a DTD from a document</a></li>
<li><a href="#Declaring2">Declaring elements</a></li>
<li><a href="#Declaring1">Declaring attributes</a></li>
</ol>
</li>
<li><a href="#Some1">Some examples</a></li>
<li><a href="#validate1">How to validate</a></li>
<li><a href="#Other1">Other resources</a></li>
</ol>
<h3><a name="General5">General overview</a></h3>
<p>Well, what is validation and what is a DTD?</p>
<p>DTD is the acronym for Document Type Definition. A DTD is a description of
the content of a family of XML files. DTDs are part of the XML 1.0
specification, and allow you to describe and check that a given document instance
conforms to a set of rules detailing its structure and content.</p>
<p>Validation is the process of checking a document against a DTD (more
generally against a set of construction rules).</p>
<p>The validation process and building DTDs are the two most difficult parts
of the XML life cycle. Briefly, a DTD defines all the possible elements to be
found within your document and the formal shape of your document tree
(by defining the allowed content of each element: either text, a regular
expression for the allowed list of children, or mixed content, i.e. both text
and children). The DTD also defines the allowed attributes for all elements
and the types of those attributes.</p>
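<p>As a minimal illustration, here is a complete document that carries its DTD in an internal subset, between the square brackets of the <code>DOCTYPE</code> declaration. The element names are made up for this sketch. The DTD says that a <code>note</code> contains a <code>to</code> element followed by a <code>body</code> element, both holding only text. Validation checks that the instance below the DTD actually has that shape:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE note [
&lt;!ELEMENT note (to, body)&gt;
&lt;!ELEMENT to (#PCDATA)&gt;
&lt;!ELEMENT body (#PCDATA)&gt;
]&gt;
&lt;note&gt;
  &lt;to&gt;Daniel&lt;/to&gt;
  &lt;body&gt;This instance follows the rules of its DTD.&lt;/body&gt;
&lt;/note&gt;</pre>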
<h3><a name="definition1">The definition</a></h3>
<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>):</p>
<ul>
<li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
 elements</a></li>
<li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
 attributes</a></li>
</ul>
<p>Unfortunately, all this is inherited from the SGML world, and the syntax is
ancient...</p>
<h3><a name="Simple1">Simple rules</a></h3>
<p>Writing a DTD can be done in multiple ways, and the rules for building one
can be radically different depending on whether you need something fixed or
something which can evolve over time. Really complex DTDs like the DocBook ones
are flexible but much harder to design. I will just focus on DTDs for formats
with a fixed, simple structure. What follows is just a set of basic rules,
definitely neither exhaustive nor usable for complex DTD design.</p>
<h4>
<a name="reference1">How to reference a DTD from a document</a>:</h4>
<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory
<code>dtds</code> of the directory from which the document was loaded:</p>
<p><code>&lt;!DOCTYPE spec SYSTEM &quot;dtds/mydtd&quot;&gt;</code></p>
<p>Notes:</p>
<ul>
<li>the system string is actually a URI reference (as defined in <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>), so you can use a
 full URL indicating the location of your DTD on the Web; this is a
 really good thing to do if you want others to validate your document</li>
<li>it is also possible to associate a <code>PUBLIC</code> identifier (a
 magic string) so that the DTD is looked up in catalogs on the client side
 without having to locate it on the Web</li>
<li>a DTD contains a set of element and attribute declarations, but they
 don't define what the root of the document should be. This is explicitly
 told to the parser/validator as the first name in the
 <code>DOCTYPE</code> declaration.</li>
</ul>
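<p>For illustration, here is a sketch of the same declaration using a <code>PUBLIC</code> identifier. The identifier string and the URL are made up for this example. In XML, unlike SGML, a <code>PUBLIC</code> identifier must always be followed by a system literal, which a processor can fall back on when no catalog entry matches. The root element of the instance must then be the <code>spec</code> element named in the declaration:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE spec PUBLIC &quot;-//Example//DTD Spec V1.0//EN&quot;
                      &quot;http://www.example.org/dtds/mydtd&quot;&gt;
&lt;spec&gt;
  &lt;!-- front, body and an optional back go here --&gt;
&lt;/spec&gt;</pre>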
<h4>
<a name="Declaring2">Declaring elements</a>:</h4>
<p>The following declares an element <code>spec</code>:</p>
<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>
<p>It also expresses that the <code>spec</code> element contains one <code>front</code>,
one <code>body</code> and one optional <code>back</code> child element,
in this order. An element and its allowed content are declared in a single
declaration. Similarly, the following declares
<code>div1</code> elements:</p>
<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>
<p>This means <code>div1</code> contains one <code>head</code>, then any number of
<code>p</code>, <code>list</code> and <code>note</code> elements in any order, and then
zero or more <code>div2</code> elements. And last but not least, an element can contain
text:</p>
<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>
<p>
<code>b</code> contains only text. An element can also have mixed content (text and elements
in no particular order):</p>
<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>
<p>
<code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>
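<p>As a sketch, the following <code>div1</code> instance matches the declarations above. It assumes that <code>head</code>, <code>list</code> and <code>note</code> are declared elsewhere in the same DTD as text-only elements, and that <code>div2</code> is declared like <code>div1</code> (starting with a <code>head</code>). Those declarations are not shown here:</p>
<pre>&lt;div1&gt;
  &lt;head&gt;Introduction&lt;/head&gt;
  &lt;p&gt;A paragraph with some &lt;b&gt;bold&lt;/b&gt; text.&lt;/p&gt;
  &lt;note&gt;Notes and paragraphs may be interleaved.&lt;/note&gt;
  &lt;p&gt;Another paragraph.&lt;/p&gt;
  &lt;div2&gt;&lt;head&gt;A subsection&lt;/head&gt;&lt;/div2&gt;
  &lt;div2&gt;&lt;head&gt;Another subsection&lt;/head&gt;&lt;/div2&gt;
&lt;/div1&gt;</pre>
<p>A validating parser would reject a <code>div1</code> whose first child is not a <code>head</code>, or one with a <code>p</code> placed after a <code>div2</code>, because neither matches the content model.</p>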
<h4>
<a name="Declaring1">Declaring attributes</a>:</h4>
<p>Again, an attribute declaration includes the definition of its content:</p>
<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>
<p>means that the element <code>termdef</code> can have a <code>name</code>
attribute containing text (<code>CDATA</code>), which is optional
(<code>#IMPLIED</code>). The attribute value can also be restricted to a
set of values:</p>
<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
&quot;ordered&quot;&gt;</code></p>
<p>means the <code>list</code> element has a <code>type</code> attribute with 3
allowed values, &quot;bullets&quot;, &quot;ordered&quot; or &quot;glossary&quot;, which defaults to
&quot;ordered&quot; if the attribute is not explicitly specified.</p>
<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name token(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute
of type <code>ID</code>, usable for reference from attributes of type
<code>IDREF</code>:</p>
<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>
<p>The last value of an attribute definition can be <code>#REQUIRED</code>,
meaning that the attribute has to be given, <code>#IMPLIED</code>,
meaning that it is optional, or the default value (possibly prefixed by
<code>#FIXED</code> if it is the only value allowed).</p>
<p>Notes:</p>
<ul><li>usually the attributes pertaining to a given element are declared in a
 single expression, but this is just a convention adopted by many DTD
 writers:
 <pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;</pre>
<p>The previous construct defines both <code>id</code> and
 <code>name</code> attributes for the element <code>termdef</code>.</p>
</li></ul>
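<p>Putting these attribute declarations together, here is a sketch of a small self-contained document. The root element <code>doc</code>, the element declarations, and the <code>ref</code> attribute of type <code>IDREF</code> on <code>chapter</code> are made up for this example:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE doc [
&lt;!ELEMENT doc (termdef+, list, chapter*)&gt;
&lt;!ELEMENT termdef (#PCDATA)&gt;
&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;
&lt;!ELEMENT list (#PCDATA)&gt;
&lt;!ATTLIST list type (bullets|ordered|glossary) &quot;ordered&quot;&gt;
&lt;!ELEMENT chapter (#PCDATA)&gt;
&lt;!ATTLIST chapter
          id      ID      #IMPLIED
          ref     IDREF   #IMPLIED&gt;
]&gt;
&lt;doc&gt;
  &lt;termdef id=&quot;dt-dtd&quot; name=&quot;DTD&quot;&gt;Document Type Definition&lt;/termdef&gt;
  &lt;list&gt;no type given here, so type defaults to ordered&lt;/list&gt;
  &lt;chapter id=&quot;ch1&quot; ref=&quot;dt-dtd&quot;&gt;Refers to the termdef above.&lt;/chapter&gt;
&lt;/doc&gt;</pre>
<p>Any one of the following changes would make this document invalid: dropping the <code>id</code> attribute from <code>termdef</code> (it is <code>#REQUIRED</code>), giving <code>list</code> a <code>type</code> outside the enumerated set, or pointing <code>ref</code> at an <code>ID</code> that does not exist in the document.</p>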
<h3><a name="Some1">Some examples</a></h3>
<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where the simple DTD is directly included within
the document.</p>
<h3><a name="validate1">How to validate</a></h3>
<p>The simplest way is to use the <code>xmllint</code> program that comes with libxml. The
<code>--valid</code> option turns on validation of the files given as input;
for example, the following validates a copy of the first revision of the XML
1.0 specification:</p>
<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>
<p>The <code>--noout</code> option suppresses the output of the resulting tree.</p>
<p>The <code>--dtdvalid dtd</code> option validates the document(s) against
a given DTD.</p>
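<p>To see what a validation failure looks like, you could save a deliberately broken document such as the following sketch to a file (say <code>bad.xml</code>, a hypothetical name) and check it with <code>xmllint --valid --noout bad.xml</code>. The children of <code>spec</code> appear in the wrong order, so xmllint should report a validity error for that element instead of accepting the file:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE spec [
&lt;!ELEMENT spec (front, body, back?)&gt;
&lt;!ELEMENT front (#PCDATA)&gt;
&lt;!ELEMENT body (#PCDATA)&gt;
&lt;!ELEMENT back (#PCDATA)&gt;
]&gt;
&lt;spec&gt;
  &lt;body&gt;body comes first here...&lt;/body&gt;
  &lt;front&gt;...so front is out of place&lt;/front&gt;
&lt;/spec&gt;</pre>
<p>Suppose the declarations between the square brackets were instead stored in a separate file (for example a hypothetical <code>spec.dtd</code>) and the <code>DOCTYPE</code> declaration were removed from the instance. Then <code>xmllint --noout --dtdvalid spec.dtd bad.xml</code> would perform the same check against that external DTD.</p>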
<p>Libxml exports an API to handle DTDs and validation; check the <a href="http://xmlsoft.org/html/libxml-valid.html">associated
description</a>.</p>
<h3><a name="Other1">Other resources</a></h3>
<p>DTDs are as old as SGML, so there are probably a number of examples online. I
will just list one for now; other pointers are welcome:</p>
<ul><li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li></ul>
<p>I suggest looking at the examples found under <code>test/valid/dtds/</code> and any of
the large number of books available on XML. The dia example in <code>test/valid</code>
should be both simple and complete enough to allow you to build your own.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>