<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<link rel="SHORTCUT ICON" href="/favicon.ico">
<style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Validation &amp; DTDs</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"></a></div>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Validation &amp; DTDs</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd">
<form action="search.php" enctype="application/x-www-form-urlencoded" method="GET">
<input name="query" type="TEXT" size="20" value=""><input name="submit" type="submit" value="Search ...">
</form>
<ul>
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XMLinfo.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="python.html">Python and bindings</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="xmlreader.html">The Reader Interface</a></li>
<li><a href="tutorial/index.html">Tutorial</a></li>
<li><a href="guidelines.html">XML Guidelines</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul>
</td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li>
<li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li>
<li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li>
<li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul>
<li><a href="APIchunk0.html">Alphabetic</a></li>
<li><a href="APIconstructors.html">Constructors</a></li>
<li><a href="APIfunctions.html">Functions/Types</a></li>
<li><a href="APIfiles.html">Modules</a></li>
<li><a href="APIsymbols.html">Symbols</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General5">General overview</a></li>
  <li><a href="#definition1">The definition</a></li>
  <li>
<a href="#Simple1">Simple rules</a>
    <ol>
<li><a href="#reference1">How to reference a DTD from a document</a></li>
      <li><a href="#Declaring2">Declaring elements</a></li>
      <li><a href="#Declaring1">Declaring attributes</a></li>
    </ol>
</li>
  <li><a href="#Some1">Some examples</a></li>
  <li><a href="#validate1">How to validate</a></li>
  <li><a href="#Other1">Other resources</a></li>
</ol>
<h3><a name="General5">General overview</a></h3>
<p>So what is validation, and what is a DTD?</p>
<p>DTD stands for Document Type Definition: a description of the content of
a family of XML files. DTDs are part of the XML 1.0 specification. They let
one describe the set of rules detailing a document's structure and content,
and verify that a given document instance conforms to those rules.</p>
<p>Validation is the process of checking a document against a DTD (more
generally against a set of construction rules).</p>
<p>The validation process and building DTDs are the two most difficult parts
of the XML life cycle. Briefly, a DTD defines all the possible elements to be
found within your document and the formal shape of your document tree. It
does so by defining the allowed content of each element: either text, a
regular expression for the allowed list of children, or mixed content (both
text and children). The DTD also defines the valid attributes for all
elements and the types of those attributes.</p>
<h3><a name="definition1">The definition</a></h3>
<p>The <a href="http://www.w3.org/TR/REC-xml">W3C XML Recommendation</a> (<a href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>):</p>
<ul>
<li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
    elements</a></li>
  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
    attributes</a></li>
</ul>
<p>Unfortunately, all of this is inherited from the SGML world, and the
syntax is ancient.</p>
<h3><a name="Simple1">Simple rules</a></h3>
<p>DTDs can be written in many ways. The rules for building them can differ
radically depending on whether you need something permanent or something
that can evolve over time. Really complex DTDs such as the DocBook ones are
flexible but much harder to design. I will focus only on DTDs for formats
with a fixed, simple structure. What follows is just a set of basic rules; it
is neither exhaustive nor suitable for complex DTD design.</p>
<h4>
<a name="reference1">How to reference a DTD from a document</a>:</h4>
<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory
<code>dtds</code> of the directory from which the document was loaded:</p>
<p><code>&lt;!DOCTYPE spec SYSTEM &quot;dtds/mydtd&quot;&gt;</code></p>
<p>Notes:</p>
<ul>
<li>The system string is actually a URI-Reference (as defined in <a href="http://www.ietf.org/rfc/rfc2396.txt">RFC 2396</a>), so you can use a
    full URL string indicating the location of your DTD on the Web. This is a
    really good thing to do if you want others to validate your document.</li>
  <li>It is also possible to associate a <code>PUBLIC</code> identifier (a
    magic string) so that the DTD is looked up in catalogs on the client side
    without having to locate it on the Web.</li>
  <li>A DTD contains a set of element and attribute declarations, but it
    does not define what the root of the document should be. The root is
    instead given explicitly to the parser/validator as the first name in the
    <code>DOCTYPE</code> declaration.</li>
</ul>
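<p>To illustrate the first two notes, here is a minimal sketch of both forms.
The URL and the public identifier are hypothetical placeholders, not real
resources, and a document carries only one <code>DOCTYPE</code> declaration,
so these are alternatives:</p>
<pre>&lt;!-- SYSTEM identifier given as a full URL --&gt;
&lt;!DOCTYPE spec SYSTEM &quot;http://example.org/dtds/mydtd&quot;&gt;

&lt;!-- PUBLIC identifier, followed by a system identifier --&gt;
&lt;!DOCTYPE spec PUBLIC &quot;-//Example//DTD Spec V1.0//EN&quot;
                      &quot;http://example.org/dtds/mydtd&quot;&gt;</pre>
<p>In XML the <code>PUBLIC</code> form still requires a system identifier
after the public one, so a processor that cannot resolve the public
identifier through a catalog has a location to fall back on.</p>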
<h4>
<a name="Declaring2">Declaring elements</a>:</h4>
<p>The following declares an element <code>spec</code>:</p>
<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>
<p>It also states that the <code>spec</code> element contains one
<code>front</code>, one <code>body</code> and an optional <code>back</code>
child element, in this order. An element and its content are declared
together, in a single declaration. Similarly the following declares
<code>div1</code> elements:</p>
<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2?)&gt;</code></p>
<p>which means <code>div1</code> contains one <code>head</code>, then any
number of <code>p</code>, <code>list</code> and <code>note</code> elements in
any order, and then an optional <code>div2</code>. Last but not least, an
element can contain text:</p>
<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>
<p>
<code>b</code> contains only text. An element can also have mixed content
(text and elements in no particular order):</p>
<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>
<p>
<code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>
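<p>As a rough illustration of how these content models constrain a document,
the sketch below shows one fragment that fits the <code>div1</code> and
<code>p</code> declarations above and one that does not. It assumes that
<code>head</code>, <code>note</code> and <code>em</code> are declared
elsewhere as <code>(#PCDATA)</code>, which the text above does not state:</p>
<pre>&lt;!-- fits the model: head first, then any mix of p, list and note --&gt;
&lt;div1&gt;
  &lt;head&gt;Title&lt;/head&gt;
  &lt;p&gt;Some &lt;em&gt;mixed&lt;/em&gt; content with a &lt;b&gt;bold&lt;/b&gt; word.&lt;/p&gt;
  &lt;note&gt;A remark.&lt;/note&gt;
  &lt;p&gt;More text.&lt;/p&gt;
&lt;/div1&gt;

&lt;!-- does not fit: head must come before any p element --&gt;
&lt;div1&gt;
  &lt;p&gt;Text.&lt;/p&gt;
  &lt;head&gt;Title&lt;/head&gt;
&lt;/div1&gt;</pre>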
<h4>
<a name="Declaring1">Declaring attributes</a>:</h4>
<p>Again, an attribute declaration includes the definition of its content:</p>
<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>
<p>means that the element <code>termdef</code> can have a <code>name</code>
attribute containing text (<code>CDATA</code>) and which is optional
(<code>#IMPLIED</code>). The attribute value can also be restricted to a
set:</p>
<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
&quot;ordered&quot;&gt;</code></p>
<p>means that the <code>list</code> element has a <code>type</code> attribute
with 3 allowed values, &quot;bullets&quot;, &quot;ordered&quot; or &quot;glossary&quot;, and which defaults to
&quot;ordered&quot; if the attribute is not explicitly specified.</p>
<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name token(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute
of type <code>ID</code>, usable for reference from attributes of type
<code>IDREF</code>:</p>
<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>
<p>The last value of an attribute definition can be
<code>#REQUIRED</code>, meaning that the attribute has to be given,
<code>#IMPLIED</code>, meaning that it is optional, or the default value
(possibly prefixed by <code>#FIXED</code> if it is the only value
allowed).</p>
<p>Notes:</p>
<ul>
<li>Usually the attributes pertaining to a given element are declared in a
    single expression, but this is just a convention adopted by a lot of DTD
    writers:
    <pre>&lt;!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED&gt;</pre>
    <p>The previous construct defines both <code>id</code> and
    <code>name</code> attributes for the element <code>termdef</code>.</p>
  </li>
</ul>
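<p>To make the effect of these declarations more concrete, here is a sketch
of document fragments checked against the <code>list</code> declaration and
the combined <code>termdef</code> declaration from the note above. The
attribute values and the element content (shown as <code>...</code>) are
invented for illustration:</p>
<pre>&lt;!-- type omitted: the default value &quot;ordered&quot; applies --&gt;
&lt;list&gt;...&lt;/list&gt;

&lt;!-- type given explicitly, taken from the allowed set --&gt;
&lt;list type=&quot;bullets&quot;&gt;...&lt;/list&gt;

&lt;!-- not valid: &quot;numbered&quot; is not one of the three allowed values --&gt;
&lt;list type=&quot;numbered&quot;&gt;...&lt;/list&gt;

&lt;!-- id is #REQUIRED on termdef, name is #IMPLIED and may be left out --&gt;
&lt;termdef id=&quot;dt-valid&quot;&gt;...&lt;/termdef&gt;

&lt;!-- not valid: the required id attribute is missing --&gt;
&lt;termdef name=&quot;validity&quot;&gt;...&lt;/termdef&gt;</pre>
<p>Because <code>id</code> is of type <code>ID</code>, its value must also be
unique within the document.</p>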
<h3><a name="Some1">Some examples</a></h3>
<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The example in the file
<code>test/valid/dia.xml</code> shows an XML file where a simple DTD is
included directly within the document.</p>
<h3><a name="validate1">How to validate</a></h3>
<p>The simplest way is to use the xmllint program included with libxml. The
<code>--valid</code> option turns on validation of the files given as input.
For example, the following validates a copy of the first revision of the XML
1.0 specification:</p>
<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>
<p>The <code>--noout</code> option disables output of the resulting tree.</p>
<p>The <code>--dtdvalid dtd</code> option allows validation of the
document(s) against a given DTD.</p>
<p>Libxml exports an API to handle DTDs and validation; check the <a href="http://xmlsoft.org/html/libxml-valid.html">associated
description</a>.</p>
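<p>For a quick experiment, a small self-contained document such as the sketch
below can be used. It is not taken from the libxml distribution: the DTD sits
in the internal subset of the <code>DOCTYPE</code> declaration, and the
declarations for <code>front</code>, <code>body</code> and <code>back</code>
are assumptions made up for this example:</p>
<pre>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE spec [
  &lt;!ELEMENT spec (front, body, back?)&gt;
  &lt;!ELEMENT front (#PCDATA)&gt;
  &lt;!ELEMENT body (p)*&gt;
  &lt;!ELEMENT back (#PCDATA)&gt;
  &lt;!ELEMENT p (#PCDATA|b)*&gt;
  &lt;!ELEMENT b (#PCDATA)&gt;
  &lt;!ATTLIST p id ID #IMPLIED&gt;
]&gt;
&lt;spec&gt;
  &lt;front&gt;Title&lt;/front&gt;
  &lt;body&gt;
    &lt;p id=&quot;p1&quot;&gt;Some &lt;b&gt;bold&lt;/b&gt; text.&lt;/p&gt;
  &lt;/body&gt;
&lt;/spec&gt;</pre>
<p>Saved under a name of your choice, say <code>spec.xml</code>, the file
should pass <code>xmllint --valid --noout spec.xml</code> without any
message. Removing the <code>front</code> element should make xmllint report
a validity error, since the content of <code>spec</code> would no longer
match its declaration.</p>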
<h3><a name="Other1">Other resources</a></h3>
<p>DTDs are as old as SGML, so there may be a number of examples online. I
will list just one for now; other pointers are welcome:</p>
<ul>
<li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
</ul>
<p>I suggest looking at the examples found under <code>test/valid/dtds</code>
and at any of the large number of books available on XML. The dia example in
<code>test/valid</code> should be both simple and complete enough to allow
you to build your own.</p>
<p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>