<html>
<head>
  <title>Libxml DTD support</title>
  <meta name="GENERATOR" content="amaya V4.0">
  <meta http-equiv="Content-Type" content="text/html">
</head>

<body bgcolor="#ffffff">
<h1 align="center">Libxml DTD support</h1>

<p>Location: <a
href="http://xmlsoft.org/xmldtd.html">http://xmlsoft.org/xmldtd.html</a></p>

<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>

<p>Mailing-list archive: <a
href="http://xmlsoft.org/messages/">http://xmlsoft.org/messages/</a></p>

<p>Version: $Revision$</p>

<p>Table of Contents:</p>
<ol>
  <li><a href="#General">General overview</a></li>
  <li><a href="#definition">The definition</a></li>
  <li><a href="#Simple">Simple rules</a>
    <ol>
      <li><a href="#reference">How to reference a DTD from a document</a></li>
      <li><a href="#Declaring">Declaring elements</a></li>
      <li><a href="#Declaring1">Declaring attributes</a></li>
    </ol>
  </li>
  <li><a href="#Some">Some examples</a></li>
  <li><a href="#validate">How to validate</a></li>
  <li><a href="#Other">Other resources</a></li>
</ol>

<h2><a name="General">General overview</a></h2>

<p>DTD is the acronym for Document Type Definition. A DTD describes the
content of a family of XML files. It is part of the XML 1.0 specification,
and makes it possible to describe and check that a given document instance
conforms to a set of rules detailing its structure and content.</p>

<h2><a name="definition">The definition</a></h2>

<p>DTDs are defined in the <a href="http://www.w3.org/TR/REC-xml">W3C XML
Recommendation</a> (<a
href="http://www.xml.com/axml/axml.html">Tim Bray's annotated version of
Rev1</a>), in particular in these sections:</p>
<ul>
  <li><a href="http://www.w3.org/TR/REC-xml#elemdecls">Declaring
    elements</a></li>
  <li><a href="http://www.w3.org/TR/REC-xml#attdecls">Declaring
    attributes</a></li>
</ul>

<p>(Unfortunately) all this is inherited from the SGML world, and the syntax
is ancient...</p>

<h2><a name="Simple">Simple rules</a></h2>

<p>A DTD can be written in multiple ways, and the rules for building one
differ radically depending on whether you need something fixed or something
that can evolve over time. Really complex DTDs like the DocBook ones are
flexible but considerably harder to design. I will focus only on DTDs for
formats with a fixed, simple structure. What follows is just a set of basic
rules; it is neither exhaustive nor usable for complex DTD design.</p>

<h3><a name="reference">How to reference a DTD from a document</a>:</h3>

<p>Assuming the top element of the document is <code>spec</code> and the DTD
is placed in the file <code>mydtd</code> in the subdirectory <code>dtds</code>
of the directory from which the document was loaded:</p>

<p><code>&lt;!DOCTYPE spec SYSTEM "dtds/mydtd"&gt;</code></p>

<p>Notes:</p>
<ul>
  <li>The system string is actually a URI-Reference (as defined in RFC 2396),
    so you can use a full URL string indicating the location of your DTD on
    the Web. This is a really good thing to do if you want others to be able
    to validate your document.</li>
  <li>It is also possible to associate a <code>PUBLIC</code> identifier (a
    magic string) so that the DTD is looked up in catalogs on the client side
    without having to locate it on the Web.</li>
  <li>A DTD contains a set of element and attribute declarations, but it does
    not define what the root of the document should be. This is explicitly
    told to the parser/validator as the first name in the
    <code>DOCTYPE</code> declaration.</li>
</ul>
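
<p>As a rough illustration of the first two notes, the declarations below
sketch the same reference written with a full URL and with a
<code>PUBLIC</code> identifier. The host <code>example.org</code> and the
public identifier string are hypothetical placeholders, not real
resources:</p>

<pre>&lt;!-- system identifier given as a full URL (hypothetical location) --&gt;
&lt;!DOCTYPE spec SYSTEM "http://example.org/dtds/mydtd"&gt;

&lt;!-- public identifier, followed by a system identifier as fallback --&gt;
&lt;!DOCTYPE spec PUBLIC "-//Example//DTD Spec 1.0//EN"
               "http://example.org/dtds/mydtd"&gt;</pre>

<p>A document carries only one <code>DOCTYPE</code> declaration, so these are
two alternatives rather than lines to be used together.</p>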

<h3><a name="Declaring">Declaring elements</a>:</h3>

<p>The following declares an element <code>spec</code>:</p>

<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>

<p>It also expresses that the <code>spec</code> element contains one
<code>front</code>, one <code>body</code> and one optional <code>back</code>,
in this order. An element and its content are thus declared in a single
declaration. Similarly, the following declares <code>div1</code>
elements:</p>

<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2*)&gt;</code></p>

<p>This means that <code>div1</code> contains one <code>head</code>, then any
number of <code>p</code>, <code>list</code> and <code>note</code> elements in
any order, and then zero or more <code>div2</code> elements. Last but not
least, an element can contain text:</p>

<p><code>&lt;!ELEMENT b (#PCDATA)&gt;</code></p>

<p>Here <code>b</code> contains only text. An element can also have mixed
content (text and elements in no particular order):</p>

<p><code>&lt;!ELEMENT p (#PCDATA|a|ul|b|i|em)*&gt;</code></p>

<p><code>p</code> can contain text or <code>a</code>, <code>ul</code>,
<code>b</code>, <code>i</code> or <code>em</code> elements in no particular
order.</p>
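
<p>Putting such declarations together, a minimal sketch of a complete DTD for
a small <code>spec</code> document could look like the following. The content
models chosen here for <code>front</code>, <code>body</code> and
<code>back</code> are assumptions made for this illustration only, and
<code>p</code> is reduced to text and <code>b</code> so that every element
used is declared:</p>

<pre>&lt;!ELEMENT spec  (front, body, back?)&gt;
&lt;!ELEMENT front (#PCDATA)&gt;
&lt;!ELEMENT body  (p)*&gt;
&lt;!ELEMENT back  (#PCDATA)&gt;
&lt;!ELEMENT p     (#PCDATA|b)*&gt;
&lt;!ELEMENT b     (#PCDATA)&gt;</pre>

<p>A document instance such as this one should conform to it, with the
optional <code>back</code> omitted:</p>

<pre>&lt;spec&gt;
  &lt;front&gt;A title&lt;/front&gt;
  &lt;body&gt;
    &lt;p&gt;Some &lt;b&gt;important&lt;/b&gt; text.&lt;/p&gt;
  &lt;/body&gt;
&lt;/spec&gt;</pre>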

<h3><a name="Declaring1">Declaring attributes</a>:</h3>

<p>Again, the declaration of an attribute includes the definition of its
content:</p>

<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>

<p>This means that the element <code>termdef</code> can have a
<code>name</code> attribute which contains text (<code>CDATA</code>) and is
optional (<code>#IMPLIED</code>). The attribute value can also be restricted
to a set of allowed values:</p>

<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary) "ordered"&gt;</code></p>

<p>This means that <code>list</code> elements have a <code>type</code>
attribute with 3 allowed values, "bullets", "ordered" or "glossary", which
defaults to "ordered" if the attribute is not explicitly specified.</p>

<p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
(<code>ENTITY</code>/<code>ENTITIES</code>) or name(s)
(<code>NMTOKEN</code>/<code>NMTOKENS</code>). The following defines that a
<code>chapter</code> element can have an optional <code>id</code> attribute of
type <code>ID</code>, usable for reference from attributes of type
<code>IDREF</code>:</p>

<p><code>&lt;!ATTLIST chapter id ID #IMPLIED&gt;</code></p>

<p>The last part of an attribute definition can be <code>#REQUIRED</code>,
meaning that the attribute has to be given, <code>#IMPLIED</code>, meaning
that it is optional, or the default value (possibly prefixed by
<code>#FIXED</code> if it is the only value allowed).</p>
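
<p>The different kinds of defaults can be sketched side by side as follows.
The <code>xref</code> element and the <code>target</code> and
<code>version</code> attributes are hypothetical names introduced only for
this illustration:</p>

<pre>&lt;!ATTLIST chapter id      ID    #IMPLIED&gt;
&lt;!ATTLIST xref    target  IDREF #REQUIRED&gt;
&lt;!ATTLIST spec    version CDATA #FIXED "1.0"&gt;</pre>

<p>With these declarations, in a fragment like the one below the
<code>target</code> value has to match the <code>id</code> of some element in
the same document for that document to be valid:</p>

<pre>&lt;chapter id="intro"&gt;Introduction&lt;/chapter&gt;
&lt;xref target="intro"/&gt;</pre>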

<h2><a name="Some">Some examples</a></h2>

<p>The directory <code>test/valid/dtds/</code> in the libxml distribution
contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where a simple DTD is included directly within the
document.</p>
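
<p>For illustration, a document carrying its DTD in the internal subset
generally has the shape shown below. This is a made-up minimal example, not
the content of <code>dia.xml</code>, and the names <code>note</code>,
<code>to</code>, <code>text</code> and <code>lang</code> are hypothetical:</p>

<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE note [
  &lt;!ELEMENT note (to, text)&gt;
  &lt;!ELEMENT to   (#PCDATA)&gt;
  &lt;!ELEMENT text (#PCDATA)&gt;
  &lt;!ATTLIST note lang CDATA #IMPLIED&gt;
]&gt;
&lt;note lang="en"&gt;
  &lt;to&gt;Reader&lt;/to&gt;
  &lt;text&gt;The declarations above travel with the document.&lt;/text&gt;
&lt;/note&gt;</pre>

<p>The declarations sit between the square brackets of the
<code>DOCTYPE</code> declaration, so no separate DTD file is needed.</p>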

<h2><a name="validate">How to validate</a></h2>

<p>The simplest way is to use the xmllint program that comes with libxml. The
<code>--valid</code> option turns on validation of the files given as input.
For example, the following validates a copy of the first revision of the XML
1.0 specification:</p>

<p><code>xmllint --valid --noout test/valid/REC-xml-19980210.xml</code></p>

<p>The <code>--noout</code> option is used to avoid outputting the resulting
tree.</p>

<p>The <code>--dtdvalid dtd</code> option allows validating the document(s)
against a given DTD.</p>

<p>Libxml exports an API to handle DTDs and validation; see the <a
href="http://xmlsoft.org/html/gnome-xml-valid.html">associated
description</a>.</p>

<h2><a name="Other">Other resources</a></h2>

<p>DTDs are as old as SGML, so there should be a number of examples online. I
will just list one for now; other pointers are welcome:</p>
<ul>
  <li><a href="http://www.xml101.com:8081/dtd/">XML-101 DTD</a></li>
</ul>

<p></p>

<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>

<p>$Id$</p>
</body>
</html>