<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>Entities or no entities</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Entities or no entities</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Entities are similar in principle to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
occurs frequently within a document, or to confine the changes a document
needs to a restricted area in its internal subset (at the beginning).
Example:</p>
<pre>1 &lt;?xml version=&quot;1.0&quot;?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM &quot;example.dtd&quot; [
3 &lt;!ENTITY xml &quot;Extensible Markup Language&quot;&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6 &amp;xml;
7 &lt;/EXAMPLE&gt;</pre>
<p>Line 3 declares the xml entity. Line 6 uses the xml entity by prefixing
its name with '&amp;' and following it with ';', without any added spaces.
There are five predefined entities in libxml that let you escape characters
with a predefined meaning in some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the character ''',
<strong>&amp;quot;</strong> for the character '&quot;', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p>
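<p>As a rough illustration of the five predefined entities listed above, here
is a minimal sketch of the escaping they allow. The helper names
<code>predefined_entity()</code> and <code>escape_predefined()</code> are
hypothetical and are not part of the libxml API; libxml performs the
equivalent conversion itself when saving a document.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml function: return the predefined
 * entity reference for c, or NULL if c needs no escaping. */
static const char *predefined_entity(char c)
{
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    case '&':  return "&amp;";
    default:   return NULL;
    }
}

/* Hypothetical helper: escape src into a newly allocated string.
 * The caller frees the result. Returns NULL if allocation fails. */
char *escape_predefined(const char *src)
{
    /* The longest replacement ("&apos;" or "&quot;") is 6 bytes. */
    size_t len = strlen(src);
    char *out = malloc(len * 6 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *src != '\0'; src++) {
        const char *ent = predefined_entity(*src);
        if (ent != NULL) {
            size_t n = strlen(ent);
            memcpy(p, ent, n);   /* copy the entity reference */
            p += n;
        } else {
            *p++ = *src;         /* ordinary character, copy as-is */
        }
    }
    *p = '\0';
    return out;
}
```

<p>Under these assumptions, passing the string <code>a &lt; b</code> to
<code>escape_predefined()</code> should yield <code>a &amp;lt; b</code>, which
can then be placed safely in element content.</p>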
<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content, so that you can save the document back without losing this usually
precious information (if the user went through the pain of explicitly
defining entities, they may take a rather negative view of your blindly
substituting them at save time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is not to
substitute entities by default.</p>
<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>
<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content= Extensible Markup Language</pre>
<p>So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML documents or data if you are not willing to handle the
entity reference nodes in the DOM tree.</p>
<p>Note that at save time libxml enforces the conversion of the predefined
entities where necessary to prevent well-formedness problems. At parse time
it also transparently replaces them with the corresponding characters (i.e.
it will not generate entity reference nodes in the DOM tree or call the
reference() SAX callback when it finds them in the input).</p>
<p>
<span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>