<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
--></style>
<title>Entities or no entities</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>Entities or no entities</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">An overview of libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="library.html">The XML library interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="valid.html">Validation, or are you afraid of DTDs ?</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="catalog.html">Catalogs support</a></li>
<li><a href="xmlio.html">I/O interfaces</a></li>
<li><a href="xmlmem.html">Memory interfaces</a></li>
<li><a href="xmldtd.html">DTD support</a></li>
<li><a href="xml.html">flat page</a></li>
</ul></td></tr>
</table></td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>In principle, entities are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
may occur frequently within a document, or to confine the edits needed when
that string changes to a single place: its declaration in the internal subset
at the beginning of the document. Example:</p>
<pre>1 &lt;?xml version=&quot;1.0&quot;?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM &quot;example.dtd&quot; [
3 &lt;!ENTITY xml &quot;Extensible Markup Language&quot;&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6 &amp;xml;
7 &lt;/EXAMPLE&gt;</pre>
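<p>Once the document is parsed, the declaration can also be read back from a program. The sketch below is illustrative rather than taken from libxml itself: it assumes libxml2 2.6 or later (for xmlReadMemory()), drops the SYSTEM &quot;example.dtd&quot; identifier so that no external file is needed, and uses a hypothetical helper named entity_text() that looks the entity up with xmlGetDocEntity().</p>
<pre>/* Sketch: read the replacement text of an entity declared in the
 * internal subset. Assumes libxml2 &gt;= 2.6; build with
 *   cc entdecl.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/entities.h&gt;

/* The example document, minus the SYSTEM "example.dtd" identifier. */
static const char example[] =
    "&lt;?xml version=\"1.0\"?&gt;\n"
    "&lt;!DOCTYPE EXAMPLE [\n"
    "&lt;!ENTITY xml \"Extensible Markup Language\"&gt;\n"
    "]&gt;\n"
    "&lt;EXAMPLE&gt;\n"
    "&amp;xml;\n"
    "&lt;/EXAMPLE&gt;\n";

/* Hypothetical helper: return a copy of the replacement text declared
 * for `name` (release it with xmlFree()), or NULL if none is found. */
static xmlChar *entity_text(const char *name)
{
    xmlDocPtr doc = xmlReadMemory(example, (int) strlen(example),
                                  "example.xml", NULL, 0);
    xmlChar *text = NULL;

    if (doc != NULL) {
        xmlEntityPtr ent = xmlGetDocEntity(doc, BAD_CAST name);

        if (ent != NULL &amp;&amp; ent-&gt;content != NULL)
            text = xmlStrdup(ent-&gt;content);
        xmlFreeDoc(doc);
    }
    return text;
}

int main(void)
{
    xmlChar *text = entity_text("xml");

    if (text == NULL)
        return 1;
    printf("&amp;xml; -&gt; %s\n", (const char *) text);
    xmlFree(text);
    xmlCleanupParser();
    return 0;
}</pre>
<p>Run as is, it should print &amp;xml; -&gt; Extensible Markup Language, the text declared on line 3 of the example.</p>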
<p>Line 3 declares the xml entity. Line 6 uses the xml entity by prefixing
its name with '&amp;' and following it with ';', without any spaces added.
The XML specification also defines 5 predefined entities, which libxml always
recognizes; they allow you to escape characters that have a special meaning in
some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the apostrophe ('),
<strong>&amp;quot;</strong> for the character '&quot;', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p>
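<p>The predefined entities are replaced by their characters while parsing rather than kept as references. A minimal sketch to observe this, assuming libxml2 2.6 or later; decode_root() is a hypothetical helper that parses a one-element document using all five of them and reports the root's text together with the number of entity reference nodes found under it.</p>
<pre>/* Sketch: predefined entities become plain characters at parse time.
 * Assumes libxml2 &gt;= 2.6; build with
 *   cc predef.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: parse `xml`, store the number of entity reference
 * nodes directly under the root in *refs, and return the root's text
 * content (release it with xmlFree()), or NULL if parsing fails. */
static xmlChar *decode_root(const char *xml, int *refs)
{
    xmlDocPtr doc = xmlReadMemory(xml, (int) strlen(xml), "noname.xml", NULL, 0);
    xmlChar *text;

    *refs = 0;
    if (doc == NULL)
        return NULL;
    xmlNodePtr root = xmlDocGetRootElement(doc);
    for (xmlNodePtr cur = root-&gt;children; cur != NULL; cur = cur-&gt;next)
        if (cur-&gt;type == XML_ENTITY_REF_NODE)
            (*refs)++;
    text = xmlNodeGetContent(root);
    xmlFreeDoc(doc);
    return text;
}

int main(void)
{
    int refs;
    xmlChar *text = decode_root(
        "&lt;p&gt;&amp;lt;b&amp;gt; &amp;amp; &amp;apos;x&amp;apos; &amp;quot;y&amp;quot;&lt;/p&gt;", &amp;refs);

    if (text == NULL)
        return 1;
    printf("text: %s\nentity reference nodes: %d\n", (const char *) text, refs);
    xmlFree(text);
    xmlCleanupParser();
    return 0;
}</pre>
<p>The reported text holds the raw characters &lt;, &gt;, &amp;, ' and &quot;, and the count of entity reference nodes is 0.</p>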
<p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content, so that you can save the document back without losing this usually
precious information (if users went through the pain of explicitly
defining entities, they may have a rather negative attitude if you blindly
substitute them at save time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change this behaviour; by default, entities
are not substituted.</p>
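<p>A minimal sketch of the two behaviours, assuming libxml2 2.6 or later, where the per-document XML_PARSE_NOENT option of xmlReadMemory() requests substitution; calling xmlSubstituteEntitiesDefault(1) before the older xmlParseMemory() is the global equivalent. The helper count_entity_refs() is hypothetical: it counts the entity reference nodes left directly under the root element.</p>
<pre>/* Sketch: compare the default tree with a substituting parse.
 * Assumes libxml2 &gt;= 2.6; build with
 *   cc noent.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

static const char example[] =
    "&lt;!DOCTYPE EXAMPLE [\n"
    "&lt;!ENTITY xml \"Extensible Markup Language\"&gt;\n"
    "]&gt;\n"
    "&lt;EXAMPLE&gt;\n&amp;xml;\n&lt;/EXAMPLE&gt;\n";

/* Hypothetical helper: parse the example, substituting entities if
 * `substitute` is non-zero, and count the ENTITY_REF children of the
 * root element. Returns -1 if the document cannot be parsed. */
static int count_entity_refs(int substitute)
{
    int options = substitute ? XML_PARSE_NOENT : 0;
    xmlDocPtr doc = xmlReadMemory(example, (int) strlen(example),
                                  "example.xml", NULL, options);
    int refs = 0;

    if (doc == NULL)
        return -1;
    for (xmlNodePtr cur = xmlDocGetRootElement(doc)-&gt;children;
         cur != NULL; cur = cur-&gt;next)
        if (cur-&gt;type == XML_ENTITY_REF_NODE)
            refs++;
    xmlFreeDoc(doc);
    return refs;
}

int main(void)
{
    printf("default:         %d entity reference node(s)\n",
           count_entity_refs(0));
    printf("XML_PARSE_NOENT: %d entity reference node(s)\n",
           count_entity_refs(1));
    xmlCleanupParser();
    return 0;
}</pre>
<p>The default parse keeps one reference node for &amp;xml;, while the substituting parse leaves none, matching the --noent behaviour of xmllint.</p>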
<p>Here is the DOM tree built by libxml for the previous document in the
default case:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre>
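<p>Application code walking such a tree has to expect ENTITY_REF nodes next to the TEXT nodes. The sketch below assumes libxml2 2.6 or later; describe_children() is a hypothetical helper that summarizes the children of the root element in the default tree, and the main program then looks up the declaration of each reference with xmlGetDocEntity().</p>
<pre>/* Sketch: list the children of the root element in the default
 * (non-substituting) tree. Assumes libxml2 &gt;= 2.6; build with
 *   cc walk.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/entities.h&gt;

static const char example[] =
    "&lt;!DOCTYPE EXAMPLE [\n"
    "&lt;!ENTITY xml \"Extensible Markup Language\"&gt;\n"
    "]&gt;\n"
    "&lt;EXAMPLE&gt;\n&amp;xml;\n&lt;/EXAMPLE&gt;\n";

/* Hypothetical helper: write a space-separated summary of the root's
 * children into `out`, e.g. "TEXT ENTITY_REF:xml TEXT". */
static void describe_children(xmlDocPtr doc, char *out, size_t size)
{
    size_t used = 0;

    out[0] = '\0';
    for (xmlNodePtr cur = xmlDocGetRootElement(doc)-&gt;children;
         cur != NULL &amp;&amp; used &lt; size; cur = cur-&gt;next) {
        const char *sep = used &gt; 0 ? " " : "";
        int n;

        if (cur-&gt;type == XML_TEXT_NODE)
            n = snprintf(out + used, size - used, "%sTEXT", sep);
        else if (cur-&gt;type == XML_ENTITY_REF_NODE)
            n = snprintf(out + used, size - used, "%sENTITY_REF:%s",
                         sep, (const char *) cur-&gt;name);
        else
            n = snprintf(out + used, size - used, "%sOTHER", sep);
        if (n &lt; 0)
            break;
        used += (size_t) n;
    }
}

int main(void)
{
    char summary[256];
    xmlDocPtr doc = xmlReadMemory(example, (int) strlen(example),
                                  "example.xml", NULL, 0);

    if (doc == NULL)
        return 1;
    describe_children(doc, summary, sizeof summary);
    printf("%s\n", summary);

    /* Look each referenced entity up by name to get its declaration. */
    for (xmlNodePtr cur = xmlDocGetRootElement(doc)-&gt;children;
         cur != NULL; cur = cur-&gt;next) {
        if (cur-&gt;type != XML_ENTITY_REF_NODE)
            continue;
        xmlEntityPtr ent = xmlGetDocEntity(doc, cur-&gt;name);
        printf("&amp;%s; = %s\n", (const char *) cur-&gt;name,
               ent != NULL &amp;&amp; ent-&gt;content != NULL
                   ? (const char *) ent-&gt;content : "(undeclared)");
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}</pre>
<p>The summary line mirrors the TEXT, ENTITY_REF, TEXT sequence of the debug dump.</p>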
<p>And here is the result when substituting entities:</p>
<pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content= Extensible Markup Language</pre>
<p>So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML documents or data if you are not willing to handle the
entity reference nodes in the DOM tree.</p>
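<p>Handling those nodes usually means expanding them yourself wherever you need plain text. A minimal sketch, assuming libxml2 2.6 or later; collect_text() is a hypothetical helper, and it makes the simplifying assumption that every referenced entity is an internal entity whose replacement text is plain character data (no markup, no nested references), so the declared content can be copied as is.</p>
<pre>/* Sketch: gather the character data of an element from the default
 * tree, expanding entity references by hand. Assumes libxml2 &gt;= 2.6
 * and simple internal entities; build with
 *   cc text.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/entities.h&gt;

/* Hypothetical helper: append the text found below `node` to `buf`. */
static void collect_text(xmlDocPtr doc, xmlNodePtr node, xmlBufferPtr buf)
{
    for (xmlNodePtr cur = node-&gt;children; cur != NULL; cur = cur-&gt;next) {
        switch (cur-&gt;type) {
        case XML_TEXT_NODE:
        case XML_CDATA_SECTION_NODE:
            xmlBufferCat(buf, cur-&gt;content);
            break;
        case XML_ENTITY_REF_NODE: {
            /* Assumption: the declared content is plain text. */
            xmlEntityPtr ent = xmlGetDocEntity(doc, cur-&gt;name);

            if (ent != NULL &amp;&amp; ent-&gt;content != NULL)
                xmlBufferCat(buf, ent-&gt;content);
            break;
        }
        case XML_ELEMENT_NODE:
            collect_text(doc, cur, buf);
            break;
        default:
            break; /* comments, processing instructions, ... */
        }
    }
}

int main(void)
{
    const char *xml =
        "&lt;!DOCTYPE EXAMPLE [\n"
        "&lt;!ENTITY xml \"Extensible Markup Language\"&gt;\n"
        "]&gt;\n"
        "&lt;EXAMPLE&gt;\n&amp;xml;\n&lt;/EXAMPLE&gt;\n";
    xmlDocPtr doc = xmlReadMemory(xml, (int) strlen(xml), "example.xml", NULL, 0);
    xmlBufferPtr buf;

    if (doc == NULL)
        return 1;
    buf = xmlBufferCreate();
    if (buf == NULL) {
        xmlFreeDoc(doc);
        return 1;
    }
    collect_text(doc, xmlDocGetRootElement(doc), buf);
    printf("[%s]\n", (const char *) xmlBufferContent(buf));
    xmlBufferFree(buf);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}</pre>
<p>Entities whose replacement text contains elements or further references would need a real recursive expansion, which is exactly the extra work the advice above is about.</p>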
<p>Note that at save time libxml escapes characters using the predefined
entities where necessary to prevent well-formedness problems. When parsing, it
also transparently replaces the predefined entities with the characters they
stand for (i.e. it will not generate entity reference nodes in the DOM tree or
call the reference() SAX callback when it finds them in the input).</p>
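<p>The save-time escaping can be observed by building a document in memory whose text contains raw '&lt;' and '&amp;' characters and serializing it. A minimal sketch using the libxml tree API; serialize_text() is a hypothetical helper name.</p>
<pre>/* Sketch: characters such as '&lt;' and '&amp;' in text content are written
 * back as predefined entities when the tree is saved. Build with
 *   cc save.c $(xml2-config --cflags --libs)
 */
#include &lt;stdio.h&gt;
#include &lt;libxml/parser.h&gt;
#include &lt;libxml/tree.h&gt;

/* Hypothetical helper: wrap `text` in a &lt;p&gt; root element and return
 * the serialized document (release it with xmlFree()), or NULL. */
static xmlChar *serialize_text(const char *text)
{
    xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
    xmlNodePtr root = xmlNewDocNode(doc, NULL, BAD_CAST "p", NULL);
    xmlChar *out = NULL;
    int size = 0;

    xmlDocSetRootElement(doc, root);
    /* xmlNewDocText() stores the string as is: '&amp;' is not parsed as the
     * start of an entity reference here. */
    xmlAddChild(root, xmlNewDocText(doc, BAD_CAST text));
    xmlDocDumpMemory(doc, &amp;out, &amp;size);
    xmlFreeDoc(doc);
    return out;
}

int main(void)
{
    xmlChar *out = serialize_text("if (a &lt; b &amp;&amp; c)");

    if (out == NULL)
        return 1;
    fputs((const char *) out, stdout);
    xmlFree(out);
    xmlCleanupParser();
    return 0;
}</pre>
<p>In the printed document, the &lt;p&gt; element should contain <strong>&amp;lt;</strong> and <strong>&amp;amp;</strong> in place of the raw characters.</p>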
<p>
<span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning curve to handle
them using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>