<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>Entities or no entities</title></head><body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit for Gnome</h1><h2>Entities or no entities</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Developer Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html">Home</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="tutorial/index.html">Tutorial</a></li><li><a href="xmlreader.html">The Reader Interface</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="python.html">Python and bindings</a></li><li><a href="architecture.html">libxml2 architecture</a></li><li><a href="tree.html">The tree output</a></li><li><a href="interface.html">The SAX interface</a></li><li><a href="xmlmem.html">Memory Management</a></li><li><a href="xmlio.html">I/O Interfaces</a></li><li><a href="library.html">The parser interfaces</a></li><li><a href="entities.html">Entities or no entities</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="upgrade.html">Upgrading 1.x code</a></li><li><a href="threads.html">Thread safety</a></li><li><a href="DOM.html">DOM Principles</a></li><li><a href="example.html">A real example</a></li><li><a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>API Indexes</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="APIchunk0.html">Alphabetic</a></li><li><a href="APIconstructors.html">Constructors</a></li><li><a href="APIfunctions.html">Functions/Types</a></li><li><a href="APIfiles.html">Modules</a></li><li><a href="APIsymbols.html">Symbols</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a 
href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://garypennington.net/libxml2/">Solaris binaries</a></li><li><a href="http://www.zveno.com/open_source/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml&amp;product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
occurs frequently within a document, or when you want to confine the changes
to a document to a restricted area: the internal subset of the document (at the
beginning). Example:</p><pre>1 &lt;?xml version=&quot;1.0&quot;?&gt;
2 &lt;!DOCTYPE EXAMPLE SYSTEM &quot;example.dtd&quot; [
3 &lt;!ENTITY xml &quot;Extensible Markup Language&quot;&gt;
4 ]&gt;
5 &lt;EXAMPLE&gt;
6 &amp;xml;
7 &lt;/EXAMPLE&gt;</pre><p>Line 3 declares the xml entity. Line 6 uses the xml entity by prefixing
its name with '&amp;' and following it with ';', without any spaces. There
are five predefined entities, defined by XML itself and supported by libxml2, that let you
escape characters which have a special meaning in some parts of the XML document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the apostrophe ('),
<strong>&amp;quot;</strong> for the character '&quot;', and
<strong>&amp;amp;</strong> for the character '&amp;'.</p><p>One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as they are in the
content, so that the document can be saved back without losing this usually
precious information (if users went through the pain of explicitly
defining entities, they are unlikely to be pleased if you blindly replace
them with their content when saving). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function lets you check and change this behaviour: pass 1 to enable substitution
or 0 to disable it, and it returns the previous setting. By default, entities are not
substituted.</p><p>Here is the DOM tree built by libxml2 for the previous document in the
default case:</p><pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=</pre><p>And here is the result when entity substitution is enabled with <code>--noent</code>:</p><pre>/gnome/src/gnome-xml -&gt; ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content= Extensible Markup Language</pre><p>So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML documents or data if you are not willing to handle the
entity reference nodes in the DOM tree.</p><p>Note that at save time libxml2 automatically escapes characters using the predefined
entities where necessary to prevent well-formedness problems. Conversely, at parse time it
transparently replaces references to the predefined entities with the corresponding characters
(i.e. it will not generate entity reference nodes in the DOM tree or call the reference() SAX
callback when finding them in the input).</p><p><span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml2 SAX interface is difficult! If you plan to use
non-predefined entities in your documents, the learning curve for handling
them with the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml2
deal with the complexity rather than trying to do it yourself.</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>