preparing release of 2.6.26 Daniel

* configure.ini NEWS doc//* libxml.spec.in : preparing release of 2.6.26 Daniel

commit: fabafd54c7d7aed4f68be56d196b2f965eab7a9d
author: Daniel Veillard <veillard@src.gnome.org> Thu Jun 08 08:16:33 2006 +0000
committer: Daniel Veillard <veillard@src.gnome.org> Thu Jun 08 08:16:33 2006 +0000
tree: 887717d1e0384a48d71e631a8ca4c24612f108f7
parent: 7cb3fa9d51a0a57d1e538854e4be84db82b75098
diff --git a/doc/encoding.html b/doc/encoding.html
index 4b6d0d4..8db787e 100644
--- a/doc/encoding.html
+++ b/doc/encoding.html

@@ -7,44 +7,44 @@
 H2 {font-family: Verdana,Arial,Helvetica}
 H3 {font-family: Verdana,Arial,Helvetica}
 A:link, A:visited, A:active { text-decoration: underline }
-</style><title>Encodings support</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>Encodings support</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html">Home</a></li><li><a href="html/index.html">Reference Manual</a></li><li><a href="intro.html">Introduction</a></li><li><a href="FAQ.html">FAQ</a></li><li><a href="docs.html" style="font-weight:bold">Developer Menu</a></li><li><a href="bugs.html">Reporting bugs and getting help</a></li><li><a href="help.html">How to help</a></li><li><a href="downloads.html">Downloads</a></li><li><a href="news.html">Releases</a></li><li><a href="XMLinfo.html">XML</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="xmldtd.html">Validation &amp; DTDs</a></li><li><a href="encoding.html">Encodings support</a></li><li><a href="catalog.html">Catalog support</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="contribs.html">Contributions</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="html/index.html" style="font-weight:bold">API Menu</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="ChangeLog.html">Recent Changes</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://www.blastwave.org/packages.php/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a 
href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>If you are not really familiar with Internationalization (usual shortcutis
-I18N) , Unicode, characters and glyphs, I suggest you read a <a href="http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode">presentation</a>by
-Tim Bray on Unicode and why you should care about it.</p><p>If you don't understand why <b>it does not make sense to have a
-stringwithout knowing what encoding it uses</b>, then as Joel Spolsky said <a href="http://www.joelonsoftware.com/articles/Unicode.html">please do notwrite
-another line of code until you finish reading that article.</a>. It isa
-prerequisite to understand this page, and avoid a lot of problems
-withlibxml2, XML or text processing in general.</p><p>Table of Content:</p><ol><li><a href="encoding.html#What">What does internationalization supportmean
-    ?</a></li>
-  <li><a href="encoding.html#internal">The internal encoding, how
-  andwhy</a></li>
+</style><title>Encodings support</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000"><table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img src="epatents.png" alt="Action against software patents" /></a></td><td width="180"><a href="http://www.gnome.org/"><img src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo" /></a><div align="left"><a href="http://xmlsoft.org/"><img src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo" /></a></div></td><td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center"><h1>The XML C parser and toolkit of Gnome</h1><h2>Encodings support</h2></td></tr></table></td></tr></table></td></tr></table><table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr><tr><td bgcolor="#fffacd"><form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><input name="query" type="text" size="20" value="" /><input name="submit" type="submit" value="Search ..." 
/></form><ul><li><a href="index.html">Home</a></li><li><a href="html/index.html">Reference Manual</a></li><li><a href="intro.html">Introduction</a></li><li><a href="FAQ.html">FAQ</a></li><li><a href="docs.html" style="font-weight:bold">Developer Menu</a></li><li><a href="bugs.html">Reporting bugs and getting help</a></li><li><a href="help.html">How to help</a></li><li><a href="downloads.html">Downloads</a></li><li><a href="news.html">Releases</a></li><li><a href="XMLinfo.html">XML</a></li><li><a href="XSLT.html">XSLT</a></li><li><a href="xmldtd.html">Validation &amp; DTDs</a></li><li><a href="encoding.html">Encodings support</a></li><li><a href="catalog.html">Catalog support</a></li><li><a href="namespaces.html">Namespaces</a></li><li><a href="contribs.html">Contributions</a></li><li><a href="examples/index.html" style="font-weight:bold">Code Examples</a></li><li><a href="html/index.html" style="font-weight:bold">API Menu</a></li><li><a href="guidelines.html">XML Guidelines</a></li><li><a href="ChangeLog.html">Recent Changes</a></li></ul></td></tr></table><table width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr><tr><td bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig xmlsec</a></li><li><a href="ftp://xmlsoft.org/">FTP</a></li><li><a href="http://www.zlatkovic.com/projects/libxml/">Windows binaries</a></li><li><a href="http://www.blastwave.org/packages.php/libxml2">Solaris binaries</a></li><li><a href="http://www.explain.com.au/oss/libxml2xslt.html">MacOsX binaries</a></li><li><a href="http://libxmlplusplus.sourceforge.net/">C++ bindings</a></li><li><a href="http://www.zend.com/php5/articles/php5-xmlphp.php#Heading4">PHP bindings</a></li><li><a 
href="http://sourceforge.net/projects/libxml2-pas/">Pascal bindings</a></li><li><a href="http://libxml.rubyforge.org/">Ruby bindings</a></li><li><a href="http://tclxml.sourceforge.net/">Tcl bindings</a></li><li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml2">Bug Tracker</a></li></ul></td></tr></table></td></tr></table></td><td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"><p>If you are not really familiar with Internationalization (usual
+shortcutisI18N) , Unicode, characters and glyphs, I suggest you read a <a href="http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode">presentation</a>byTim
+Bray on Unicode and why you should care about it.</p><p>If you don't understand why <b>it does not make sense to have
+astringwithout knowing what encoding it uses</b>, then as Joel Spolsky said
+<a href="http://www.joelonsoftware.com/articles/Unicode.html">please do
+notwriteanother line of code until you finish reading that article.</a>. It
+isaprerequisite to understand this page, and avoid a lot of
+problemswithlibxml2, XML or text processing in general.</p><p>Table of Content:</p><ol><li><a href="encoding.html#What">What does internationalization
+    supportmean?</a></li>
+  <li><a href="encoding.html#internal">The internal encoding,
+  howandwhy</a></li>
   <li><a href="encoding.html#implemente">How is it implemented ?</a></li>
   <li><a href="encoding.html#Default">Default supported encodings</a></li>
-  <li><a href="encoding.html#extend">How to extend the
-  existingsupport</a></li>
-</ol><h3><a name="What" id="What">What does internationalization support mean ?</a></h3><p>XML was designed from the start to allow the support of any character
-setby using Unicode. Any conformant XML parser has to support the UTF-8
-andUTF-16 default encodings which can both express the full unicode ranges.
-UTF8is a variable length encoding whose greatest points are to reuse the
-sameencoding for ASCII and to save space for Western encodings, but it is a
-bitmore complex to handle in practice. UTF-16 use 2 bytes per character
-(andsometimes combines two pairs), it makes implementation easier, but looks
-abit overkill for Western languages encoding. Moreover the XML
-specificationallows the document to be encoded in other encodings at the
-condition thatthey are clearly labeled as such. For example the following is
-a wellformedXML document encoded in ISO-8859-1 and using accentuated letters
-that weFrench like for both markup and content:</p><pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
+  <li><a href="encoding.html#extend">How to extend theexistingsupport</a></li>
+</ol><h3><a name="What" id="What">What does internationalization support mean ?</a></h3><p>XML was designed from the start to allow the support of any charactersetby
+using Unicode. Any conformant XML parser has to support the UTF-8andUTF-16
+default encodings which can both express the full unicode ranges.UTF8is a
+variable length encoding whose greatest points are to reuse thesameencoding
+for ASCII and to save space for Western encodings, but it is abitmore complex
+to handle in practice. UTF-16 use 2 bytes per character(andsometimes combines
+two pairs), it makes implementation easier, but looksabit overkill for
+Western languages encoding. Moreover the XMLspecificationallows the document
+to be encoded in other encodings at thecondition thatthey are clearly labeled
+as such. For example the following isa wellformedXML document encoded in
+ISO-8859-1 and using accentuated lettersthat weFrench like for both markup
+and content:</p><pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
 &lt;très&gt;là&lt;/très&gt;</pre><p>Having internationalization support in libxml2 means the following:</p><ul><li>the document is properly parsed</li>
   <li>informations about it's encoding are saved</li>
   <li>it can be modified</li>
   <li>it can be saved in its original encoding</li>
-  <li>it can also be saved in another encoding supported by libxml2
-    (forexample straight UTF8 or even an ASCII form)</li>
-</ul><p>Another very important point is that the whole libxml2 API, with
-theexception of a few routines to read with a specific encoding or save to
-aspecific encoding, is completely agnostic about the original encoding of
-thedocument.</p><p>It should be noted too that the HTML parser embedded in libxml2 now
-obeythe same rules too, the following document will be (as of 2.2.2) handled 
-inan internationalized fashion by libxml2 too:</p><pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
+  <li>it can also be saved in another encoding supported by
+    libxml2(forexample straight UTF8 or even an ASCII form)</li>
+</ul><p>Another very important point is that the whole libxml2 API,
+withtheexception of a few routines to read with a specific encoding or save
+toaspecific encoding, is completely agnostic about the original encoding
+ofthedocument.</p><p>It should be noted too that the HTML parser embedded in libxml2 nowobeythe
+same rules too, the following document will be (as of 2.2.2) handledinan
+internationalized fashion by libxml2 too:</p><pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
                       "http://www.w3.org/TR/REC-html40/loose.dtd"&gt;
 &lt;html lang="fr"&gt;
 &lt;head&gt;
@@ -52,60 +52,60 @@
 &lt;/head&gt;
 &lt;body&gt;
 &lt;p&gt;W3C crée des standards pour le Web.&lt;/body&gt;
-&lt;/html&gt;</pre><h3><a name="internal" id="internal">The internal encoding, how and why</a></h3><p>One of the core decisions was to force all documents to be converted to
-adefault internal encoding, and that encoding to be UTF-8, here are
-therationales for those choices:</p><ul><li>keeping the native encoding in the internal form would force the
-    libxmlusers (or the code associated) to be fully aware of the encoding of
-    theoriginal document, for examples when adding a text node to a
-    document,the content would have to be provided in the document encoding,
-    i.e. theclient code would have to check it before hand, make sure it's
-    conformantto the encoding, etc ... Very hard in practice, though in some
-    specificcases this may make sense.</li>
-  <li>the second decision was which encoding. From the XML spec only UTF8
-    andUTF16 really makes sense as being the two only encodings for which
-    thereis mandatory support. UCS-4 (32 bits fixed size encoding) could
-    beconsidered an intelligent choice too since it's a direct Unicode
-    mappingsupport. I selected UTF-8 on the basis of efficiency and
-    compatibilitywith surrounding software:
-    <ul><li>UTF-8 while a bit more complex to convert from/to (i.e.
-        slightlymore costly to import and export CPU wise) is also far more
-        compactthan UTF-16 (and UCS-4) for a majority of the documents I see
-        it usedfor right now (RPM RDF catalogs, advogato data, various
-        configurationfile formats, etc.) and the key point for today's
-        computerarchitecture is efficient uses of caches. If one nearly
-        double thememory requirement to store the same amount of data, this
-        will trashcaches (main memory/external caches/internal caches) and my
-        take isthat this harms the system far more than the CPU requirements
-        neededfor the conversion to UTF-8</li>
-      <li>Most of libxml2 version 1 users were using it with straight
-        ASCIImost of the time, doing the conversion with an internal
-        encodingrequiring all their code to be rewritten was a serious
-        show-stopperfor using UTF-16 or UCS-4.</li>
-      <li>UTF-8 is being used as the de-facto internal encoding standard
-        forrelated code like the <a href="http://www.pango.org/">pango</a>upcoming Gnome text widget, and
-        a lot of Unix code (yet another placewhere Unix programmer base takes
-        a different approach from Microsoft- they are using UTF-16)</li>
+&lt;/html&gt;</pre><h3><a name="internal" id="internal">The internal encoding, how and why</a></h3><p>One of the core decisions was to force all documents to be converted
+toadefault internal encoding, and that encoding to be UTF-8, here
+aretherationales for those choices:</p><ul><li>keeping the native encoding in the internal form would force
+    thelibxmlusers (or the code associated) to be fully aware of the encoding
+    oftheoriginal document, for examples when adding a text node to
+    adocument,the content would have to be provided in the document
+    encoding,i.e. theclient code would have to check it before hand, make
+    sure it'sconformantto the encoding, etc ... Very hard in practice, though
+    in somespecificcases this may make sense.</li>
+  <li>the second decision was which encoding. From the XML spec only
+    UTF8andUTF16 really makes sense as being the two only encodings for
+    whichthereis mandatory support. UCS-4 (32 bits fixed size encoding)
+    couldbeconsidered an intelligent choice too since it's a direct
+    Unicodemappingsupport. I selected UTF-8 on the basis of efficiency
+    andcompatibilitywith surrounding software:
+    <ul><li>UTF-8 while a bit more complex to convert from/to (i.e.slightlymore
+        costly to import and export CPU wise) is also far morecompactthan
+        UTF-16 (and UCS-4) for a majority of the documents I seeit usedfor
+        right now (RPM RDF catalogs, advogato data, variousconfigurationfile
+        formats, etc.) and the key point for today'scomputerarchitecture is
+        efficient uses of caches. If one nearlydouble thememory requirement
+        to store the same amount of data, thiswill trashcaches (main
+        memory/external caches/internal caches) and mytake isthat this harms
+        the system far more than the CPU requirementsneededfor the conversion
+        to UTF-8</li>
+      <li>Most of libxml2 version 1 users were using it with
+        straightASCIImost of the time, doing the conversion with an
+        internalencodingrequiring all their code to be rewritten was a
+        seriousshow-stopperfor using UTF-16 or UCS-4.</li>
+      <li>UTF-8 is being used as the de-facto internal encoding
+        standardforrelated code like the <a href="http://www.pango.org/">pango</a>upcoming Gnome text widget,
+        anda lot of Unix code (yet another placewhere Unix programmer base
+        takesa different approach from Microsoft- they are using UTF-16)</li>
     </ul></li>
-</ul><p>What does this mean in practice for the libxml2 user:</p><ul><li>xmlChar, the libxml2 data type is a byte, those bytes must be
-    assembledas UTF-8 valid strings. The proper way to terminate an xmlChar *
-    stringis simply to append 0 byte, as usual.</li>
-  <li>One just need to make sure that when using chars outside the ASCII
-    set,the values has been properly converted to UTF-8</li>
-</ul><h3><a name="implemente" id="implemente">How is it implemented ?</a></h3><p>Let's describe how all this works within libxml, basically the
-I18N(internationalization) support get triggered only during I/O operation,
-i.e.when reading a document or saving one. Let's look first at the
-readingsequence:</p><ol><li>when a document is processed, we usually don't know the encoding,
-    asimple heuristic allows to detect UTF-16 and UCS-4 from encodings
-    wherethe ASCII range (0-0x7F) maps with ASCII</li>
-  <li>the xml declaration if available is parsed, including the
-    encodingdeclaration. At that point, if the autodetected encoding is
-    differentfrom the one declared a call to xmlSwitchEncoding() is
-  issued.</li>
-  <li>If there is no encoding declaration, then the input has to be in
-    eitherUTF-8 or UTF-16, if it is not then at some point when processing
-    theinput, the converter/checker of UTF-8 form will raise an encoding
-    error.You may end-up with a garbled document, or no document at all !
-    Example:
+</ul><p>What does this mean in practice for the libxml2 user:</p><ul><li>xmlChar, the libxml2 data type is a byte, those bytes must
+    beassembledas UTF-8 valid strings. The proper way to terminate an xmlChar
+    *stringis simply to append 0 byte, as usual.</li>
+  <li>One just need to make sure that when using chars outside the
+    ASCIIset,the values has been properly converted to UTF-8</li>
+</ul><h3><a name="implemente" id="implemente">How is it implemented ?</a></h3><p>Let's describe how all this works within libxml, basically
+theI18N(internationalization) support get triggered only during I/O
+operation,i.e.when reading a document or saving one. Let's look first at
+thereadingsequence:</p><ol><li>when a document is processed, we usually don't know the
+    encoding,asimple heuristic allows to detect UTF-16 and UCS-4 from
+    encodingswherethe ASCII range (0-0x7F) maps with ASCII</li>
+  <li>the xml declaration if available is parsed, including
+    theencodingdeclaration. At that point, if the autodetected encoding
+    isdifferentfrom the one declared a call to xmlSwitchEncoding()
+  isissued.</li>
+  <li>If there is no encoding declaration, then the input has to be
+    ineitherUTF-8 or UTF-16, if it is not then at some point when
+    processingtheinput, the converter/checker of UTF-8 form will raise an
+    encodingerror.You may end-up with a garbled document, or no document at
+    all !Example:
     <pre>~/XML -&gt; ./xmllint err.xml 
 err.xml:1: error: Input is not proper UTF-8, indicate encoding !
 &lt;très&gt;là&lt;/très&gt;
@@ -114,94 +114,93 @@
 &lt;très&gt;là&lt;/très&gt;
    ^</pre>
   </li>
-  <li>xmlSwitchEncoding() does an encoding name lookup, canonicalize it,
-    andthen search the default registered encoding converters for that
-    encoding.If it's not within the default set and iconv() support has been
-    compiledit, it will ask iconv for such an encoder. If this fails then the
-    parserwill report an error and stops processing:
+  <li>xmlSwitchEncoding() does an encoding name lookup, canonicalize
+    it,andthen search the default registered encoding converters for
+    thatencoding.If it's not within the default set and iconv() support has
+    beencompiledit, it will ask iconv for such an encoder. If this fails then
+    theparserwill report an error and stops processing:
     <pre>~/XML -&gt; ./xmllint err2.xml 
 err2.xml:1: error: Unsupported encoding UnsupportedEnc
 &lt;?xml version="1.0" encoding="UnsupportedEnc"?&gt;
                                              ^</pre>
   </li>
-  <li>From that point the encoder processes progressively the input (it
-    isplugged as a front-end to the I/O module) for that entity. It
-    capturesand converts on-the-fly the document to be parsed to UTF-8. The
-    parseritself just does UTF-8 checking of this input and process
-    ittransparently. The only difference is that the encoding information
-    hasbeen added to the parsing context (more precisely to the
-    inputcorresponding to this entity).</li>
-  <li>The result (when using DOM) is an internal form completely in UTF-8with
-    just an encoding information on the document node.</li>
-</ol><p>Ok then what happens when saving the document (assuming youcollected/built
-an xmlDoc DOM like structure) ? It depends on the functioncalled,
-xmlSaveFile() will just try to save in the original encoding,
-whilexmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a
-givenencoding:</p><ol><li>if no encoding is given, libxml2 will look for an encoding
-    valueassociated to the document and if it exists will try to save to
-    thatencoding,
+  <li>From that point the encoder processes progressively the input
+    (itisplugged as a front-end to the I/O module) for that entity.
+    Itcapturesand converts on-the-fly the document to be parsed to UTF-8.
+    Theparseritself just does UTF-8 checking of this input and
+    processittransparently. The only difference is that the encoding
+    informationhasbeen added to the parsing context (more precisely to
+    theinputcorresponding to this entity).</li>
+  <li>The result (when using DOM) is an internal form completely in
+    UTF-8withjust an encoding information on the document node.</li>
+</ol><p>Ok then what happens when saving the document (assuming
+youcollected/builtan xmlDoc DOM like structure) ? It depends on the
+functioncalled,xmlSaveFile() will just try to save in the original
+encoding,whilexmlSaveFileTo() and xmlSaveFileEnc() can optionally save to
+agivenencoding:</p><ol><li>if no encoding is given, libxml2 will look for an
+    encodingvalueassociated to the document and if it exists will try to save
+    tothatencoding,
     <p>otherwise everything is written in the internal form, i.e. UTF-8</p>
   </li>
-  <li>so if an encoding was specified, either at the API level or on
-    thedocument, libxml2 will again canonicalize the encoding name, lookup
-    for aconverter in the registered set or through iconv. If not found
-    thefunction will return an error code</li>
-  <li>the converter is placed before the I/O buffer layer, as another kind
-    ofbuffer, then libxml2 will simply push the UTF-8 serialization to
-    throughthat buffer, which will then progressively be converted and pushed
-    ontothe I/O layer.</li>
-  <li>It is possible that the converter code fails on some input, for
-    exampletrying to push an UTF-8 encoded Chinese character through the
-    UTF-8 toISO-8859-1 converter won't work. Since the encoders are
-    progressive theywill just report the error and the number of bytes
-    converted, at thatpoint libxml2 will decode the offending character,
-    remove it from thebuffer and replace it with the associated charRef
-    encoding &amp;#123; andresume the conversion. This guarantees that any
-    document will be savedwithout losses (except for markup names where this
-    is not legal, this isa problem in the current version, in practice avoid
-    using non-asciicharacters for tag or attribute names). A special "ascii"
-    encoding nameis used to save documents to a pure ascii form can be used
-    whenportability is really crucial</li>
+  <li>so if an encoding was specified, either at the API level or
+    onthedocument, libxml2 will again canonicalize the encoding name,
+    lookupfor aconverter in the registered set or through iconv. If not
+    foundthefunction will return an error code</li>
+  <li>the converter is placed before the I/O buffer layer, as another
+    kindofbuffer, then libxml2 will simply push the UTF-8 serialization
+    tothroughthat buffer, which will then progressively be converted and
+    pushedontothe I/O layer.</li>
+  <li>It is possible that the converter code fails on some input,
+    forexampletrying to push an UTF-8 encoded Chinese character through
+    theUTF-8 toISO-8859-1 converter won't work. Since the encoders
+    areprogressive theywill just report the error and the number of
+    bytesconverted, at thatpoint libxml2 will decode the offending
+    character,remove it from thebuffer and replace it with the associated
+    charRefencoding &amp;#123; andresume the conversion. This guarantees that
+    anydocument will be savedwithout losses (except for markup names where
+    thisis not legal, this isa problem in the current version, in practice
+    avoidusing non-asciicharacters for tag or attribute names). A special
+    "ascii"encoding nameis used to save documents to a pure ascii form can be
+    usedwhenportability is really crucial</li>
 </ol><p>Here are a few examples based on the same test document:</p><pre>~/XML -&gt; ./xmllint isolat1 
 &lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
 &lt;très&gt;là&lt;/très&gt;
 ~/XML -&gt; ./xmllint --encode UTF-8 isolat1 
 &lt;?xml version="1.0" encoding="UTF-8"?&gt;
 &lt;trÃ¨s&gt;lÃ  &lt;/trÃ¨s&gt;
-~/XML -&gt; </pre><p>The same processing is applied (and reuse most of the code) for HTML
-I18Nprocessing. Looking up and modifying the content encoding is a bit
-moredifficult since it is located in a &lt;meta&gt; tag under the
-&lt;head&gt;,so a couple of functions htmlGetMetaEncoding() and
-htmlSetMetaEncoding() havebeen provided. The parser also attempts to switch
-encoding on the fly whendetecting such a tag on input. Except for that the
-processing is the same(and again reuses the same code).</p><h3><a name="Default" id="Default">Default supported encodings</a></h3><p>libxml2 has a set of default converters for the following
-encodings(located in encoding.c):</p><ol><li>UTF-8 is supported by default (null handlers)</li>
+~/XML -&gt; </pre><p>The same processing is applied (and reuse most of the code) for
+HTMLI18Nprocessing. Looking up and modifying the content encoding is a
+bitmoredifficult since it is located in a &lt;meta&gt; tag under
+the&lt;head&gt;,so a couple of functions htmlGetMetaEncoding()
+andhtmlSetMetaEncoding() havebeen provided. The parser also attempts to
+switchencoding on the fly whendetecting such a tag on input. Except for that
+theprocessing is the same(and again reuses the same code).</p><h3><a name="Default" id="Default">Default supported encodings</a></h3><p>libxml2 has a set of default converters for the followingencodings(located
+in encoding.c):</p><ol><li>UTF-8 is supported by default (null handlers)</li>
   <li>UTF-16, both little and big endian</li>
   <li>ISO-Latin-1 (ISO-8859-1) covering most western languages</li>
   <li>ASCII, useful mostly for saving</li>
-  <li>HTML, a specific handler for the conversion of UTF-8 to ASCII with
-    HTML predefined entities like &amp;copy; for the Copyright sign.</li>
-</ol><p>Moreover, when compiled on a Unix platform with iconv support, the full set
-of encodings supported by iconv can instantly be used by libxml. On a Linux
-machine with glibc-2.1 the list of supported encodings and aliases fills 3 full
-pages, and includes UCS-4, the full set of ISO-Latin encodings, and the various
-Japanese ones.</p><p>To convert from the UTF-8 values returned from the API to another
-encoding, it is possible to use the functions provided by <a href="html/libxml-encoding.html">the encoding module</a>, like <a href="html/libxml-encoding.html#UTF8Toisolat1">UTF8Toisolat1</a>, or use
-the POSIX <a href="http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html">iconv()</a> API
-directly.</p><h4>Encoding aliases</h4><p>From 2.2.3, libxml2 has support for registering encoding name aliases.
-The goal is to be able to parse documents whose encoding is supported but
-where the name differs (for example from the default set of names accepted
-by iconv). The following functions allow registering and handling new aliases
-for existing encodings. Once registered, libxml2 will automatically look up
-the aliases when handling a document:</p><ul><li>int xmlAddEncodingAlias(const char *name, const char *alias);</li>
+  <li>HTML, a specific handler for the conversion of UTF-8 to ASCII
+    with HTML predefined entities like &amp;copy; for the Copyright sign.</li>
+</ol><p>Moreover, when compiled on a Unix platform with iconv support, the
+full set of encodings supported by iconv can instantly be used by libxml. On
+a Linux machine with glibc-2.1 the list of supported encodings and aliases
+fills 3 full pages, and includes UCS-4, the full set of ISO-Latin encodings, and
+the various Japanese ones.</p><p>To convert from the UTF-8 values returned from the API to
+another encoding, it is possible to use the functions provided by <a href="html/libxml-encoding.html">the encoding module</a>, like <a href="html/libxml-encoding.html#UTF8Toisolat1">UTF8Toisolat1</a>, or
+use the POSIX <a href="http://www.opengroup.org/onlinepubs/009695399/functions/iconv.html">iconv()</a> API directly.</p><h4>Encoding aliases</h4><p>From 2.2.3, libxml2 has support for registering encoding name aliases. The goal
+is to be able to parse documents whose encoding is supported but where the name
+differs (for example from the default set of names accepted by iconv). The
+following functions allow registering and handling new aliases for existing
+encodings. Once registered, libxml2 will automatically look up the aliases when
+handling a document:</p><ul><li>int xmlAddEncodingAlias(const char *name, const char *alias);</li>
   <li>int xmlDelEncodingAlias(const char *alias);</li>
   <li>const char * xmlGetEncodingAlias(const char *alias);</li>
   <li>void xmlCleanupEncodingAliases(void);</li>
-</ul><h3><a name="extend" id="extend">How to extend the existing support</a></h3><p>Adding support for a new encoding, or overriding one of the
-encoders (assuming it is buggy), should not be hard: just write input and
-output conversion routines to/from UTF-8, and register them
-using xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they will
-be called automatically if the parser(s) encounter such an encoding
-name (register it uppercase, this will help). The
-encoders, their arguments and expected return values are described in the
-encoding.h header.</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>
+</ul><h3><a name="extend" id="extend">How to extend the existing support</a></h3><p>Adding support for a new encoding, or overriding one of
+the encoders (assuming it is buggy), should not be hard: just write input
+and output conversion routines to/from UTF-8, and register
+them using xmlNewCharEncodingHandler(name, xxxToUTF8, UTF8Toxxx), and they
+will be called automatically if the parser(s) encounter such an
+encoding name (register it uppercase, this will help). The
+encoders, their arguments and expected return values are described in
+the encoding.h header.</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>