Blanks handling function, added 2.x upgrade doc, Daniel
diff --git a/doc/upgrade.html b/doc/upgrade.html
new file mode 100644
index 0000000..ce13bef
--- /dev/null
+++ b/doc/upgrade.html
@@ -0,0 +1,82 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
+ "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html>
+<head>
+ <title>Upgrading libxml client code from 1.x to 2.x</title>
+ <meta name="GENERATOR" content="amaya V2.4">
+ <meta http-equiv="Content-Type" content="text/html">
+</head>
+
+<body bgcolor="#ffffff">
+<h1 align="center">Upgrading libxml client code from 1.x to 2.x</h1>
+
+<p>Version 2 of libxml is the first version introducing serious backward
+incompatible changes. The main goals were:</p>
+<ul>
+ <li>a general cleanup. A number of mistakes inherited from the very early
+ versions couldn't be changed due to compatibility constraints. Example the
+ "childs" element in the nodes.</li>
+ <li>Uniformization of the various nodes, at least for their header and link
+ parts (doc, parent, children, prev, next), the goal is a simpler
+ programming model and simplifying the task of the DOM implementors.</li>
+ <li>better conformances to the XML specification, for example version 1.x
+ had an heuristic to try to detect ignorable white spaces. As a result the
+ SAX event generated were ignorableWhitespace() while the spec requires
+ character() in that case. This also mean that a number of DOM node
+ containing blank text may populate the DOM tree which were not present
+ before.</li>
+</ul>
+
+<p>So client code of libxml designed to run with version 1.x may have to be
+changed to compile against version 2.x of libxml. Here is a list of changes
+that I have collected, they may not be sufficient, so in case you find other
+change which are required, <a href="mailto:Daniel.Ïeillardw3.org">drop me a
+mail</a>:</p>
+<ol>
+ <li>Node <strong>childs</strong> field has been renamed
+ <strong>children</strong> so s/childs/children/g should be applied
+ (probablility of having "childs" anywere else is close to 0+</li>
+ <li>The document don't have anymore a <strong>root</strong> element it has
+ been replaced by <strong>children</strong> and usually you will get a list
+ of element here. Áor example a Dtd element for the internal subset and
+ it's declaration may be found in that list, as well as processing
+ instructions or comments found before or after the document root element.
+ Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
+ a document. Alternatively if you are sure to not reference Dtds nor have
+ PIs or comments before or after the root element s/->root/->children/g
+ will probably do it.</li>
+ <li>The white space issue, this one is more complex, unless special case of
+ validating parsing, the line breaks and spaces usually used for indenting
+ and formatting the document content becomes significant. So they are
+ reported by SAX and if your using the DOM tree, corresponding nodes are
+ generated. Too approach can be taken:
+ <ol>
+ <li>lazy one, use the compatibility call
+ <strong>xmlKeepBlanksDefault(0)</strong> but be aware that you are
+ relying on a special (and possibly broken) set of heuristics of libxml
+ to detect ignorable blanks. Don't complain if it breaks or make your
+ application not 100% clean w.r.t. to it's input.</li>
+ <li>the Right Way: change you code to accept possibly unsignificant
+ blanks characters, or have your tree populated with weird blank text
+ nodes. You can spot them using the comodity function
+ <strong>xmlIsBlankNode(node)</strong> returning 1 for such blank
+ nodes.</li>
+ </ol>
+ <p>Note also that with the new default the output functions don't add any
+ extra indentation when saving a tree in order to be able to round trip
+ (read and save) without inflating the document with extra formatting
+ chars.</p>
+ </li>
+</ol>
+
+<p>Let me put some emphasis on the fact that there is far more changes from
+libxml 1.x to 2.x than the ones you may have to pacth for. The overall code
+has been considerably improved and the conformance to the XML specification
+has been drastically improve. Don't take those changes as an excuse to not
+upgrade, it may cost a lot on the long term ...</p>
+
+<p><a href="mailto:Daniel.Veillard@w3.org">Daniel Veillard</a></p>
+
+<p>$Id$</p>
+</body>
+</html>