<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>I/O Interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>I/O Interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General1">General overview</a></li>
<li><a href="#basic">The basic buffer type</a></li>
<li><a href="#Input">Input I/O handlers</a></li>
<li><a href="#Output">Output I/O handlers</a></li>
<li><a href="#entities">The entities loader</a></li>
<li><a href="#Example2">Example of customized I/O</a></li>
</ol>
<h3><a name="General1">General overview</a></h3>
<p>The module <code><a href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
the interfaces to the libxml I/O system. It consists of 4 main parts:</p>
<ul>
<li>The entities loader, a routine which tries to fetch the entities
 (files) based on their PUBLIC and SYSTEM identifiers. The default loader
 does not look at the public identifier since libxml does not maintain a
 catalog. You can redefine your own entity loader by using
 <code>xmlGetExternalEntityLoader()</code> and
 <code>xmlSetExternalEntityLoader()</code>. <a href="#entities">Check the
 example</a>.</li>
<li>Input I/O buffers, a convenience structure used by the parser(s)
 input layer to fetch the information fed to the parser. This
 provides buffering and is also the place where the encoding
 converters to UTF-8 are piggy-backed.</li>
<li>Output I/O buffers, which are similar to the Input ones and fulfill a
 similar task, but when generating a serialization from a tree.</li>
<li>A mechanism to register sets of I/O callbacks and associate them with
 specific naming schemes, like the protocol part of the URIs.
 <p>This affects the default I/O operations and allows specific I/O
 handlers to be used for certain names.</p>
</li>
</ul>
<p>The general mechanism used when loading http://rpmfind.net/xml.html, for
example in the HTML parser, is the following:</p>
<ol>
<li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
 the parsing context and the URI string.</li>
<li>The URI string is checked against the existing registered handlers
 using their match() callback function. If the HTTP module was compiled
 in, it is registered and its match() function will succeed.</li>
<li>The open() function of the handler is called and, if successful, will
 return an I/O Input buffer.</li>
<li>The parser will then start reading from this buffer and progressively
 fetch information from the resource, calling the read() function of the
 handler until the resource is exhausted.</li>
<li>If an encoding change is detected, it will be installed on the input
 buffer, providing buffering and efficient use of the conversion
 routines.</li>
<li>Once the parser has finished, the close() function of the handler is
 called once and the Input buffer and associated resources are
 deallocated.</li>
</ol>
<p>The user defined callbacks are checked first to allow overriding of the
default libxml I/O routines.</p>
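<p>The lookup sequence above can be modelled in a few lines of plain C. This
is only a sketch under stated assumptions, not libxml's implementation: the
<code>IoHandler</code> type, the <code>register_handler()</code>,
<code>find_handler()</code> and <code>load_resource()</code> functions and the
<code>mem:</code> naming scheme are all hypothetical, and the sketch assumes
that "checked first" is obtained by scanning the table from the most recent
registration backwards.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one set of I/O callbacks; not libxml's own types. */
typedef struct {
    int   (*match)(const char *uri);              /* non-zero if the name is accepted */
    void *(*open)(const char *uri);               /* returns a context, NULL on failure */
    int   (*read)(void *ctx, char *buf, int len); /* bytes read, 0 at end, -1 on error */
    int   (*close)(void *ctx);
} IoHandler;

#define MAX_HANDLERS 8
static IoHandler handlers[MAX_HANDLERS];
static int nb_handlers = 0;

/* Register a handler; returns its index, or -1 if the table is full. */
int register_handler(IoHandler h) {
    assert(h.match != NULL && h.open != NULL);
    if (nb_handlers >= MAX_HANDLERS) return -1;
    handlers[nb_handlers] = h;
    return nb_handlers++;
}

/* Scan from the newest entry down, so later registrations win. */
const IoHandler *find_handler(const char *uri) {
    for (int i = nb_handlers - 1; i >= 0; i--)
        if (handlers[i].match(uri)) return &handlers[i];
    return NULL;
}

/* Drive one resource through match/open/read/close.
 * Returns the number of bytes stored in out, or -1 on failure. */
int load_resource(const char *uri, char *out, int outlen) {
    const IoHandler *h = find_handler(uri);
    if (h == NULL) return -1;
    void *ctx = h->open(uri);
    if (ctx == NULL) return -1;
    int total = 0, n = 0;
    while (total < outlen &&
           (n = h->read(ctx, out + total, outlen - total)) > 0)
        total += n;
    h->close(ctx);              /* called exactly once per opened resource */
    return (n < 0) ? -1 : total;
}

/* Stand-in for a built-in handler: accepts every name, opens nothing. */
int any_match(const char *uri) { (void)uri; return 1; }
void *fail_open(const char *uri) { (void)uri; return NULL; }

/* User handler: serves the text that follows the "mem:" prefix. */
typedef struct { const char *data; size_t pos; } MemCtx;
static MemCtx mem_ctx;

int mem_match(const char *uri) { return strncmp(uri, "mem:", 4) == 0; }

void *mem_open(const char *uri) {
    mem_ctx.data = uri + 4;
    mem_ctx.pos = 0;
    return &mem_ctx;
}

int mem_read(void *ctx, char *buf, int len) {
    MemCtx *m = ctx;
    size_t left = strlen(m->data) - m->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (int)n;
}

int mem_close(void *ctx) { (void)ctx; return 0; }
```

<p>Registering the catch-all handler first and the <code>mem:</code> handler
second makes <code>find_handler("mem:hello")</code> pick the second one, while
any other name still falls through to the first.</p>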
<h3><a name="basic">The basic buffer type</a></h3>
<p>All the buffer manipulation handling is done using the
<code>xmlBuffer</code> type defined in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>, which is a
resizable memory buffer. The buffer allocation strategy can be selected to be
either best-fit or exponential doubling (a CPU vs. memory use
tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually using
<code>xmlBufferSetAllocationScheme()</code> or on a system wide basis using
<code>xmlSetBufferAllocationScheme()</code>. A number of functions for
manipulating buffers are available, with names starting with the
<code>xmlBuffer...</code> prefix.</p>
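<p>The CPU vs. memory tradeoff between the two strategies can be seen with a
small stand-alone model. This is a sketch, not the <code>xmlBuffer</code>
code: the <code>Buf</code> type, the <code>GROW_EXACT</code> and
<code>GROW_DOUBLE</code> names, the <code>buf_*</code> functions and the
16-byte starting size are assumptions made for the illustration.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names standing in for the two allocation strategies. */
typedef enum { GROW_EXACT, GROW_DOUBLE } GrowScheme;

typedef struct {
    char      *content;
    size_t     use;       /* bytes in use */
    size_t     size;      /* bytes allocated */
    GrowScheme scheme;
    unsigned   reallocs;  /* how many times the block was resized */
} Buf;

void buf_init(Buf *b, GrowScheme scheme) {
    b->content = NULL;
    b->use = 0;
    b->size = 0;
    b->scheme = scheme;
    b->reallocs = 0;
}

/* Append len bytes; returns 0 on success, -1 if memory runs out. */
int buf_add(Buf *b, const char *data, size_t len) {
    assert(b != NULL && data != NULL);
    if (len == 0) return 0;
    size_t needed = b->use + len;
    if (needed > b->size) {
        size_t newsize;
        if (b->scheme == GROW_EXACT) {
            newsize = needed;                    /* best fit: no slack */
        } else {
            newsize = b->size ? b->size : 16;    /* arbitrary starting size */
            while (newsize < needed) newsize *= 2;
        }
        char *tmp = realloc(b->content, newsize);
        if (tmp == NULL) return -1;
        b->content = tmp;
        b->size = newsize;
        b->reallocs++;
    }
    memcpy(b->content + b->use, data, len);
    b->use += len;
    return 0;
}

void buf_free(Buf *b) {
    free(b->content);
    buf_init(b, b->scheme);
}
```

<p>Appending one byte 100 times reallocates 100 times with
<code>GROW_EXACT</code> and ends with exactly 100 bytes allocated, while
<code>GROW_DOUBLE</code> reallocates 4 times and ends with 128 bytes
allocated.</p>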
<h3><a name="Input">Input I/O handlers</a></h3>
<p>An Input I/O handler is a simple structure,
<code>xmlParserInputBuffer</code>, containing a context associated with the
resource (file descriptor, or pointer to a protocol handler), the read() and
close() callbacks to use, and an xmlBuffer. An extra xmlBuffer and a charset
encoding handler are also present to support charset conversion when
needed.</p>
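<p>As a rough picture of that layout, here is a simplified stand-alone
counterpart. It is a sketch only: the <code>InputBuf</code> type, its fixed
256-byte array (standing in for the xmlBuffer), and the
<code>input_grow()</code>, <code>input_close()</code> and
<code>string_read()</code> functions are hypothetical, and the charset
conversion part is left out.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*ReadFn)(void *context, char *buffer, int len);
typedef int (*CloseFn)(void *context);

/* Hypothetical, simplified counterpart of an input handler. */
typedef struct {
    void   *context;        /* resource handle, opaque to the parser */
    ReadFn  readcallback;   /* fetches more bytes from the resource */
    CloseFn closecallback;  /* releases the resource, may be NULL */
    char    data[256];      /* stands in for the xmlBuffer */
    int     used;           /* bytes currently stored in data */
} InputBuf;

/* Pull up to len more bytes from the resource into the buffer.
 * Returns bytes added, 0 at end of input or when full, -1 on error. */
int input_grow(InputBuf *in, int len) {
    assert(in != NULL && in->readcallback != NULL);
    int room = (int)sizeof(in->data) - in->used;
    if (len > room) len = room;
    if (len <= 0) return 0;
    int n = in->readcallback(in->context, in->data + in->used, len);
    if (n > 0) in->used += n;
    return n;
}

/* Release the resource through the close callback, if one is set. */
int input_close(InputBuf *in) {
    int rc = 0;
    if (in->closecallback != NULL) rc = in->closecallback(in->context);
    in->context = NULL;
    return rc;
}

/* Example read callback: the context points to a C string cursor. */
int string_read(void *context, char *buffer, int len) {
    const char **cursor = context;
    int n = (int)strlen(*cursor);
    if (n > len) n = len;
    memcpy(buffer, *cursor, (size_t)n);
    *cursor += n;           /* remember how far we have read */
    return n;
}
```

<p>The parser side only ever sees <code>input_grow()</code> and
<code>input_close()</code>; what the context really is stays private to the
callbacks, which is what lets one structure serve files, sockets or memory
blocks alike.</p>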
<h3><a name="Output">Output I/O handlers</a></h3>
<p>An Output handler <code>xmlOutputBuffer</code> is very similar to an
Input one, except that the callbacks are write() and close().</p>
<h3><a name="entities">The entities loader</a></h3>
<p>The entity loader resolves requests for new entities and creates inputs for
the parser. Creating an input from a filename or a URI string is done
through the xmlNewInputFromFile() routine. The default entity loader does not
handle the PUBLIC identifier associated with an entity (if any), so it just
calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
XML).</p>
<p>If you want to hook up a catalog mechanism then you simply need to
override the default entity loader. Here is an example:</p>
<pre>#include &lt;libxml/xmlIO.h&gt;
#include &lt;libxml/parser.h&gt;          /* xmlGetExternalEntityLoader() and friends */
#include &lt;libxml/parserInternals.h&gt; /* xmlNewInputFromFile() */

xmlExternalEntityLoader defaultLoader = NULL;

xmlParserInputPtr
xmlMyExternalEntityLoader(const char *URL, const char *ID,
                          xmlParserCtxtPtr ctxt) {
    xmlParserInputPtr ret = NULL;
    const char *fileID = NULL;

    /* look up the fileID depending on ID */

    /* try the file found by the lookup first, if there is one */
    if (fileID != NULL)
        ret = xmlNewInputFromFile(ctxt, fileID);
    if (ret != NULL)
        return(ret);
    /* otherwise fall back to the loader that was installed before ours */
    if (defaultLoader != NULL)
        ret = defaultLoader(URL, ID, ctxt);
    return(ret);
}

int main(int argc, char **argv) {
    ...

    /*
     * Install our own entity loader
     */
    defaultLoader = xmlGetExternalEntityLoader();
    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

    ...
}</pre>
<h3><a name="Example2">Example of customized I/O</a></h3>
<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
real use case</a>: xmlDocDump() closes the FILE * passed by the application
and this was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
new output handler with the closing call deactivated:</p>
<ol>
<li>First define a new I/O output allocator where the output does not close
 the file:
 <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
    xmlOutputBufferPtr ret;

    /* make sure the default output callbacks are registered */
    if (xmlOutputCallbackInitialized == 0)
        xmlRegisterDefaultOutputCallbacks();

    if (file == NULL) return(NULL);
    ret = xmlAllocOutputBuffer(encoder);
    if (ret != NULL) {
        ret-&gt;context = file;
        ret-&gt;writecallback = xmlFileWrite;
        ret-&gt;closecallback = NULL; /* No close callback */
    }
    return(ret);
}</pre>
</li>
<li>And then use it to save the document:
 <pre>FILE *f;
xmlOutputBufferPtr output;
xmlDocPtr doc;
int res;

f = ...
doc = ....

output = xmlOutputBufferCreateOwn(f, NULL);
res = xmlSaveFileTo(output, doc, NULL);
 </pre>
</li>
</ol>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>