<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<style type="text/css"><!--
TD {font-size: 10pt; font-family: Verdana,Arial,Helvetica}
BODY {font-size: 10pt; font-family: Verdana,Arial,Helvetica; margin-top: 5pt; margin-left: 0pt; margin-right: 0pt}
H1 {font-size: 16pt; font-family: Verdana,Arial,Helvetica}
H2 {font-size: 14pt; font-family: Verdana,Arial,Helvetica}
H3 {font-size: 12pt; font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
--></style>
<title>I/O Interfaces</title>
</head>
<body bgcolor="#8b7765" text="#000000" link="#000000" vlink="#000000">
<table border="0" width="100%" cellpadding="5" cellspacing="0" align="center"><tr>
<td width="180">
<a href="http://www.gnome.org/"><img src="smallfootonly.gif" alt="Gnome Logo"></a><a href="http://www.w3.org/Status"><img src="w3c.png" alt="W3C Logo"></a><a href="http://www.redhat.com/"><img src="redhat.gif" alt="Red Hat Logo"></a>
</td>
<td><table border="0" width="90%" cellpadding="2" cellspacing="0" align="center" bgcolor="#000000"><tr><td><table width="100%" border="0" cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td align="center">
<h1>The XML C library for Gnome</h1>
<h2>I/O Interfaces</h2>
</td></tr></table></td></tr></table></td>
</tr></table>
<table border="0" cellpadding="4" cellspacing="0" width="100%" align="center"><tr><td bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="2" width="100%"><tr>
<td valign="top" width="200" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Main Menu</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="index.html">Home</a></li>
<li><a href="intro.html">Introduction</a></li>
<li><a href="FAQ.html">FAQ</a></li>
<li><a href="docs.html">Documentation</a></li>
<li><a href="bugs.html">Reporting bugs and getting help</a></li>
<li><a href="help.html">How to help</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="news.html">News</a></li>
<li><a href="XML.html">XML</a></li>
<li><a href="XSLT.html">XSLT</a></li>
<li><a href="architecture.html">libxml architecture</a></li>
<li><a href="tree.html">The tree output</a></li>
<li><a href="interface.html">The SAX interface</a></li>
<li><a href="xmldtd.html">Validation &amp; DTDs</a></li>
<li><a href="xmlmem.html">Memory Management</a></li>
<li><a href="encoding.html">Encodings support</a></li>
<li><a href="xmlio.html">I/O Interfaces</a></li>
<li><a href="catalog.html">Catalog support</a></li>
<li><a href="library.html">The parser interfaces</a></li>
<li><a href="entities.html">Entities or no entities</a></li>
<li><a href="namespaces.html">Namespaces</a></li>
<li><a href="upgrade.html">Upgrading 1.x code</a></li>
<li><a href="threads.html">Thread safety</a></li>
<li><a href="DOM.html">DOM Principles</a></li>
<li><a href="example.html">A real example</a></li>
<li><a href="contribs.html">Contributions</a></li>
<li>
<a href="xml.html">flat page</a>, <a href="site.xsl">stylesheet</a>
</li>
</ul></td></tr>
</table>
<table width="100%" border="0" cellspacing="1" cellpadding="3">
<tr><td colspan="1" bgcolor="#eecfa1" align="center"><center><b>Related links</b></center></td></tr>
<tr><td bgcolor="#fffacd"><ul style="margin-left: -2pt">
<li><a href="http://mail.gnome.org/archives/xml/">Mail archive</a></li>
<li><a href="http://xmlsoft.org/XSLT/">XSLT libxslt</a></li>
<li><a href="http://www.cs.unibo.it/~casarini/gdome2/">DOM gdome2</a></li>
<li><a href="ftp://xmlsoft.org/">FTP</a></li>
<li><a href="http://www.fh-frankfurt.de/~igor/projects/libxml/">Windows binaries</a></li>
<li><a href="http://pages.eidosnet.co.uk/~garypen/libxml/">Solaris binaries</a></li>
<li><a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">Bug Tracker</a></li>
</ul></td></tr>
</table>
</td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Contents:</p>
<ol>
<li><a href="#General1">General overview</a></li>
<li><a href="#basic">The basic buffer type</a></li>
<li><a href="#Input">Input I/O handlers</a></li>
<li><a href="#Output">Output I/O handlers</a></li>
<li><a href="#entities">The entities loader</a></li>
<li><a href="#Example2">Example of customized I/O</a></li>
</ol>
<h3><a name="General1">General overview</a></h3>
<p>The module <code><a href="http://xmlsoft.org/html/libxml-xmlio.html">xmlIO.h</a></code> provides
the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
<ul>
<li>The entities loader: a routine which tries to fetch the entities
(files) based on their PUBLIC and SYSTEM identifiers. The default loader
doesn't look at the public identifier since libxml does not maintain a
catalog. You can redefine your own entity loader by using
<code>xmlGetExternalEntityLoader()</code> and
<code>xmlSetExternalEntityLoader()</code>. <a href="#entities">Check the
example</a>.</li>
<li>Input I/O buffers: a convenience structure used by the input layer of
the parser(s) to fetch the information that feeds the parser. This
provides buffering and is also the place where the encoding
converters to UTF-8 are attached.</li>
<li>Output I/O buffers: similar to the Input ones, they fulfill a similar
task when generating a serialization from a tree.</li>
<li>A mechanism to register sets of I/O callbacks and associate them with
specific naming schemes like the protocol part of the URIs.
<p>This affects the default I/O operations and allows the use of specific I/O
handlers for certain names.</p>
</li>
</ul>
<p>The general mechanism used when loading http://rpmfind.net/xml.html, for
example in the HTML parser, is the following:</p>
<ol>
<li>The default entity loader calls <code>xmlNewInputFromFile()</code> with
the parsing context and the URI string.</li>
<li>The URI string is checked against the registered handlers
using their match() callback function. If the HTTP module was compiled
in, it is registered and its match() function will succeed.</li>
<li>The open() function of the handler is called and, if successful,
returns an I/O Input buffer.</li>
<li>The parser then starts reading from this buffer and progressively
fetches information from the resource, calling the read() function of the
handler until the resource is exhausted.</li>
<li>If an encoding change is detected, it is installed on the input
buffer, providing buffering and efficient use of the conversion
routines.</li>
<li>Once the parser has finished, the close() function of the handler is
called once and the Input buffer and associated resources are
deallocated.</li>
</ol>
<p>The user-defined callbacks are checked first, which allows overriding the
default libxml I/O routines.</p>
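<p>The match/open/read/close sequence can be sketched in plain C. This is
only an illustrative model, not libxml's implementation: the
<code>io_handler</code> type, the <code>register_handler()</code>,
<code>find_handler()</code> and <code>load_resource()</code> functions and the
<code>mem:</code> scheme are hypothetical names invented for this sketch. It
assumes that handlers are scanned in reverse registration order, so that a
handler registered later (for example by the user) is tried before the
defaults.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an I/O handler; these are not libxml's types. */
typedef struct {
    int   (*match)(const char *uri);              /* non-zero if the URI is accepted */
    void *(*open)(const char *uri);               /* context, or NULL on failure */
    int   (*read)(void *ctx, char *buf, int len); /* bytes read, 0 when exhausted */
    int   (*close)(void *ctx);
} io_handler;

#define MAX_HANDLERS 8
static io_handler handlers[MAX_HANDLERS];
static int nb_handlers = 0;

/* Register a handler; returns its index, or -1 if the table is full. */
int register_handler(io_handler h) {
    if (nb_handlers >= MAX_HANDLERS) return -1;
    handlers[nb_handlers] = h;
    return nb_handlers++;
}

/* Scan from the most recently registered handler down to the first one. */
const io_handler *find_handler(const char *uri) {
    for (int i = nb_handlers - 1; i >= 0; i--)
        if (handlers[i].match != NULL && handlers[i].match(uri))
            return &handlers[i];
    return NULL;
}

/* Toy "mem:" scheme: the resource content is the text after the prefix.
 * A single static context keeps the sketch free of malloc(). */
typedef struct { const char *data; size_t pos; } mem_ctx;
static mem_ctx the_ctx;

static int mem_match(const char *uri) { return strncmp(uri, "mem:", 4) == 0; }
static void *mem_open(const char *uri) {
    the_ctx.data = uri + 4;
    the_ctx.pos = 0;
    return &the_ctx;
}
static int mem_read(void *ctx, char *buf, int len) {
    mem_ctx *m = ctx;
    size_t left = strlen(m->data) - m->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (int)n;
}
static int mem_close(void *ctx) { (void)ctx; return 0; }

/* match, open, read until exhausted, then close exactly once.
 * Returns the number of bytes copied into out, or -1 on failure. */
int load_resource(const char *uri, char *out, int outlen) {
    assert(out != NULL && outlen > 0);
    const io_handler *h = find_handler(uri);
    if (h == NULL) return -1;
    void *ctx = h->open(uri);
    if (ctx == NULL) return -1;

    char chunk[4];              /* deliberately small to force several reads */
    int total = 0, n;
    while (total < outlen - 1 &&
           (n = h->read(ctx, chunk, (int)sizeof chunk)) > 0) {
        if (n > outlen - 1 - total) n = outlen - 1 - total;
        memcpy(out + total, chunk, (size_t)n);
        total += n;
    }
    out[total] = '\0';
    h->close(ctx);
    return total;
}
```

<p>After registering a handler built from the four <code>mem_*</code>
callbacks, a call such as <code>load_resource("mem:hello", buf, sizeof
buf)</code> walks through the same steps as the list above, while a URI that
no handler matches fails before any open() is attempted.</p>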
<h3><a name="basic">The basic buffer type</a></h3>
<p>All the buffer manipulation handling is done using the
<code>xmlBuffer</code> type defined in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>, which is a
resizable memory buffer. The buffer allocation strategy can be selected to be
either best-fit or exponential doubling (a CPU vs. memory use
tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set for an individual
buffer using <code>xmlBufferSetAllocationScheme()</code> or on a
system-wide basis using <code>xmlSetBufferAllocationScheme()</code>. A number
of functions for manipulating buffers are provided, with names starting with the
<code>xmlBuffer...</code> prefix.</p>
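<p>The CPU vs. memory tradeoff between the two strategies can be illustrated
with a small stand-alone sketch. It does not use libxml's
<code>xmlBuffer</code>: the <code>sketch_buffer</code> type, the
<code>ALLOC_EXACT</code> and <code>ALLOC_DOUBLEIT</code> names and the
initial size of 16 bytes are assumptions made for this example. The sketch
counts how many times the storage has to grow when data is appended one byte
at a time.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the two allocation strategies. */
typedef enum { ALLOC_EXACT, ALLOC_DOUBLEIT } alloc_scheme;

typedef struct {
    char *content;
    size_t use;          /* bytes currently stored */
    size_t size;         /* bytes currently allocated */
    alloc_scheme scheme;
    unsigned reallocs;   /* number of times the storage grew */
} sketch_buffer;

/* Append len bytes; returns 0 on success, -1 on allocation failure. */
int sketch_buffer_add(sketch_buffer *buf, const char *data, size_t len) {
    assert(buf != NULL && data != NULL);
    if (len == 0) return 0;

    size_t needed = buf->use + len;
    if (needed > buf->size) {
        size_t newsize;
        if (buf->scheme == ALLOC_DOUBLEIT) {
            /* grow geometrically: fewer reallocations, more slack memory */
            newsize = buf->size ? buf->size : 16;
            while (newsize < needed) newsize *= 2;
        } else {
            /* best fit: no slack memory, one reallocation per growth */
            newsize = needed;
        }
        char *tmp = realloc(buf->content, newsize);
        if (tmp == NULL) return -1;
        buf->content = tmp;
        buf->size = newsize;
        buf->reallocs++;
    }
    memcpy(buf->content + buf->use, data, len);
    buf->use += len;
    return 0;
}

/* Append one byte `appends` times and report how often the buffer grew. */
unsigned count_reallocs(alloc_scheme scheme, size_t appends) {
    sketch_buffer buf = { NULL, 0, 0, scheme, 0 };
    for (size_t i = 0; i < appends; i++)
        if (sketch_buffer_add(&buf, "x", 1) != 0)
            break;
    unsigned n = buf.reallocs;
    free(buf.content);
    return n;
}
```

<p>With these assumptions, 100 one-byte appends grow the best-fit buffer on
every call, while the doubling buffer grows only when it crosses 16, 32, 64
and 128 bytes, at the cost of ending with 128 bytes allocated for 100 bytes
of data.</p>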
<h3><a name="Input">Input I/O handlers</a></h3>
<p>An Input I/O handler is a simple structure
<code>xmlParserInputBuffer</code> containing a context associated with the
resource (file descriptor, or pointer to a protocol handler), the read() and
close() callbacks to use and an xmlBuffer. An extra xmlBuffer and a charset
encoding handler are also present to support charset conversion when
needed.</p>
<h3><a name="Output">Output I/O handlers</a></h3>
<p>An Output handler <code>xmlOutputBuffer</code> is very similar to an
Input one, except that the callbacks are write() and close().</p>
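<p>The shape of such a handler (a context, a write() callback, a close()
callback and a buffer) can be modelled in a few lines of plain C. This is a
hypothetical sketch and not the real <code>xmlOutputBuffer</code>: the
<code>sketch_output</code> and <code>mem_sink</code> types and the
<code>sketch_output_*</code> functions are invented here. It assumes that a
NULL close callback simply means the underlying resource is left open when
the handler is closed.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*write_cb)(void *context, const char *data, int len);
typedef int (*close_cb)(void *context);

/* Hypothetical output handler: context, callbacks and a pending buffer. */
typedef struct {
    void *context;
    write_cb writecallback;
    close_cb closecallback;   /* may be NULL: the resource is left open */
    char pending[64];
    int used;
} sketch_output;

/* Push buffered bytes to the write callback; returns its result, or 0. */
int sketch_output_flush(sketch_output *out) {
    int n = 0;
    assert(out != NULL && out->writecallback != NULL);
    if (out->used > 0) {
        n = out->writecallback(out->context, out->pending, out->used);
        out->used = 0;
    }
    return n;
}

/* Buffer len bytes, flushing whenever the pending buffer is full. */
int sketch_output_write(sketch_output *out, const char *data, int len) {
    for (int i = 0; i < len; i++) {
        if (out->used == (int)sizeof out->pending &&
            sketch_output_flush(out) < 0)
            return -1;
        out->pending[out->used++] = data[i];
    }
    return len;
}

/* Flush what is left, then call close() only if one was provided. */
int sketch_output_close(sketch_output *out) {
    int ret = sketch_output_flush(out);
    if (out->closecallback != NULL)
        out->closecallback(out->context);
    return ret < 0 ? -1 : 0;
}

/* In-memory sink used as the resource behind the handler. */
typedef struct { char data[256]; int len; int closed; } mem_sink;

static int sink_write(void *context, const char *data, int len) {
    mem_sink *s = context;
    if (len > (int)sizeof s->data - 1 - s->len) return -1;
    memcpy(s->data + s->len, data, (size_t)len);
    s->len += len;
    s->data[s->len] = '\0';
    return len;
}

static int sink_close(void *context) {
    ((mem_sink *)context)->closed = 1;
    return 0;
}
```

<p>A handler created with <code>sink_close</code> marks the sink as closed
when <code>sketch_output_close()</code> runs, whereas one created with a NULL
close callback flushes its data and leaves the sink usable by the
caller.</p>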
<h3><a name="entities">The entities loader</a></h3>
<p>The entity loader resolves requests for new entities and creates inputs for
the parser. Creating an input from a filename or a URI string is done
through the xmlNewInputFromFile() routine. The default entity loader does not
handle the PUBLIC identifier associated with an entity (if any), so it just
calls xmlNewInputFromFile() with the SYSTEM identifier (which is mandatory in
XML).</p>
<p>If you want to hook up a catalog mechanism, you simply need to
override the default entity loader. Here is an example:</p>
<pre>#include &lt;libxml/parser.h&gt;          /* xmlExternalEntityLoader */
#include &lt;libxml/parserInternals.h&gt; /* xmlNewInputFromFile() */
#include &lt;libxml/xmlIO.h&gt;

static xmlExternalEntityLoader defaultLoader = NULL;

xmlParserInputPtr
xmlMyExternalEntityLoader(const char *URL, const char *ID,
                          xmlParserCtxtPtr ctxt) {
    xmlParserInputPtr ret = NULL;
    const char *fileID = NULL;

    /* look up fileID in your catalog depending on ID */

    /* only try the catalog result if the lookup found something */
    if (fileID != NULL)
        ret = xmlNewInputFromFile(ctxt, fileID);
    if (ret != NULL)
        return(ret);
    /* otherwise fall back to the loader that was installed before ours */
    if (defaultLoader != NULL)
        ret = defaultLoader(URL, ID, ctxt);
    return(ret);
}

int main(int argc, char **argv) {
    /* ... */

    /*
     * Install our own entity loader, keeping the previous one as fallback
     */
    defaultLoader = xmlGetExternalEntityLoader();
    xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

    /* ... */
}</pre>
<h3><a name="Example2">Example of customized I/O</a></h3>
<p>This example comes from <a href="http://xmlsoft.org/messages/0708.html">a
real use case</a>: xmlDocDump() closes the FILE * passed by the application,
which was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to define a
new output handler with the closing call deactivated:</p>
<ol>
<li>First define a new I/O output allocator where the output doesn't close the
file:
<pre>#include &lt;stdio.h&gt;
#include &lt;libxml/xmlIO.h&gt;

/* Plain fwrite() wrapper used as the write callback, so the example
 * does not rely on helpers that may be private to xmlIO.c */
static int
myFileWrite(void *context, const char *buffer, int len) {
    size_t n = fwrite(buffer, 1, len, (FILE *) context);
    return(n == (size_t) len ? len : -1);
}

xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
    xmlOutputBufferPtr ret;

    if (file == NULL) return(NULL);

    /* make sure the default output callbacks are registered */
    xmlRegisterDefaultOutputCallbacks();

    ret = xmlAllocOutputBuffer(encoder);
    if (ret != NULL) {
        ret-&gt;context = file;
        ret-&gt;writecallback = myFileWrite;
        ret-&gt;closecallback = NULL;  /* No close callback */
    }
    return(ret);
}</pre>
</li>
<li>And then use it to save the document:
<pre>FILE *f;
xmlOutputBufferPtr output;
xmlDocPtr doc;
int res = -1;

f = ...
doc = ....

output = xmlOutputBufferCreateOwn(f, NULL);
/* xmlSaveFileTo() closes and frees output; f itself stays open */
if (output != NULL)
    res = xmlSaveFileTo(output, doc, NULL);
</pre>
</li>
</ol>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table>
</body>
</html>