Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | |
:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


SAX parsers implement the :class:`XMLReader` interface. Each parser is
implemented in a Python module, which must provide a function
:func:`create_parser`. This function is invoked by :func:`xml.sax.make_parser`
with no arguments to create a new parser object.

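As a minimal sketch of the lookup described above, the following asks
:func:`xml.sax.make_parser` for a parser and checks which interfaces the result
implements. It assumes the default driver shipped with Python (the expat-based
one) is available; other drivers may not implement :class:`IncrementalParser`.

```python
import xml.sax
from xml.sax import xmlreader

# make_parser() calls create_parser() in the driver module it selects.
parser = xml.sax.make_parser()

is_reader = isinstance(parser, xmlreader.XMLReader)
is_incremental = isinstance(parser, xmlreader.IncrementalParser)
print(type(parser).__name__, is_reader, is_incremental)
```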

.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but
   to feed chunks of the document as they become available. Note that the reader
   will normally not read the entire file, but read it in chunks as well; still,
   :meth:`parse` won't return until the entire document is processed. So these
   interfaces should be used if the blocking behaviour of :meth:`parse` is not
   desirable.

   When the parser is instantiated, it is ready to begin accepting data from the
   :meth:`feed` method immediately. After parsing has been finished with a call
   to :meth:`close`, the :meth:`reset` method must be called to make the parser
   ready to accept new data, either from :meth:`feed` or using the :meth:`parse`
   method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface as a
   convenience to SAX 2.0 driver writers.

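The incremental style described above might look like the following sketch. The
``ElementNameCollector`` handler and the chunk boundaries are made up for
illustration, and the default expat-based parser is assumed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ElementNameCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


parser = xml.sax.make_parser()
collector = ElementNameCollector()
parser.setContentHandler(collector)

# Chunk boundaries are arbitrary and may fall in the middle of a tag.
for chunk in (b"<root><it", b"em/><ite", b"m/></root>"):
    parser.feed(chunk)
parser.close()  # signals the end of the document

print(collector.names)
```

Control returns to the caller after every :meth:`feed` call, which is what makes
this style suitable when blocking in :meth:`parse` is not acceptable.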

.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator
   object will return valid results only during calls to
   :class:`ContentHandler` methods; at any other time, the results are
   unpredictable. If information is not available, methods may return ``None``.


.. class:: InputSource([systemId])

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). It is a dictionary-like object which represents
   the element attributes in a :meth:`startElement` call. In addition to the
   most useful dictionary operations, it supports a number of other methods as
   described by the interface. Objects of this class should be instantiated by
   readers; *attrs* must be a dictionary-like object containing a mapping from
   attribute names to attribute values.

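Although readers normally create these objects, building one by hand is a quick
way to see the interface. The attribute names and values below are invented for
the example.

```python
from xml.sax.xmlreader import AttributesImpl

# A reader would normally build this object; it is created by hand here
# only to show the interface.
attrs = AttributesImpl({"id": "42", "lang": "en"})

length = attrs.getLength()
value = attrs.getValue("id")
attr_type = attrs.getType("id")
names = sorted(attrs.getNames())
has_lang = "lang" in attrs
fallback = attrs.get("title", "n/a")  # mapping-style lookup with a default

print(length, value, attr_type, names, has_lang, fallback)
```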

.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document. This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source, typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser
   object can be discarded or reset. As a limitation, the current implementation
   only accepts byte streams; processing of character streams is for further
   study.

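A minimal sketch of :meth:`parse` with a file-like object follows. The
``TextCollector`` handler is hypothetical, and an in-memory byte stream stands
in for a real file.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers character data."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        # May be called several times for a single run of text.
        self.parts.append(content)


parser = xml.sax.make_parser()
collector = TextCollector()
parser.setContentHandler(collector)

# A byte stream, in line with the limitation noted above.
parser.parse(io.BytesIO(b"<greeting>hello</greeting>"))

text = "".join(collector.parts)
print(text)
```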

.. method:: XMLReader.getContentHandler()

   Return the current :class:`ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`ContentHandler`. If no :class:`ContentHandler` is
   set, content events will be discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`DTDHandler`. If no :class:`DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`EntityResolver`. If no :class:`EntityResolver` is
   set, attempts to resolve an external entity will result in opening the system
   identifier for the entity, and fail if it is not available.


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current :class:`ErrorHandler`. If no :class:`ErrorHandler` is set,
   errors will be raised as exceptions, and warnings will be printed.

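As an illustration of installing a handler, the sketch below registers an error
handler that records fatal errors before re-raising them. The
``RecordingErrorHandler`` class and the malformed input are assumptions made for
the example.

```python
import io
import xml.sax
from xml.sax.handler import ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Hypothetical handler that remembers fatal errors, then re-raises."""

    def __init__(self):
        self.fatal = []

    def fatalError(self, exception):
        self.fatal.append(exception)
        raise exception


parser = xml.sax.make_parser()
errors = RecordingErrorHandler()
parser.setErrorHandler(errors)

caught = None
try:
    # The closing tag does not match the open <b> element.
    parser.parse(io.BytesIO(b"<a><b></a>"))
except xml.sax.SAXParseException as exc:
    caught = exc

print(len(errors.fatal), caught is errors.fatal[0])
```

Re-raising keeps the parse from continuing after a fatal error, which matches
what happens when no handler is installed.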

.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings;
   if they cannot support the requested locale, however, they must raise a SAX
   exception. Applications may request a locale change in the middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.

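The following sketch toggles namespace processing and probes an unknown
feature. It assumes the default expat-based parser, where namespace processing
starts out disabled; the unknown feature name is made up to trigger the error
path.

```python
import xml.sax
from xml.sax.handler import feature_namespaces

parser = xml.sax.make_parser()

before = bool(parser.getFeature(feature_namespaces))
parser.setFeature(feature_namespaces, True)
after = bool(parser.getFeature(feature_namespaces))

# Hypothetical feature name, used only to trigger the error path.
unknown = "http://example.com/features/no-such-feature"
try:
    parser.getFeature(unknown)
    recognized = True
except xml.sax.SAXNotRecognizedException:
    recognized = False

print(before, after, recognized)
```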

.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is
   not recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.

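Properties follow the same pattern as features. This sketch stores and reads
back a lexical handler on the default expat-based parser; ``NullLexicalHandler``
is a hypothetical stand-in that ignores every event, and the unknown property
name is invented.

```python
import xml.sax
from xml.sax.handler import property_lexical_handler


class NullLexicalHandler:
    """Hypothetical lexical handler that ignores every event."""

    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


parser = xml.sax.make_parser()
lexical = NullLexicalHandler()
parser.setProperty(property_lexical_handler, lexical)
stored = parser.getProperty(property_lexical_handler)

try:
    # Hypothetical property name, used only to trigger the error path.
    parser.setProperty("http://example.com/properties/no-such-property", 1)
    accepted = True
except xml.sax.SAXNotRecognizedException:
    accepted = False

print(stored is lexical, accepted)
```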

.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.

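The :meth:`feed`, :meth:`close`, :meth:`reset` cycle can be sketched as follows,
reusing one parser for two documents. ``DocumentCounter`` is a hypothetical
handler, and the default expat-based parser is assumed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class DocumentCounter(ContentHandler):
    """Hypothetical handler that counts documents and records root names."""

    def __init__(self):
        super().__init__()
        self.started = 0
        self.roots = []

    def startDocument(self):
        self.started += 1

    def startElement(self, name, attrs):
        self.roots.append(name)


parser = xml.sax.make_parser()
counter = DocumentCounter()
parser.setContentHandler(counter)

for document in (b"<first/>", b"<second/>"):
    parser.feed(document)
    parser.close()
    parser.reset()  # make the parser ready for the next document

print(counter.started, counter.roots)
```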

.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event ends.


.. method:: Locator.getLineNumber()

   Return the line number where the current event ends.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.

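A handler typically receives its locator through ``setDocumentLocator`` and
queries it from inside event callbacks. The sketch below records the line
number reported for each start tag; ``PositionRecorder`` is hypothetical, and
the default expat-based parser is assumed.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler that notes the line of each start tag."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the parser before any other event is reported.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only queried while an event is being delivered.
        self.positions.append((name, self.locator.getLineNumber()))


parser = xml.sax.make_parser()
recorder = PositionRecorder()
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(b"<a>\n  <b/>\n</a>"))

print(recorder.positions)
```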

.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Return the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream
   specified, but it will use a byte stream in preference to opening a URI
   connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Return the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this
   byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a Python
   1.6 Unicode-wrapped file-like object that performs conversion to strings.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system
   identifier.


.. method:: InputSource.getCharacterStream()

   Return the character stream for this input source.

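The setters above can be combined as in this sketch, which hands the parser an
in-memory byte stream together with its encoding. The system identifier
``memory:example.xml`` and the ``NameCollector`` handler are invented for the
example; because a byte stream is supplied, the parser is not expected to open
the system identifier itself.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class NameCollector(ContentHandler):
    """Hypothetical handler that records element names."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


source = InputSource("memory:example.xml")  # hypothetical system identifier
source.setByteStream(io.BytesIO(b"<doc><item/></doc>"))
source.setEncoding("UTF-8")

system_id = source.getSystemId()
encoding = source.getEncoding()
public_id = source.getPublicId()  # never set, so None

parser = xml.sax.make_parser()
collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(source)

print(system_id, encoding, public_id, collector.names)
```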

.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`copy`, :meth:`get`, :meth:`__contains__`,
:meth:`items`, :meth:`keys`, and :meth:`values`. The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.

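In practice, an :class:`Attributes` object arrives as the second argument of
``startElement``. The sketch below mixes :meth:`getValue` with the
mapping-style :meth:`get`; the ``LinkCollector`` handler and the sample markup
are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class LinkCollector(ContentHandler):
    """Hypothetical handler that gathers (href, title) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def startElement(self, name, attrs):
        if name == "a":
            href = attrs.getValue("href")  # fails if the attribute is absent
            title = attrs.get("title", "")  # mapping-style lookup with default
            self.links.append((href, title))


parser = xml.sax.make_parser()
collector = LinkCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(b'<p><a href="/x" title="X"/><a href="/y"/></p>'))

print(collector.links)
```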

.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`). All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.

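The relationship between the two kinds of names can be seen by building an
:class:`AttributesNSImpl` by hand, as in this sketch. The namespace URI and the
``ex`` prefix are invented; a reader would normally supply both mappings.

```python
from xml.sax.xmlreader import AttributesNSImpl

NS = "http://example.com/ns"  # hypothetical namespace URI

# First mapping: (namespaceURI, localname) -> value.
# Second mapping: (namespaceURI, localname) -> qualified name.
attrs = AttributesNSImpl(
    {(NS, "id"): "42"},
    {(NS, "id"): "ex:id"},
)

value_by_qname = attrs.getValueByQName("ex:id")
name_by_qname = attrs.getNameByQName("ex:id")
qname_by_name = attrs.getQNameByName((NS, "id"))
qnames = attrs.getQNames()
value_by_name = attrs.getValue((NS, "id"))  # inherited Attributes method

print(value_by_qname, name_by_qname, qname_by_name, qnames, value_by_name)
```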