:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


.. versionadded:: 2.0

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.
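
As a minimal sketch of this entry point, the following obtains a parser from
:func:`xml.sax.make_parser` and drives it through the :class:`XMLReader`
interface. The ``ElementCounter`` handler and the ``count_elements`` helper are
hypothetical names used only for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class ElementCounter(ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_bytes):
    """Parse a byte string and return the number of elements seen."""
    parser = xml.sax.make_parser()      # calls a driver's create_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))  # file-like object with bytes
    return handler.count
```

Calling ``count_elements(b"<a><b/><b/></a>")`` counts one ``a`` element and two
``b`` elements.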


.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source at once, but to feed
   chunks of the document as they become available. Note that the reader will
   normally not read the entire file, but read it in chunks as well; still,
   :meth:`parse` won't return until the entire document is processed. So these
   interfaces should be used if the blocking behaviour of :meth:`parse` is not
   desirable.

   When the parser is instantiated, it is ready to begin accepting data from the
   :meth:`feed` method immediately. After parsing has been finished with a call to
   :meth:`close`, the :meth:`reset` method must be called to make the parser ready
   to accept new data, either from :meth:`feed` or using the :meth:`parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface, as a
   convenience to SAX 2.0 driver writers.


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator object
   will return valid results only during calls to
   :class:`~xml.sax.handler.ContentHandler` methods; at any other time, the
   results are unpredictable. If information is not available, methods may return
   ``None``.


.. class:: InputSource([systemId])

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.


.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document. This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).
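
Although these objects are normally created by readers, building them by hand
can help show what the two constructors expect. The sketch below is
illustrative only; the namespace URI and the attribute names are made up.

```python
from xml.sax.xmlreader import AttributesImpl, AttributesNSImpl

# Plain attributes: names map directly to values.
plain = AttributesImpl({"id": "42", "lang": "en"})

# Namespace-aware attributes: keys are (namespaceURI, localname) pairs, and
# a second mapping gives the qualified name for each pair.
NS = "http://example.org/x"  # illustrative namespace URI
namespaced = AttributesNSImpl(
    {(NS, "id"): "7", (None, "plain"): "yes"},
    {(NS, "id"): "x:id", (None, "plain"): "plain"},
)

# Both objects can be queried through the interface methods.
plain_id = plain.getValue("id")
qualified_id = namespaced.getValueByQName("x:id")
```

Attributes without a namespace use ``None`` as the first item of the pair.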


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser object
   can be discarded or reset. As a limitation, the current implementation only
   accepts byte streams; processing of character streams is for further study.


.. method:: XMLReader.getContentHandler()

   Return the current :class:`~xml.sax.handler.ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`~xml.sax.handler.ContentHandler`.  If no
   :class:`~xml.sax.handler.ContentHandler` is set, content events will be
   discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`~xml.sax.handler.DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`~xml.sax.handler.DTDHandler`.  If no
   :class:`~xml.sax.handler.DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`~xml.sax.handler.EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`~xml.sax.handler.EntityResolver`.  If no
   :class:`~xml.sax.handler.EntityResolver` is set,
   attempts to resolve an external entity will result in opening the system
   identifier for the entity, and fail if it is not available.


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`~xml.sax.handler.ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current error handler.  If no :class:`~xml.sax.handler.ErrorHandler`
   is set, errors will be raised as exceptions, and warnings will be printed.
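
To illustrate replacing the default error handling, here is a small sketch
that records problems instead of printing them. ``CollectingErrorHandler`` and
``check_well_formed`` are hypothetical names; the handler re-raises fatal
errors so that parsing stops at the first one.

```python
import io
import xml.sax
from xml.sax.handler import ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Hypothetical handler that records each reported problem."""

    def __init__(self):
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", exception))

    def error(self, exception):
        self.problems.append(("error", exception))

    def fatalError(self, exception):
        self.problems.append(("fatal", exception))
        raise exception  # stop parsing after a fatal error


def check_well_formed(xml_bytes):
    """Return the list of (kind, exception) pairs reported while parsing."""
    parser = xml.sax.make_parser()
    handler = CollectingErrorHandler()
    parser.setErrorHandler(handler)
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException:
        pass  # already recorded by the handler
    return handler.problems
```

A well-formed document yields an empty list, while a document with mismatched
tags yields a single ``"fatal"`` entry.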


.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings; if
   they cannot support the requested locale, however, they must raise a SAX
   exception.  Applications may request a locale change in the middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*.  If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   featurenames are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not
   supported by the parser, :exc:`SAXNotSupportedException` is raised.
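
The feature methods can be exercised as in the following sketch, which turns
on namespace processing and probes whether a feature name is recognized. The
helper names and the ``http://example.org/...`` feature URI are made up for
illustration, and the behaviour shown assumes the default expat-based parser.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException
from xml.sax.handler import feature_namespaces


def enable_namespaces(parser):
    """Turn on namespace processing and return the resulting setting."""
    parser.setFeature(feature_namespaces, True)
    return parser.getFeature(feature_namespaces)


def is_recognized(parser, featurename):
    """Return True if the parser recognizes *featurename*."""
    try:
        parser.getFeature(featurename)
    except SAXNotRecognizedException:
        return False
    return True


parser = xml.sax.make_parser()
namespaces_on = enable_namespaces(parser)
known = is_recognized(parser, feature_namespaces)
unknown = is_recognized(parser, "http://example.org/no-such-feature")
```

Features should be set before parsing starts; a parser may refuse to change
some of them in the middle of a parse.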


.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is not
   recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   propertynames are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.
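
As one possible use of properties, the sketch below registers a lexical
handler through ``property_lexical_handler`` in order to collect comments.
``CommentCollector`` and ``collect_comments`` are hypothetical names, and the
set of callbacks the object provides is an assumption about what the default
expat-based parser looks up; other parsers may not support this property.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler


class CommentCollector(object):
    """Hypothetical lexical handler that only cares about comments."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical callbacks are provided as no-ops.
    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


def collect_comments(xml_bytes):
    """Return the text of every comment in the document."""
    parser = xml.sax.make_parser()
    collector = CommentCollector()
    parser.setProperty(property_lexical_handler, collector)
    parser.parse(io.BytesIO(xml_bytes))
    return collector.comments
```

After the call to :meth:`setProperty`, ``getProperty(property_lexical_handler)``
returns the same object that was registered.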


.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. That will check well-formedness conditions that
   can be checked only at the end, invoke handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.
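
The feed, close and reset cycle might be used as in the sketch below, which
reuses one parser for several documents that arrive in pieces. It assumes that
:func:`xml.sax.make_parser` returns an incremental parser, as the default
expat-based driver does; ``TextCollector`` and ``parse_documents`` are
hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers character data."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_documents(documents):
    """Parse each document (a list of byte chunks) with a single parser."""
    parser = xml.sax.make_parser()
    results = []
    for chunks in documents:
        handler = TextCollector()
        parser.setContentHandler(handler)
        for chunk in chunks:
            parser.feed(chunk)   # hand over data as it becomes available
        parser.close()           # signal the end of this document
        parser.reset()           # make the parser ready for the next one
        results.append("".join(handler.chunks))
    return results
```

Chunk boundaries do not need to line up with tag boundaries; the parser
buffers incomplete markup until more data is fed.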


.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event ends.


.. method:: Locator.getLineNumber()

   Return the line number where the current event ends.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.
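
A locator is handed to the content handler through ``setDocumentLocator``.
The following sketch stores it and records a position for each start tag;
``PositionRecorder`` and ``element_positions`` are hypothetical names, and the
exact column reported for an event may vary between parsers.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Hypothetical handler that records where each element was reported."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the parser before any other event.
        self.locator = locator

    def startElement(self, name, attrs):
        # The locator is only queried from inside a handler callback.
        self.positions.append(
            (name, self.locator.getLineNumber(), self.locator.getColumnNumber())
        )


def element_positions(xml_bytes):
    """Return (name, line, column) tuples for every start tag."""
    parser = xml.sax.make_parser()
    handler = PositionRecorder()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.positions
```

Keeping a reference to the locator is fine, but its methods should not be
relied upon once the handler callback has returned.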


.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream specified,
   but it will use a byte stream in preference to opening a URI connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this byte
   stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a Python 1.6
   Unicode-wrapped file-like object that performs conversion to Unicode strings.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.
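
To show how these accessors fit together, the sketch below wraps in-memory
data in an :class:`InputSource` and passes it to a parser. The ``make_source``
and ``root_name`` helpers and the ``memory:doc`` system identifier are made up
for illustration; because a byte stream is supplied, the parser is not expected
to open the system identifier itself.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def make_source(xml_bytes, system_id, encoding=None):
    """Build an InputSource backed by an in-memory byte stream."""
    source = InputSource(system_id)
    source.setByteStream(io.BytesIO(xml_bytes))
    if encoding is not None:
        source.setEncoding(encoding)  # only set when the encoding is known
    return source


class RootNameHandler(ContentHandler):
    """Hypothetical handler that remembers the first element name."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.root = None

    def startElement(self, name, attrs):
        if self.root is None:
            self.root = name


def root_name(source):
    """Parse *source* and return the name of its root element."""
    parser = xml.sax.make_parser()
    handler = RootNameHandler()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.root
```

For example, ``root_name(make_source(b"<doc/>", "memory:doc", "UTF-8"))``
reads the document from the byte stream rather than from a file or URL.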


.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`~collections.Mapping.copy`,
:meth:`~collections.Mapping.get`,
:meth:`~collections.Mapping.has_key`,
:meth:`~collections.Mapping.items`,
:meth:`~collections.Mapping.keys`,
and :meth:`~collections.Mapping.values`.  The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.
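
In practice an :class:`Attributes` object is received as the *attrs* argument
of ``startElement``. The sketch below lists every attribute it sees;
``AttributeDumper`` and ``collect_attributes`` are hypothetical names, and the
result is sorted because the order of names should not be relied upon.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class AttributeDumper(ContentHandler):
    """Hypothetical handler that records every attribute it sees."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.seen = []

    def startElement(self, name, attrs):
        for attr_name in attrs.getNames():
            self.seen.append(
                (name, attr_name, attrs.getValue(attr_name),
                 attrs.getType(attr_name))
            )


def collect_attributes(xml_bytes):
    """Return sorted (element, attribute, value, type) tuples."""
    parser = xml.sax.make_parser()
    handler = AttributeDumper()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return sorted(handler.seen)
```

The same values are also reachable through the mapping protocol, for example
``attrs["id"]`` or ``attrs.get("id")``.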

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.


.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`).  All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.

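
With namespace processing enabled, ``startElementNS`` receives an
:class:`AttributesNS` object. The sketch below maps each qualified name back
to its ``(namespace, localname)`` pair; the handler and helper names and the
example namespace URI are hypothetical, and the behaviour assumes the default
expat-based parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceAttributeDumper(ContentHandler):
    """Hypothetical handler that records namespace-aware attributes."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.seen = []

    def startElementNS(self, name, qname, attrs):
        for attr_qname in attrs.getQNames():
            self.seen.append(
                (attr_qname,
                 attrs.getNameByQName(attr_qname),
                 attrs.getValueByQName(attr_qname))
            )


def collect_ns_attributes(xml_bytes):
    """Return sorted (qname, (namespace, localname), value) tuples."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver *NS events
    handler = NamespaceAttributeDumper()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return sorted(handler.seen)
```

Namespace declarations such as ``xmlns:x`` are not reported as attributes in
this mode, so only the ordinary attributes appear in the result.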