:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.

.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/sax/xmlreader.py`

--------------

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.
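
For example, the following sketch asks :func:`xml.sax.make_parser` for the
Expat-based driver module that ships with CPython, ``xml.sax.expatreader``, and
compares the result with calling that module's :func:`create_parser` directly.
Naming the driver explicitly keeps the result independent of any other
configured parsers:

.. code-block:: python

   import xml.sax
   import xml.sax.expatreader
   from xml.sax.xmlreader import XMLReader

   # make_parser() tries each named driver module in turn and returns the
   # object produced by the first usable module's create_parser().
   parser = xml.sax.make_parser(["xml.sax.expatreader"])
   print(isinstance(parser, XMLReader))  # True

   # Calling the driver's factory function directly (with no arguments)
   # produces a parser of the same class.
   direct = xml.sax.expatreader.create_parser()
   print(type(direct) is type(parser))  # True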


.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but
   to feed chunks of the document as they become available. Note that the reader
   will normally not read the entire file at once either, but read it in chunks;
   still, :meth:`~XMLReader.parse` won't return until the entire document is
   processed. These interfaces should therefore be used when the blocking
   behaviour of :meth:`~XMLReader.parse` is not desirable.

   When the parser is instantiated, it is ready to begin accepting data through
   the :meth:`~IncrementalParser.feed` method immediately. After parsing has
   been finished with a call to :meth:`~IncrementalParser.close`, the
   :meth:`~IncrementalParser.reset` method must be called to make the parser
   ready to accept new data, either from :meth:`~IncrementalParser.feed` or
   through the :meth:`~XMLReader.parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`~XMLReader.parse` has been called and before it returns.

   By default, the class also implements the :meth:`~XMLReader.parse` method of
   the :class:`XMLReader` interface using the :meth:`~IncrementalParser.feed`,
   :meth:`~IncrementalParser.close` and :meth:`~IncrementalParser.reset`
   methods of the :class:`IncrementalParser` interface, as a convenience to
   SAX 2.0 driver writers.
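
   A minimal sketch of incremental parsing, assuming the default Expat-based
   reader returned by :func:`xml.sax.make_parser` (which implements this
   interface). ``TextCollector`` is a hypothetical handler written for the
   example, and the byte chunks stand in for data arriving, say, from a socket:

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler
      from xml.sax.xmlreader import IncrementalParser


      class TextCollector(ContentHandler):
          """Hypothetical handler that gathers all character data."""

          def __init__(self):
              super().__init__()
              self.chunks = []

          def characters(self, content):
              self.chunks.append(content)


      parser = xml.sax.make_parser()
      assert isinstance(parser, IncrementalParser)

      collector = TextCollector()
      parser.setContentHandler(collector)

      # Chunk boundaries need not line up with markup or text boundaries.
      for chunk in [b"<greeting>Hel", b"lo, wor", b"ld</greeting>"]:
          parser.feed(chunk)
      parser.close()  # end of document: final checks run, handlers are invoked

      print("".join(collector.chunks))  # Hello, world

   Character data may be reported in several pieces, which is why the handler
   joins them only after :meth:`~IncrementalParser.close` has been called.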


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator
   object will return valid results only during calls to
   :class:`~xml.sax.handler.ContentHandler` methods; at any other time, the
   results are unpredictable. If information is not available, methods may
   return ``None``.


.. class:: InputSource(system_id=None)

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity <xml.sax.handler.EntityResolver.resolveEntity>`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.
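
   As a sketch, an application can wrap an in-memory byte stream in an
   :class:`InputSource`, recording a system identifier and encoding alongside
   it. The ``https://example.com/catalog.xml`` identifier is a placeholder;
   because a byte stream is supplied, the reader parses the stream rather than
   opening that URL:

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.xmlreader import InputSource

      # Placeholder system identifier; it is recorded but not fetched.
      source = InputSource("https://example.com/catalog.xml")
      source.setByteStream(io.BytesIO(b"<catalog><item/></catalog>"))
      source.setEncoding("utf-8")  # the application knows the byte encoding

      print(source.getSystemId())  # https://example.com/catalog.xml
      print(source.getEncoding())  # utf-8
      print(source.getPublicId())  # None

      # The byte stream is read directly by the parser.
      parser = xml.sax.make_parser()
      parser.parse(source)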


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a
   :meth:`~xml.sax.handler.ContentHandler.startElement` call. In addition to
   the most useful dictionary operations, it supports a number of other methods
   as described by the interface. Objects of this class should be instantiated
   by readers; *attrs* must be a dictionary-like object containing a mapping
   from attribute names to attribute values.
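
   Readers normally create these objects, but building one by hand is a quick
   way to try out the interface. A small sketch with arbitrary attribute names
   and values:

   .. code-block:: python

      from xml.sax.xmlreader import AttributesImpl

      attrs = AttributesImpl({"id": "n1", "lang": "en"})

      print(attrs.getLength())          # 2
      print(attrs.getValue("id"))       # n1
      print(attrs.getType("lang"))      # CDATA
      print("lang" in attrs)            # True
      print(attrs.get("title", "n/a"))  # n/a
      print(sorted(attrs.getNames()))   # ['id', 'lang']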


.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`~xml.sax.handler.ContentHandler.startElementNS`. It is derived from
   :class:`AttributesImpl`, but understands attribute names as two-tuples of
   *namespaceURI* and *localname*. In addition, it provides a number of methods
   expecting qualified names as they appear in the original document. This
   class implements the :class:`AttributesNS` interface (see section
   :ref:`attributes-ns-objects`).
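
   Readers build these objects from two dictionaries keyed by
   ``(namespaceURI, localname)`` pairs: in CPython's implementation, *attrs*
   maps each pair to the attribute value and *qnames* maps it to the qualified
   name as written in the document. A small sketch; the ``xlink:href``
   attribute is only an illustrative choice:

   .. code-block:: python

      from xml.sax.xmlreader import AttributesNSImpl

      XLINK = "http://www.w3.org/1999/xlink"
      name = (XLINK, "href")  # (namespaceURI, localname)

      attrs = AttributesNSImpl(
          {name: "#intro"},      # name -> attribute value
          {name: "xlink:href"},  # name -> qualified name
      )

      print(attrs.getValue(name))                 # #intro
      print(attrs.getValueByQName("xlink:href"))  # #intro
      print(attrs.getNameByQName("xlink:href"))   # ('http://www.w3.org/1999/xlink', 'href')
      print(attrs.getQNameByName(name))           # xlink:href
      print(attrs.getQNames())                    # ['xlink:href']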


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a :class:`pathlib.Path` or :term:`path-like <path-like object>`
   object, or an :class:`InputSource` object. When :meth:`parse` returns, the
   input is completely processed, and the parser object can be discarded or
   reset.

   .. versionchanged:: 3.5
      Added support for character streams.

   .. versionchanged:: 3.8
      Added support for path-like objects.
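
   A sketch of two of the accepted *source* forms: a file name passed as a
   string (a system identifier) and a :class:`pathlib.Path` (path-like objects
   need Python 3.8 or later). ``ElementCounter`` is a hypothetical handler, and
   a fresh parser is created for each document:

   .. code-block:: python

      import pathlib
      import tempfile
      import xml.sax
      from xml.sax.handler import ContentHandler


      class ElementCounter(ContentHandler):
          """Hypothetical handler that counts start-element events."""

          def __init__(self):
              super().__init__()
              self.count = 0

          def startElement(self, name, attrs):
              self.count += 1


      counts = []
      with tempfile.TemporaryDirectory() as tmp:
          path = pathlib.Path(tmp) / "doc.xml"
          path.write_bytes(b"<doc><a/><a/></doc>")

          # A plain file name (system identifier), then a path-like object.
          for source in (str(path), path):
              handler = ElementCounter()
              parser = xml.sax.make_parser()
              parser.setContentHandler(handler)
              parser.parse(source)  # returns once the whole file is processed
              counts.append(handler.count)

      print(counts)  # [3, 3]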


.. method:: XMLReader.getContentHandler()

   Return the current :class:`~xml.sax.handler.ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`~xml.sax.handler.ContentHandler`. If no
   :class:`~xml.sax.handler.ContentHandler` is set, content events will be
   discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`~xml.sax.handler.DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`~xml.sax.handler.DTDHandler`. If no
   :class:`~xml.sax.handler.DTDHandler` is set, DTD events will be discarded.
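
   As an illustration, a :class:`~xml.sax.handler.DTDHandler` subclass can
   record the notation and unparsed-entity declarations found in a document
   type declaration. The ``DeclarationRecorder`` class, the ``gif`` notation
   and the ``logo.gif`` entity are made up for this sketch, which assumes the
   default Expat-based reader:

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import DTDHandler


      class DeclarationRecorder(DTDHandler):
          """Hypothetical handler that keeps DTD events instead of discarding them."""

          def __init__(self):
              self.notations = []
              self.unparsed_entities = []

          def notationDecl(self, name, publicId, systemId):
              self.notations.append((name, publicId, systemId))

          def unparsedEntityDecl(self, name, publicId, systemId, ndata):
              self.unparsed_entities.append((name, publicId, systemId, ndata))


      document = b"""<?xml version="1.0"?>
      <!DOCTYPE doc [
        <!NOTATION gif SYSTEM "image/gif">
        <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
      ]>
      <doc/>"""

      recorder = DeclarationRecorder()
      parser = xml.sax.make_parser()
      parser.setDTDHandler(recorder)
      parser.feed(document)
      parser.close()

      print(recorder.notations)          # [('gif', None, 'image/gif')]
      print(recorder.unparsed_entities)  # [('logo', None, 'logo.gif', 'gif')]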


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`~xml.sax.handler.EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`~xml.sax.handler.EntityResolver`. If no
   :class:`~xml.sax.handler.EntityResolver` is set, attempts to resolve an
   external entity will result in opening the system identifier for the
   entity, and fail if it is not available.
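
   As a sketch, the hypothetical ``InMemoryResolver`` below serves external
   entities from a dictionary instead of letting the reader open their system
   identifiers. It assumes CPython's default Expat-based reader, which only
   processes external general entities once
   :data:`~xml.sax.handler.feature_external_ges` is enabled; enable that
   feature only for trusted input. The entity name and the ``chapter.xml``
   identifier are placeholders:

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
      from xml.sax.xmlreader import InputSource


      class InMemoryResolver(EntityResolver):
          """Hypothetical resolver mapping system identifiers to in-memory bytes."""

          def __init__(self, documents):
              self.documents = documents

          def resolveEntity(self, publicId, systemId):
              source = InputSource(systemId)
              source.setByteStream(io.BytesIO(self.documents[systemId]))
              return source


      class ElementNames(ContentHandler):
          """Hypothetical handler recording element names in document order."""

          def __init__(self):
              super().__init__()
              self.names = []

          def startElement(self, name, attrs):
              self.names.append(name)


      handler = ElementNames()
      parser = xml.sax.make_parser()
      parser.setFeature(feature_external_ges, True)  # trusted input only
      parser.setEntityResolver(InMemoryResolver({"chapter.xml": b"<chapter/>"}))
      parser.setContentHandler(handler)

      parser.feed(b'<!DOCTYPE book [<!ENTITY ch1 SYSTEM "chapter.xml">]>')
      parser.feed(b"<book>&ch1;</book>")
      parser.close()

      print(handler.names)  # ['book', 'chapter']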


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`~xml.sax.handler.ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current error handler. If no :class:`~xml.sax.handler.ErrorHandler`
   is set, errors will be raised as exceptions, and warnings will be printed.
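
   A sketch of an application-supplied error handler. The hypothetical
   ``CollectingErrorHandler`` records warnings and recoverable errors instead
   of printing or raising them, but still raises on fatal errors, such as the
   mismatched end tag below, which break well-formedness:

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ErrorHandler


      class CollectingErrorHandler(ErrorHandler):
          """Hypothetical handler: keep warnings and errors, stop on fatal ones."""

          def __init__(self):
              self.warnings = []
              self.errors = []

          def warning(self, exception):
              self.warnings.append(exception)  # instead of printing it

          def error(self, exception):
              self.errors.append(exception)  # instead of raising it

          def fatalError(self, exception):
              raise exception  # parsing cannot continue after a fatal error


      parser = xml.sax.make_parser()
      parser.setErrorHandler(CollectingErrorHandler())

      failure = None
      try:
          parser.feed(b"<open><unclosed></open>")  # mismatched end tag
          parser.close()
      except xml.sax.SAXParseException as exc:
          failure = exc

      print(failure.getLineNumber())  # 1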


.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and
   warnings; if they cannot support the requested locale, however, they must
   raise a SAX exception. Applications may request a locale change in the
   middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.
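
   For example, with CPython's default Expat-based reader, namespace processing
   can be switched on, validation is a recognized feature that the reader does
   not support, and an unknown feature name is rejected. The last URI is a
   deliberately made-up placeholder:

   .. code-block:: python

      import xml.sax
      from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
      from xml.sax.handler import feature_namespaces, feature_validation

      parser = xml.sax.make_parser()
      outcomes = []

      # A recognized and supported feature: turn on namespace processing.
      parser.setFeature(feature_namespaces, True)
      outcomes.append(bool(parser.getFeature(feature_namespaces)))

      # A recognized feature that this reader cannot honour.
      try:
          parser.setFeature(feature_validation, True)
      except SAXNotSupportedException:
          outcomes.append("validation not supported")

      # A feature name the reader does not know (hypothetical URI).
      try:
          parser.getFeature("http://example.com/features/no-such-feature")
      except SAXNotRecognizedException:
          outcomes.append("feature not recognized")

      print(outcomes)  # [True, 'validation not supported', 'feature not recognized']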


.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is
   not recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting
   is not supported by the parser, :exc:`SAXNotSupportedException` is raised.
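
   Properties usually hold objects rather than flags. As a sketch, the
   well-known :data:`~xml.sax.handler.property_lexical_handler` property can be
   set to a lexical handler so that comments are reported. ``CommentCollector``
   is hypothetical and simply defines the callbacks the default Expat-based
   reader looks up (Python 3.10 and later also provide
   :class:`xml.sax.handler.LexicalHandler` as a base class):

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import property_lexical_handler


      class CommentCollector:
          """Hypothetical lexical handler: records comments, ignores the rest."""

          def __init__(self):
              self.comments = []

          def comment(self, content):
              self.comments.append(content)

          def startDTD(self, name, public_id, system_id):
              pass

          def endDTD(self):
              pass

          def startCDATA(self):
              pass

          def endCDATA(self):
              pass


      collector = CommentCollector()
      parser = xml.sax.make_parser()
      parser.setProperty(property_lexical_handler, collector)
      print(parser.getProperty(property_lexical_handler) is collector)  # True

      parser.feed(b"<doc><!-- first --><!--second--></doc>")
      parser.close()
      print(collector.comments)  # [' first ', 'second']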


.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`~IncrementalParser.close` has been called
   to reset the parser so that it is ready to parse new documents. The results
   of calling :meth:`~XMLReader.parse` or :meth:`~IncrementalParser.feed` after
   :meth:`~IncrementalParser.close` without calling :meth:`reset` are undefined.
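
   A sketch of reusing one parser for several documents, again assuming the
   default Expat-based reader; ``RootRecorder`` is a hypothetical handler that
   remembers each document's root element:

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class RootRecorder(ContentHandler):
          """Hypothetical handler recording the root element of each document."""

          def __init__(self):
              super().__init__()
              self.roots = []
              self._depth = 0

          def startElement(self, name, attrs):
              if self._depth == 0:
                  self.roots.append(name)
              self._depth += 1

          def endElement(self, name):
              self._depth -= 1


      parser = xml.sax.make_parser()
      recorder = RootRecorder()
      parser.setContentHandler(recorder)

      for document in [b"<first><x/></first>", b"<second/>"]:
          parser.feed(document)
          parser.close()  # end of this document
          parser.reset()  # make the parser ready for the next one

      print(recorder.roots)  # ['first', 'second']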


.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event begins.


.. method:: Locator.getLineNumber()

   Return the line number where the current event begins.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.
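
Applications normally obtain a locator through
:meth:`ContentHandler.setDocumentLocator() <xml.sax.handler.ContentHandler.setDocumentLocator>`
and query it from inside later callbacks. A sketch with a hypothetical
``PositionRecorder`` handler; with the default Expat-based reader, line numbers
start at 1 and column numbers at 0:

.. code-block:: python

   import xml.sax
   from xml.sax.handler import ContentHandler


   class PositionRecorder(ContentHandler):
       """Hypothetical handler recording where each element starts."""

       def __init__(self):
           super().__init__()
           self.locator = None
           self.positions = []

       def setDocumentLocator(self, locator):
           # Called before other events; keep the locator and query it
           # only from within subsequent callbacks.
           self.locator = locator

       def startElement(self, name, attrs):
           self.positions.append(
               (name, self.locator.getLineNumber(), self.locator.getColumnNumber())
           )


   recorder = PositionRecorder()
   xml.sax.parseString(b"<doc>\n  <item/>\n</doc>", recorder)
   print(recorder.positions)  # [('doc', 1, 0), ('item', 2, 2)]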


.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a :term:`binary file`) for this input source.

   The SAX parser will ignore this if there is also a character stream
   specified, but it will use a byte stream in preference to opening a URI
   connection itself.

   If the application knows the character encoding of the byte stream, it
   should set it with the :meth:`~InputSource.setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`~InputSource.getEncoding` method will return the character
   encoding for this byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream (a :term:`text file`) for this input source.

   If there is a character stream specified, the SAX parser will ignore any
   byte stream and will not attempt to open a URI connection to the system
   identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.
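
To illustrate the precedence rules above, the following sketch gives an
:class:`InputSource` both kinds of stream; the reader parses the character
stream and ignores the byte stream. ``RootName`` is a hypothetical handler, and
character streams require Python 3.5 or later:

.. code-block:: python

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler
   from xml.sax.xmlreader import InputSource


   class RootName(ContentHandler):
       """Hypothetical handler remembering the first element name."""

       def __init__(self):
           super().__init__()
           self.root = None

       def startElement(self, name, attrs):
           if self.root is None:
               self.root = name


   source = InputSource()
   source.setByteStream(io.BytesIO(b"<from-bytes/>"))
   source.setCharacterStream(io.StringIO("<from-text/>"))

   handler = RootName()
   parser = xml.sax.make_parser()
   parser.setContentHandler(handler)
   parser.parse(source)

   print(handler.root)  # from-text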


.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the :term:`mapping protocol
<mapping>`, including the methods :meth:`~collections.abc.Mapping.copy`,
:meth:`~collections.abc.Mapping.get`, :meth:`~object.__contains__`,
:meth:`~collections.abc.Mapping.items`, :meth:`~collections.abc.Mapping.keys`,
and :meth:`~collections.abc.Mapping.values`. The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.
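
A sketch of reading attributes inside
:meth:`~xml.sax.handler.ContentHandler.startElement`. Because the parser is
allowed to reuse the *attrs* object between calls, the hypothetical
``LinkCollector`` handler extracts the values it needs (it could also call
``attrs.copy()``) rather than storing *attrs* itself:

.. code-block:: python

   import xml.sax
   from xml.sax.handler import ContentHandler


   class LinkCollector(ContentHandler):
       """Hypothetical handler collecting (href, title) pairs from <a> elements."""

       def __init__(self):
           super().__init__()
           self.links = []

       def startElement(self, name, attrs):
           if name == "a" and "href" in attrs:
               # Extract plain values now; attrs itself may be reused later.
               self.links.append((attrs.getValue("href"), attrs.get("title", "")))


   handler = LinkCollector()
   xml.sax.parseString(
       b'<p><a href="/one" title="First"/><a href="/two"/><a name="x"/></p>',
       handler,
   )
   print(handler.links)  # [('/one', 'First'), ('/two', '')]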

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.


.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`). All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for the qualified *name*.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.
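
When namespace processing is enabled,
:meth:`~xml.sax.handler.ContentHandler.startElementNS` receives an
:class:`AttributesNS` object. A sketch assuming CPython's default Expat-based
reader; the ``urn:example:meta`` namespace URI and the ``QNameRecorder``
handler are placeholders:

.. code-block:: python

   import xml.sax
   from xml.sax.handler import ContentHandler, feature_namespaces


   class QNameRecorder(ContentHandler):
       """Hypothetical handler mapping each attribute qname to (name, value)."""

       def __init__(self):
           super().__init__()
           self.seen = {}

       def startElementNS(self, name, qname, attrs):
           for qualified in attrs.getQNames():
               self.seen[qualified] = (
                   attrs.getNameByQName(qualified),
                   attrs.getValueByQName(qualified),
               )


   document = b'<doc xmlns:m="urn:example:meta" m:lang="en" id="d1"/>'

   recorder = QNameRecorder()
   parser = xml.sax.make_parser()
   parser.setFeature(feature_namespaces, True)
   parser.setContentHandler(recorder)
   parser.feed(document)
   parser.close()

   print(recorder.seen["m:lang"])  # (('urn:example:meta', 'lang'), 'en')
   print(recorder.seen["id"])      # ((None, 'id'), 'd1')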