:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.
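
As a minimal sketch, assuming CPython's default Expat-based driver,
:func:`xml.sax.make_parser` returns an object that implements the
:class:`XMLReader` interface, and the application only attaches handlers and
calls :meth:`~XMLReader.parse`. ``TagCounter`` is a hypothetical handler
written for this example.

.. code-block:: python

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler
   from xml.sax.xmlreader import XMLReader


   class TagCounter(ContentHandler):
       """Hypothetical handler that counts start-element events."""

       def __init__(self):
           super().__init__()
           self.count = 0

       def startElement(self, name, attrs):
           self.count += 1


   # make_parser() obtains the parser from the driver's create_parser().
   parser = xml.sax.make_parser()
   print(isinstance(parser, XMLReader))  # True

   counter = TagCounter()
   parser.setContentHandler(counter)
   parser.parse(io.BytesIO(b"<root><a/><b/></root>"))
   print(counter.count)  # 3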


.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but
   to feed chunks of the document as they become available. Note that the
   reader normally does not read the entire file at once either, but reads it in
   chunks as well; still, :meth:`parse` does not return until the entire
   document has been processed. These interfaces should therefore be used if
   the blocking behaviour of :meth:`parse` is not desirable.

   When the parser is instantiated, it is ready to begin accepting data from
   the :meth:`feed` method immediately. After parsing has been finished with a
   call to :meth:`close`, the :meth:`reset` method must be called to make the
   parser ready to accept new data, either from :meth:`feed` or using the
   :meth:`parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface as a
   convenience to SAX 2.0 driver writers.
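
   A minimal sketch of incremental parsing, assuming CPython's default
   Expat-based parser (which implements :class:`IncrementalParser`). The
   ``chunks`` list stands in for data arriving piece by piece, for example from
   a socket; it and the ``NameCollector`` handler are hypothetical.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class NameCollector(ContentHandler):
          """Hypothetical handler that records element names in order."""

          def __init__(self):
              super().__init__()
              self.names = []

          def startElement(self, name, attrs):
              self.names.append(name)


      parser = xml.sax.make_parser()
      collector = NameCollector()
      parser.setContentHandler(collector)

      # The document arrives in arbitrary pieces; a tag may be split mid-name.
      chunks = [b"<log><ent", b"ry/><entry", b"/></log>"]
      for chunk in chunks:
          parser.feed(chunk)  # events fire as soon as enough data is seen
      parser.close()          # end-of-document checks and endDocument()

      print(collector.names)  # ['log', 'entry', 'entry']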


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator
   object will return valid results only during calls to SAX event handler
   methods (such as those of a :class:`ContentHandler`); at any other time, the
   results are unpredictable. If information is not available, methods may
   return ``None``.


.. class:: InputSource(system_id=None)

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.
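
   A short sketch of the kind of object a reader passes to
   :meth:`~ContentHandler.startElement`, built here directly from a plain
   dictionary (normally the reader, not the application, does this). The
   attribute names and values are made up for illustration.

   .. code-block:: python

      from xml.sax.xmlreader import AttributesImpl

      attrs = AttributesImpl({"id": "n1", "lang": "en"})

      # Methods of the Attributes interface
      print(attrs.getLength())       # 2
      print(attrs.getValue("lang"))  # en
      print(attrs.getType("id"))     # CDATA

      # Dictionary-style access
      print("id" in attrs)                # True
      print(attrs.get("missing", "n/a"))  # n/a
      print(sorted(attrs.keys()))         # ['id', 'lang']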


.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document. This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).
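
   A matching sketch for the namespace-aware variant. Keys of *attrs* are
   ``(namespaceURI, localname)`` pairs (with ``None`` as the URI of an
   unqualified name), and *qnames* maps the same pairs to the qualified names
   as written in the document. The XLink namespace and the attribute values are
   illustrative only.

   .. code-block:: python

      from xml.sax.xmlreader import AttributesNSImpl

      XLINK = "http://www.w3.org/1999/xlink"
      href = (XLINK, "href")
      ident = (None, "id")

      attrs = AttributesNSImpl(
          {href: "#top", ident: "n1"},        # (uri, localname) -> value
          {href: "xlink:href", ident: "id"},  # (uri, localname) -> qname
      )

      print(attrs.getValue(href))                  # #top
      print(attrs.getValueByQName("xlink:href"))   # #top
      print(attrs.getNameByQName("xlink:href") == href)  # True
      print(attrs.getQNameByName(ident))           # id
      print(sorted(attrs.getQNames()))             # ['id', 'xlink:href']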


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a file-like object, or an :class:`InputSource` object. When
   :meth:`parse` returns, the input is completely processed, and the parser
   object can be discarded or reset. As a limitation, the current implementation
   only accepts byte streams; processing of character streams is for further
   study.
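
   A sketch of the first two kinds of *source*, assuming the default
   Expat-based parser: a file name used as the system identifier, and a file
   object opened in binary mode (a byte stream, in line with the limitation
   above). The temporary file, ``RootName`` and ``root_of`` are hypothetical.

   .. code-block:: python

      import os
      import tempfile
      import xml.sax
      from xml.sax.handler import ContentHandler


      class RootName(ContentHandler):
          """Hypothetical handler that remembers the first element name."""

          def __init__(self):
              super().__init__()
              self.root = None

          def startElement(self, name, attrs):
              if self.root is None:
                  self.root = name


      def root_of(source):
          # Use a fresh parser per document and report its root element name.
          handler = RootName()
          parser = xml.sax.make_parser()
          parser.setContentHandler(handler)
          parser.parse(source)
          return handler.root


      with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
          f.write(b"<catalog><item/></catalog>")
          path = f.name

      try:
          by_name = root_of(path)      # system identifier (a file name)
          with open(path, "rb") as fobj:
              by_file = root_of(fobj)  # binary file-like object
      finally:
          os.remove(path)

      print(by_name, by_file)  # catalog catalog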


.. method:: XMLReader.getContentHandler()

   Return the current :class:`ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`ContentHandler`. If no :class:`ContentHandler` is
   set, content events will be discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`DTDHandler`. If no :class:`DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`EntityResolver`. If no :class:`EntityResolver` is
   set, attempts to resolve an external entity will result in opening the
   system identifier for the entity, and fail if it is not available.
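
   A sketch of a resolver that serves an external entity from memory instead
   of opening its system identifier. It assumes CPython's Expat-based parser,
   which since Python 3.7.1 does not process external general entities by
   default, so the sketch first enables
   :data:`~xml.sax.handler.feature_external_ges` (only advisable for trusted
   input). ``InMemoryResolver``, ``ElementRecorder`` and the entity contents
   are hypothetical.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import (ContentHandler, EntityResolver,
                                   feature_external_ges)
      from xml.sax.xmlreader import InputSource


      class InMemoryResolver(EntityResolver):
          """Hypothetical resolver: every external entity gets the same bytes."""

          def resolveEntity(self, publicId, systemId):
              source = InputSource()
              source.setByteStream(io.BytesIO(b"<entity/>"))
              return source


      class ElementRecorder(ContentHandler):
          """Hypothetical handler that records element names in order."""

          def __init__(self):
              super().__init__()
              self.names = []

          def startElement(self, name, attrs):
              self.names.append(name)


      DOC = b'<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]><doc>&ext;</doc>'

      parser = xml.sax.make_parser()
      parser.setFeature(feature_external_ges, True)  # trusted input only
      parser.setEntityResolver(InMemoryResolver())
      recorder = ElementRecorder()
      parser.setContentHandler(recorder)
      parser.parse(io.BytesIO(DOC))  # "ext.xml" itself is never opened

      print(recorder.names)  # ['doc', 'entity']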


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current error handler. If no :class:`ErrorHandler` is set, errors
   will be raised as exceptions, and warnings will be printed.
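
   A sketch of a custom :class:`ErrorHandler` that records every report but,
   like the default handler, stops parsing on a fatal error by re-raising the
   exception. ``RecordingErrorHandler`` is hypothetical, the document is
   deliberately malformed, and the exact message text comes from the driver.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax import SAXParseException
      from xml.sax.handler import ContentHandler, ErrorHandler


      class RecordingErrorHandler(ErrorHandler):
          """Hypothetical handler: logs reports, still aborts on fatal ones."""

          def __init__(self):
              self.reports = []

          def warning(self, exception):
              self.reports.append(("warning", exception))

          def error(self, exception):
              self.reports.append(("error", exception))

          def fatalError(self, exception):
              self.reports.append(("fatal", exception))
              raise exception  # stop parsing, as the default handler does


      parser = xml.sax.make_parser()
      parser.setContentHandler(ContentHandler())
      handler = RecordingErrorHandler()
      parser.setErrorHandler(handler)

      try:
          parser.parse(io.BytesIO(b"<doc><open></doc>"))  # mismatched end tag
      except SAXParseException as exc:
          print("line", exc.getLineNumber(), ":", exc.getMessage())

      print([kind for kind, _ in handler.reports])  # ['fatal']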


.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and
   warnings; if they cannot support the requested locale, however, they must
   raise a SAX exception. Applications may request a locale change in the middle
   of a parse.
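
   Since localization support is optional, a portable application should be
   ready for the request to fail. A minimal sketch; the locale string
   ``"fr_FR"`` is an arbitrary example, as the format a given driver expects is
   not specified here.

   .. code-block:: python

      import xml.sax
      from xml.sax import SAXNotSupportedException

      parser = xml.sax.make_parser()
      try:
          parser.setLocale("fr_FR")
          localized = True
      except SAXNotSupportedException:
          # The driver cannot produce messages in the requested locale;
          # carry on with its default messages.
          localized = False

      print("localized messages:", localized)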


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.
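
   A sketch of toggling features, assuming CPython's Expat-based parser:
   enabling :data:`~xml.sax.handler.feature_namespaces` makes the parser report
   ``startElementNS`` events with ``(namespaceURI, localname)`` names, while
   asking this driver for :data:`~xml.sax.handler.feature_validation` raises
   :exc:`SAXNotSupportedException`. ``NSNames`` is hypothetical.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax import SAXNotSupportedException
      from xml.sax.handler import (ContentHandler, feature_namespaces,
                                   feature_validation)


      class NSNames(ContentHandler):
          """Hypothetical handler recording (namespaceURI, localname) pairs."""

          def __init__(self):
              super().__init__()
              self.names = []

          def startElementNS(self, name, qname, attrs):
              self.names.append(name)


      parser = xml.sax.make_parser()
      parser.setFeature(feature_namespaces, True)  # report startElementNS events
      print(parser.getFeature(feature_namespaces))  # True

      handler = NSNames()
      parser.setContentHandler(handler)
      parser.parse(io.BytesIO(b'<a xmlns="urn:example"><b/></a>'))
      print(handler.names)  # [('urn:example', 'a'), ('urn:example', 'b')]

      validation_supported = True
      try:
          parser.setFeature(feature_validation, True)  # Expat does not validate
      except SAXNotSupportedException as exc:
          validation_supported = False
          print("not supported:", exc.getMessage())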


.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is
   not recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.
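
   A sketch that uses :data:`~xml.sax.handler.property_lexical_handler`
   (supported by CPython's Expat-based parser) to receive comments.
   ``CommentCollector`` is a hypothetical object that defines every
   lexical-handler callback (``comment``, ``startDTD``, ``endDTD``,
   ``startCDATA``, ``endCDATA``) so the driver can call any of them; the
   unknown property URI is made up to show the error case.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax import SAXNotRecognizedException
      from xml.sax.handler import ContentHandler, property_lexical_handler


      class CommentCollector:
          """Hypothetical lexical handler that keeps comment text."""

          def __init__(self):
              self.comments = []

          def comment(self, content):
              self.comments.append(content)

          def startDTD(self, name, public_id, system_id):
              pass

          def endDTD(self):
              pass

          def startCDATA(self):
              pass

          def endCDATA(self):
              pass


      parser = xml.sax.make_parser()
      collector = CommentCollector()
      parser.setProperty(property_lexical_handler, collector)
      print(parser.getProperty(property_lexical_handler) is collector)  # True

      parser.setContentHandler(ContentHandler())
      parser.parse(io.BytesIO(b"<doc><!-- hello --></doc>"))
      print(collector.comments)  # [' hello ']

      recognized = True
      try:
          parser.getProperty("http://example.com/no-such-property")
      except SAXNotRecognizedException:
          recognized = False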


.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the
   parser so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.
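
   A sketch of reusing one parser for several documents with the
   :meth:`feed`, :meth:`close`, :meth:`reset` cycle, assuming the default
   Expat-based parser. ``RootRecorder`` and the two documents are hypothetical.

   .. code-block:: python

      import xml.sax
      from xml.sax.handler import ContentHandler


      class RootRecorder(ContentHandler):
          """Hypothetical handler that stores the root element of each document."""

          def __init__(self):
              super().__init__()
              self.roots = []
              self._depth = 0

          def startElement(self, name, attrs):
              if self._depth == 0:
                  self.roots.append(name)
              self._depth += 1

          def endElement(self, name):
              self._depth -= 1


      parser = xml.sax.make_parser()
      recorder = RootRecorder()
      parser.setContentHandler(recorder)

      for document in (b"<first/>", b"<second><x/></second>"):
          parser.feed(document)
          parser.close()  # end of this document
          parser.reset()  # make the parser ready for the next one

      print(recorder.roots)  # ['first', 'second']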


.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event ends.


.. method:: Locator.getLineNumber()

   Return the line number where the current event ends.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.
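
A sketch of the usual pattern: the parser passes its locator to
:meth:`ContentHandler.setDocumentLocator` before parsing starts, and the
handler queries it only from inside event callbacks. ``PositionLogger`` is
hypothetical; exactly which column a driver reports for an element is not
assumed here, so only the line numbers are relied upon.

.. code-block:: python

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler


   class PositionLogger(ContentHandler):
       """Hypothetical handler that records where each element occurs."""

       def __init__(self):
           super().__init__()
           self.positions = []

       def setDocumentLocator(self, locator):
           # Keep the reference, but only query it during event callbacks.
           self._locator = locator

       def startElement(self, name, attrs):
           self.positions.append(
               (name, self._locator.getLineNumber(),
                self._locator.getColumnNumber())
           )


   handler = PositionLogger()
   parser = xml.sax.make_parser()
   parser.setContentHandler(handler)
   parser.parse(io.BytesIO(b"<root>\n  <child/>\n</root>"))

   for name, line, column in handler.positions:
       print(name, "line", line, "column", column)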


.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a Python file-like object which does not perform
   byte-to-character conversion) for this input source.

   The SAX parser will ignore this if there is also a character stream
   specified, but it will use a byte stream in preference to opening a URI
   connection itself.

   If the application knows the character encoding of the byte stream, it
   should set it with the :meth:`setEncoding` method.
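
   A sketch of supplying a byte stream together with its encoding, assuming
   the default Expat-based parser. The Latin-1 document has no XML declaration,
   so without :meth:`setEncoding` it would be read as UTF-8 and fail on the
   ``é`` byte. ``TextCollector`` is hypothetical.

   .. code-block:: python

      import io
      import xml.sax
      from xml.sax.handler import ContentHandler
      from xml.sax.xmlreader import InputSource


      class TextCollector(ContentHandler):
          """Hypothetical handler that gathers all character data."""

          def __init__(self):
              super().__init__()
              self.parts = []

          def characters(self, content):
              self.parts.append(content)  # may arrive in several pieces


      # Latin-1 bytes with no XML declaration.
      data = "<doc>café</doc>".encode("iso-8859-1")

      source = InputSource()
      source.setByteStream(io.BytesIO(data))
      source.setEncoding("iso-8859-1")  # tell the parser how to decode

      handler = TextCollector()
      parser = xml.sax.make_parser()
      parser.setContentHandler(handler)
      parser.parse(source)

      print("".join(handler.parts))  # café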


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this
   byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream for this input source. (The stream must be a
   file-like object that performs the conversion to strings itself, that is,
   one whose ``read()`` method returns ``str`` rather than ``bytes``.)

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.


.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the mapping protocol,
including the methods :meth:`copy`, :meth:`get`, :meth:`__contains__`,
:meth:`items`, :meth:`keys`, and :meth:`values`. The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 352 | |
| 353 | |
| 354 | .. _attributes-ns-objects: |
| 355 | |
| 356 | The :class:`AttributesNS` Interface |
| 357 | ----------------------------------- |
| 358 | |
| 359 | This interface is a subtype of the :class:`Attributes` interface (see section |
| 360 | :ref:`attributes-objects`). All methods supported by that interface are also |
| 361 | available on :class:`AttributesNS` objects. |
| 362 | |
| 363 | The following methods are also available: |
| 364 | |
| 365 | |
| 366 | .. method:: AttributesNS.getValueByQName(name) |
| 367 | |
| 368 | Return the value for a qualified name. |
| 369 | |
| 370 | |
| 371 | .. method:: AttributesNS.getNameByQName(name) |
| 372 | |
| 373 | Return the ``(namespace, localname)`` pair for a qualified *name*. |
| 374 | |
| 375 | |
| 376 | .. method:: AttributesNS.getQNameByName(name) |
| 377 | |
| 378 | Return the qualified name for a ``(namespace, localname)`` pair. |
| 379 | |
| 380 | |
| 381 | .. method:: AttributesNS.getQNames() |
| 382 | |
| 383 | Return the qualified names of all attributes. |
| 384 | |