:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.
.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.


.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call
   the method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.


.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.

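As a minimal sketch of how the feature constants are used, the following
queries and changes ``feature_namespaces`` on a reader that is not yet parsing.
It assumes the reader returned by :func:`xml.sax.make_parser` is the bundled
expat-based one; the variable names are illustrative only.

```python
import xml.sax
from xml.sax.handler import feature_namespaces

parser = xml.sax.make_parser()

# Features are read/write only while the reader is not parsing.
default_state = parser.getFeature(feature_namespaces)

# Turn on Namespace processing before calling parse().
parser.setFeature(feature_namespaces, True)
enabled_state = parser.getFeature(feature_namespaces)
```

A reader that does not recognize or support a feature is expected to raise
``SAXNotRecognizedException`` or ``SAXNotSupportedException`` instead.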

.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.


.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application. The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator. A parser that does so must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.

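The following sketch shows one way to keep the locator and query it from
inside another callback. ``PositionRecorder`` is a hypothetical handler name,
and the example assumes the default expat-based reader, which does supply a
locator.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Record the line number reported when each start tag is seen."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called before any other ContentHandler method, if the reader
        # supplies a locator at all.
        self.locator = locator

    def startElement(self, name, attrs):
        # Query the locator only while a callback is running.
        line = None
        if self.locator is not None:
            line = self.locator.getLineNumber()
        self.positions.append((name, line))


recorder = PositionRecorder()
xml.sax.parseString(b"<root>\n  <item/>\n</root>", recorder)
element_names = [name for name, _ in recorder.positions]
```

Because the locator is only valid during callbacks, the handler copies the
line number out immediately rather than storing the locator's state for later.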

.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in :class:`DTDHandler` (except for
   :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.

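To illustrate the ordering guarantees, this sketch logs the events for a small
document. ``EventRecorder`` and the event labels are made up for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Append a short label to ``events`` for every callback received."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        self.events.append("startElement:" + name)

    def endElement(self, name):
        self.events.append("endElement:" + name)


recorder = EventRecorder()
xml.sax.parseString(b"<root><child/></root>", recorder)
```

For a well-formed document, ``recorder.events`` begins with ``startDocument``
and ends with ``endDocument``, with the element events nested in document order
between them.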

.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled.

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.


.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.

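A sketch of the attribute-value case described above: the handler tracks the
active mappings so it can expand a prefixed name that appears inside an
attribute value. ``PrefixTracker``, the ``type`` attribute, and the namespace
URI are assumptions made for the example, and a flat dictionary is used for
brevity (nested redeclarations of the same prefix would need a stack).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class PrefixTracker(ContentHandler):
    """Expand prefixed names found in a (hypothetical) ``type`` attribute."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}   # currently active prefix -> URI mappings
        self.resolved = []   # (uri, localname) pairs expanded by hand

    def startPrefixMapping(self, prefix, uri):
        self.prefixes[prefix] = uri

    def endPrefixMapping(self, prefix):
        self.prefixes.pop(prefix, None)

    def startElementNS(self, name, qname, attrs):
        # The reader expands element and attribute names, but not
        # prefixes that occur inside attribute values.
        value = attrs.get((None, "type"))
        if value and ":" in value:
            prefix, local = value.split(":", 1)
            self.resolved.append((self.prefixes.get(prefix), local))


document = b'<root xmlns:ex="http://example.com/ns" type="ex:widget"/>'

tracker = PrefixTracker()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # prefix events need namespace mode
parser.setContentHandler(tracker)
parser.parse(io.BytesIO(document))
```

The mapping is available when ``startElementNS`` runs because the prefix events
for an element are delivered before its start event.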

.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element. The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.

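A minimal sketch of keeping attribute data past the callback, using
:meth:`copy` as recommended above. ``AttributeCollector`` and the sample
document are illustrative only.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class AttributeCollector(ContentHandler):
    """Remember each element name together with a copy of its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Store a copy; the reader is allowed to re-use the attrs object.
        self.seen.append((name, attrs.copy()))


collector = AttributeCollector()
xml.sax.parseString(
    b'<list><item id="1" kind="a"/><item id="2"/></list>', collector
)

# Convert the saved copies to plain dictionaries for inspection.
summary = [(name, dict(attrs.items())) for name, attrs in collector.seen]
```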

.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.


.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`)
   containing the attributes of the element. If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``. The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes. To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.


.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElementNS` method, likewise the *qname* parameter.

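The shape of the *name* argument in namespace mode can be seen with a small
recorder. This is a sketch; ``NameRecorder`` and the namespace URI are
assumptions, and *qname* is deliberately ignored because a reader may pass
``None`` for it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameRecorder(ContentHandler):
    """Collect the (uri, localname) tuple passed for every start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name  # uri is None for elements in no namespace
        self.names.append((uri, localname))


document = (
    b'<ex:list xmlns:ex="http://example.com/ns">'
    b"<ex:item/><plain/>"
    b"</ex:list>"
)

recorder = NameRecorder()
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(document))
```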

.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the locator provides useful
   information.

   *content* may be a string or bytes instance; the ``expat`` reader module
   always produces strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.

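Because character data may arrive in several chunks, handlers usually buffer
it and join the pieces when the element ends. The sketch below assumes a
hypothetical ``greeting`` element; how the text is split into chunks is up to
the reader and is not relied upon.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Accumulate the text of a <greeting> element across chunks."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.text = None

    def startElement(self, name, attrs):
        if name == "greeting":
            self._chunks = []

    def characters(self, content):
        # May be called more than once for a single run of text.
        self._chunks.append(content)

    def endElement(self, name):
        if name == "greeting":
            self.text = "".join(self._chunks)


collector = TextCollector()
xml.sax.parseString(b"<greeting>Hello &amp; goodbye</greeting>", collector)
```

Joining at ``endElement`` gives the same result whether the reader delivers the
text in one chunk or many.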

.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10); non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the locator provides useful
   information.


.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The parser will invoke this method once for each processing instruction found.
   Note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.

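As a small illustration, the handler below collects processing instructions
from a document that also carries an XML declaration. ``PICollector`` and the
stylesheet instruction are example choices, not part of the module.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Collect (target, data) pairs for every processing instruction."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


document = (
    b'<?xml version="1.0"?>'
    b'<?xml-stylesheet href="style.css"?>'
    b"<root/>"
)

collector = PICollector()
xml.sax.parseString(document, collector)
```

Only the ``xml-stylesheet`` instruction is reported; the XML declaration is not
delivered through this method.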

.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and the ``feature_external_pes`` features.


.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.

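A sketch of a :class:`DTDHandler` subclass that records both kinds of
declaration from an internal DTD subset. The class name, the ``gif`` notation,
and the ``logo`` entity are invented for the example, which assumes the default
expat-based reader.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler


class DeclarationRecorder(DTDHandler):
    """Record notation and unparsed entity declarations."""

    def __init__(self):
        self.notations = []
        self.entities = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities.append((name, publicId, systemId, ndata))


document = b"""<!DOCTYPE root [
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<root/>"""

recorder = DeclarationRecorder()
parser = xml.sax.make_parser()
parser.setDTDHandler(recorder)  # registered separately from the content handler
parser.parse(io.BytesIO(document))
```

The *ndata* argument names the notation associated with the unparsed entity,
here ``gif``.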

.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an InputSource to read from. The default
   implementation returns *systemId*.

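The two permitted return types can be sketched as follows. ``LocalOnlyResolver``
is a hypothetical resolver that serves known entities from memory and returns
an empty source for anything else; the identifiers are made up. The method is
called directly here, since whether a reader consults the resolver at all
depends on its external entity features.

```python
import io
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class LocalOnlyResolver(EntityResolver):
    """Serve known entities from memory; give unknown ones empty content."""

    def __init__(self, known):
        self.known = known  # maps system identifiers to bytes

    def resolveEntity(self, publicId, systemId):
        # Returning an InputSource lets the resolver supply the content
        # itself instead of letting the reader open systemId.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.known.get(systemId, b"")))
        return source


resolver = LocalOnlyResolver({"greeting.ent": b"hello"})
known = resolver.resolveEntity(None, "greeting.ent")
unknown = resolver.resolveEntity(None, "http://example.com/other.ent")

# The base class simply hands back the system identifier as a string.
default = EntityResolver().resolveEntity(None, "greeting.ent")
```

A resolver like this would be registered with the reader's
``setEntityResolver()`` method.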

.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`. If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors. All methods take a :exc:`SAXParseException` as the
only parameter. Errors and warnings may be converted to an exception by raising
the passed-in exception object.


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.

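One possible policy, sketched below, is to collect warnings and recoverable
errors but turn fatal errors into exceptions by raising the object passed in.
``CollectingErrorHandler`` is an illustrative name, and the malformed document
is chosen only to trigger a fatal error.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Collect warnings and recoverable errors; raise on fatal errors."""

    def __init__(self):
        self.warnings = []
        self.errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        # Raising the passed-in SAXParseException ends the parse.
        raise exception


handler = CollectingErrorHandler()
try:
    # The closing tag does not match <unclosed>, so the input is not well-formed.
    xml.sax.parseString(b"<root><unclosed></root>", ContentHandler(), handler)
except xml.sax.SAXParseException as exc:
    failure = exc
else:
    failure = None
```

The caught exception carries position information through methods such as
``getLineNumber()`` and ``getMessage()``.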