:mod:`xml.sax.handler` --- Base classes for SAX handlers
========================================================

.. module:: xml.sax.handler
   :synopsis: Base classes for SAX event handlers.

.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/sax/handler.py`

--------------

The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
:mod:`xml.sax.handler`, so that all methods get default implementations.

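As an illustration of inheriting the default implementations, the following
minimal sketch overrides a single callback and leaves the rest untouched. The
class name ``ElementCounter`` and the sample document are made up for this
example.

```python
import xml.sax
import xml.sax.handler


class ElementCounter(xml.sax.handler.ContentHandler):
    """Hypothetical handler: counts start tags, inherits every other callback."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Only this event is overridden; the base class supplies no-op
        # implementations for all the others.
        self.counts[name] = self.counts.get(name, 0) + 1


counter = ElementCounter()
xml.sax.parseString(b"<root><item/><item/></root>", counter)
print(counter.counts)
```

Because the base class already accepts every event, the subclass only needs
code for the events the application cares about.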

.. class:: ContentHandler

   This is the main callback interface in SAX, and the one most important to
   applications. The order of events in this interface mirrors the order of the
   information in the document.


.. class:: DTDHandler

   Handle DTD events.

   This interface specifies only those DTD events required for basic parsing
   (unparsed entities and attributes).


.. class:: EntityResolver

   Basic interface for resolving entities. If you create an object implementing
   this interface and register it with your parser, the parser will call the
   method in your object to resolve all external entities.


.. class:: ErrorHandler

   Interface used by the parser to present error and warning messages to the
   application. The methods of this object control whether errors are immediately
   converted to exceptions or are handled in some other way.

In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants
for the feature and property names.


.. data:: feature_namespaces

   | value: ``"http://xml.org/sax/features/namespaces"``
   | true: Perform Namespace processing.
   | false: Optionally do not perform Namespace processing (implies
     namespace-prefixes; default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_namespace_prefixes

   | value: ``"http://xml.org/sax/features/namespace-prefixes"``
   | true: Report the original prefixed names and attributes used for Namespace
     declarations.
   | false: Do not report attributes used for Namespace declarations, and
     optionally do not report original prefixed names (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_string_interning

   | value: ``"http://xml.org/sax/features/string-interning"``
   | true: All element names, prefixes, attribute names, Namespace URIs, and
     local names are interned using the built-in intern function.
   | false: Names are not necessarily interned, although they may be (default).
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_validation

   | value: ``"http://xml.org/sax/features/validation"``
   | true: Report all validation errors (implies external-general-entities and
     external-parameter-entities).
   | false: Do not report validation errors.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_ges

   | value: ``"http://xml.org/sax/features/external-general-entities"``
   | true: Include all external general (text) entities.
   | false: Do not include external general entities.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: feature_external_pes

   | value: ``"http://xml.org/sax/features/external-parameter-entities"``
   | true: Include all external parameter entities, including the external DTD
     subset.
   | false: Do not include any external parameter entities, even the external
     DTD subset.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: all_features

   List of all features.

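The feature constants are meant to be passed to a reader's ``getFeature()`` and
``setFeature()`` methods. The sketch below assumes the reader returned by
:func:`xml.sax.make_parser` and guards the validation request, since a given
reader may not recognise or support every feature.

```python
import xml.sax
from xml.sax import handler

parser = xml.sax.make_parser()

# Features are read/write only while no parse is in progress.
namespaces_before = parser.getFeature(handler.feature_namespaces)
parser.setFeature(handler.feature_namespaces, True)
namespaces_after = parser.getFeature(handler.feature_namespaces)

# A reader is free to reject a feature it cannot honour.
try:
    parser.setFeature(handler.feature_validation, True)
except (xml.sax.SAXNotSupportedException, xml.sax.SAXNotRecognizedException):
    validation_available = False
else:
    validation_available = True

print(bool(namespaces_before), bool(namespaces_after), validation_available)
```

Using the constants rather than the literal URIs keeps typos from turning into
unrecognised feature names at run time.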

.. data:: property_lexical_handler

   | value: ``"http://xml.org/sax/properties/lexical-handler"``
   | data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2)
   | description: An optional extension handler for lexical events like
     comments.
   | access: read/write


.. data:: property_declaration_handler

   | value: ``"http://xml.org/sax/properties/declaration-handler"``
   | data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2)
   | description: An optional extension handler for DTD-related events other
     than notations and unparsed entities.
   | access: read/write


.. data:: property_dom_node

   | value: ``"http://xml.org/sax/properties/dom-node"``
   | data type: org.w3c.dom.Node (not supported in Python 2)
   | description: When parsing, the current DOM node being visited if this is
     a DOM iterator; when not parsing, the root DOM node for iteration.
   | access: (parsing) read-only; (not parsing) read/write


.. data:: property_xml_string

   | value: ``"http://xml.org/sax/properties/xml-string"``
   | data type: String
   | description: The literal string of characters that was the source for the
     current event.
   | access: read-only


.. data:: all_properties

   List of all known property names.

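Property constants are used with a reader's ``setProperty()`` and
``getProperty()`` methods. The following sketch registers a lexical handler to
capture comments. It assumes the expat-based reader returned by
:func:`xml.sax.make_parser` and assumes that reader accepts any object offering
the five lexical callbacks shown; ``CommentCollector`` is a hypothetical,
duck-typed class, not part of the module.

```python
import io
import xml.sax
from xml.sax import handler


class CommentCollector:
    """Hypothetical lexical handler that only records comments."""

    def __init__(self):
        self.comments = []

    def comment(self, content):
        self.comments.append(content)

    # The remaining lexical events are accepted and ignored.
    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass


collector = CommentCollector()
parser = xml.sax.make_parser()
parser.setProperty(handler.property_lexical_handler, collector)
parser.parse(io.BytesIO(b"<doc><!-- note --></doc>"))
print(collector.comments)
```

Readers that do not support the property are expected to raise an exception
from ``setProperty()``, so production code may want to guard the call.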

.. _content-handler-objects:

ContentHandler Objects
----------------------

Users are expected to subclass :class:`ContentHandler` to support their
application. The following methods are called by the parser on the appropriate
events in the input document:


.. method:: ContentHandler.setDocumentLocator(locator)

   Called by the parser to give the application a locator for locating the origin
   of document events.

   SAX parsers are strongly encouraged (though not absolutely required) to supply a
   locator: if a parser does so, it must supply the locator to the application by
   invoking this method before invoking any of the other methods in the
   :class:`ContentHandler` interface.

   The locator allows the application to determine the end position of any
   document-related event, even if the parser is not reporting an error. Typically,
   the application will use this information for reporting its own errors (such as
   character content that does not match an application's business rules). The
   information returned by the locator is probably not sufficient for use with a
   search engine.

   Note that the locator will return correct information only during the invocation
   of the events in this interface. The application should not attempt to use it at
   any other time.

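A handler typically stores the locator and consults it from inside later
callbacks. The sketch below records the line number reported while each start
tag is being delivered; ``PositionRecorder`` is a hypothetical name, and the
exact positions reported depend on the underlying parser.

```python
import xml.sax
import xml.sax.handler


class PositionRecorder(xml.sax.handler.ContentHandler):
    """Hypothetical handler that pairs element names with line numbers."""

    def __init__(self):
        super().__init__()
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Keep the locator; it is only meaningful during later callbacks.
        self.locator = locator

    def startElement(self, name, attrs):
        if self.locator is not None:
            self.positions.append((name, self.locator.getLineNumber()))


recorder = PositionRecorder()
xml.sax.parseString(b"<root>\n  <child/>\n</root>", recorder)
print(recorder.positions)
```

The ``None`` check covers parsers that choose not to supply a locator.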

.. method:: ContentHandler.startDocument()

   Receive notification of the beginning of a document.

   The SAX parser will invoke this method only once, before any other methods in
   this interface or in :class:`DTDHandler` (except for
   :meth:`setDocumentLocator`).


.. method:: ContentHandler.endDocument()

   Receive notification of the end of a document.

   The SAX parser will invoke this method only once, and it will be the last method
   invoked during the parse. The parser shall not invoke this method until it has
   either abandoned parsing (because of an unrecoverable error) or reached the end
   of input.


.. method:: ContentHandler.startPrefixMapping(prefix, uri)

   Begin the scope of a prefix-URI Namespace mapping.

   The information from this event is not necessary for normal Namespace
   processing: the SAX XML reader will automatically replace prefixes for element
   and attribute names when the ``feature_namespaces`` feature is enabled (it is
   disabled by default; see :data:`feature_namespaces`).

   There are cases, however, when applications need to use prefixes in character
   data or in attribute values, where they cannot safely be expanded automatically;
   the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the
   information to the application to expand prefixes in those contexts itself, if
   necessary.

   Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not
   guaranteed to be properly nested relative to each other: all
   :meth:`startPrefixMapping` events will occur before the corresponding
   :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur
   after the corresponding :meth:`endElement` event, but their order is not
   guaranteed.

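To see the prefix-mapping events, Namespace processing has to be switched on.
The following sketch records the events for a one-element document;
``PrefixRecorder`` and the ``urn:example:d`` URI are invented for the example.

```python
import io
import xml.sax
from xml.sax import handler


class PrefixRecorder(handler.ContentHandler):
    """Hypothetical handler that logs prefix scopes as they open and close."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("start", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("end", prefix))


recorder = PrefixRecorder()
parser = xml.sax.make_parser()
parser.setFeature(handler.feature_namespaces, True)
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(b'<d:doc xmlns:d="urn:example:d"/>'))
print(recorder.events)
```

An application that needs to expand prefixes found in attribute values could
maintain a prefix-to-URI dictionary from these two callbacks.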

.. method:: ContentHandler.endPrefixMapping(prefix)

   End the scope of a prefix-URI mapping.

   See :meth:`startPrefixMapping` for details. This event will always occur after
   the corresponding :meth:`endElement` event, but the order of
   :meth:`endPrefixMapping` events is not otherwise guaranteed.


.. method:: ContentHandler.startElement(name, attrs)

   Signals the start of an element in non-namespace mode.

   The *name* parameter contains the raw XML 1.0 name of the element type as a
   string and the *attrs* parameter holds an object of the
   :class:`~xml.sax.xmlreader.Attributes`
   interface (see :ref:`attributes-objects`) containing the attributes of
   the element. The object passed as *attrs* may be re-used by the parser; holding
   on to a reference to it is not a reliable way to keep a copy of the attributes.
   To keep a copy of the attributes, use the :meth:`copy` method of the *attrs*
   object.

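The advice about copying *attrs* can be sketched as follows. ``AttributeKeeper``
is a hypothetical handler that stores a detached copy of each element's
attributes for use after the parse has finished.

```python
import xml.sax
from xml.sax import handler


class AttributeKeeper(handler.ContentHandler):
    """Hypothetical handler that keeps attributes beyond the callback."""

    def __init__(self):
        super().__init__()
        self.saved = []

    def startElement(self, name, attrs):
        # copy() yields an object that the parser will not re-use.
        self.saved.append((name, attrs.copy()))


keeper = AttributeKeeper()
xml.sax.parseString(b'<root id="r1"><item id="i1" kind="x"/></root>', keeper)
for name, attrs in keeper.saved:
    print(name, sorted(attrs.items()))
```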

.. method:: ContentHandler.endElement(name)

   Signals the end of an element in non-namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElement` event.


.. method:: ContentHandler.startElementNS(name, qname, attrs)

   Signals the start of an element in namespace mode.

   The *name* parameter contains the name of the element type as a ``(uri,
   localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in
   the source document, and the *attrs* parameter holds an instance of the
   :class:`~xml.sax.xmlreader.AttributesNS` interface (see
   :ref:`attributes-ns-objects`)
   containing the attributes of the element. If no namespace is associated with
   the element, the *uri* component of *name* will be ``None``. The object passed
   as *attrs* may be re-used by the parser; holding on to a reference to it is not
   a reliable way to keep a copy of the attributes. To keep a copy of the
   attributes, use the :meth:`copy` method of the *attrs* object.

   Parsers may set the *qname* parameter to ``None``, unless the
   ``feature_namespace_prefixes`` feature is activated.

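A short sketch of namespace mode, assuming the reader returned by
:func:`xml.sax.make_parser`: with ``feature_namespaces`` enabled, the handler
receives ``(uri, localname)`` tuples, and elements outside any namespace arrive
with ``None`` as the URI. ``QualifiedNames`` and the sample URI are made up.

```python
import io
import xml.sax
from xml.sax import handler


class QualifiedNames(handler.ContentHandler):
    """Hypothetical handler that collects (uri, localname) pairs."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElementNS(self, name, qname, attrs):
        uri, localname = name
        # qname may be None, so only the tuple is relied upon here.
        self.names.append((uri, localname))


collector = QualifiedNames()
parser = xml.sax.make_parser()
parser.setFeature(handler.feature_namespaces, True)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(b'<a:doc xmlns:a="urn:example:a"><plain/></a:doc>'))
print(collector.names)
```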

.. method:: ContentHandler.endElementNS(name, qname)

   Signals the end of an element in namespace mode.

   The *name* parameter contains the name of the element type, just as with the
   :meth:`startElementNS` method, and likewise for the *qname* parameter.


.. method:: ContentHandler.characters(content)

   Receive notification of character data.

   The Parser will call this method to report each chunk of character data. SAX
   parsers may return all contiguous character data in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity so that the Locator provides useful
   information.

   *content* may be a string or bytes instance; the ``expat`` reader module
   always produces strings.

   .. note::

      The earlier SAX 1 interface provided by the Python XML Special Interest Group
      used a more Java-like interface for this method. Since most parsers used from
      Python did not take advantage of the older interface, the simpler signature was
      chosen to replace it. To convert old code to the new interface, use *content*
      instead of slicing content with the old *offset* and *length* parameters.

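Because character data may arrive in several chunks, handlers usually buffer it
until the enclosing element ends. The sketch below assumes string chunks, as
produced by the ``expat`` reader; ``TextCollector`` is a hypothetical name.

```python
import xml.sax
from xml.sax import handler


class TextCollector(handler.ContentHandler):
    """Hypothetical handler that joins text chunks per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.texts = []

    def startElement(self, name, attrs):
        self._chunks = []

    def characters(self, content):
        # One logical run of text may be delivered in several calls.
        self._chunks.append(content)

    def endElement(self, name):
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


collector = TextCollector()
xml.sax.parseString(b"<p>fish &amp; chips</p>", collector)
print(collector.texts)
```

This simple buffer is reset at every start tag, so it suits leaf elements; mixed
content would need a stack of buffers instead.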

.. method:: ContentHandler.ignorableWhitespace(whitespace)

   Receive notification of ignorable whitespace in element content.

   Validating Parsers must use this method to report each chunk of ignorable
   whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
   parsers may also use this method if they are capable of parsing and using
   content models.

   SAX parsers may return all contiguous whitespace in a single chunk, or they may
   split it into several chunks; however, all of the characters in any single event
   must come from the same external entity, so that the Locator provides useful
   information.


.. method:: ContentHandler.processingInstruction(target, data)

   Receive notification of a processing instruction.

   The Parser will invoke this method once for each processing instruction found:
   note that processing instructions may occur before or after the main document
   element.

   A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
   text declaration (XML 1.0, section 4.3.1) using this method.

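The following sketch collects processing instructions that appear before and
after the document element. ``PICollector`` and the sample instructions are
invented for illustration; the XML declaration in the input is not expected to
be reported.

```python
import xml.sax
from xml.sax import handler


class PICollector(handler.ContentHandler):
    """Hypothetical handler that records (target, data) pairs."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


DOCUMENT = (
    b'<?xml version="1.0"?>'
    b'<?xml-stylesheet href="style.css"?>'
    b"<doc/>"
    b"<?after done?>"
)

collector = PICollector()
xml.sax.parseString(DOCUMENT, collector)
print(collector.instructions)
```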

.. method:: ContentHandler.skippedEntity(name)

   Receive notification of a skipped entity.

   The Parser will invoke this method once for each entity skipped. Non-validating
   processors may skip entities if they have not seen the declarations (because,
   for example, the entity was declared in an external DTD subset). All processors
   may skip external entities, depending on the values of the
   ``feature_external_ges`` and ``feature_external_pes`` features.


.. _dtd-handler-objects:

DTDHandler Objects
------------------

:class:`DTDHandler` instances provide the following methods:


.. method:: DTDHandler.notationDecl(name, publicId, systemId)

   Handle a notation declaration event.


.. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata)

   Handle an unparsed entity declaration event.

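A sketch of both DTD callbacks, assuming the reader returned by
:func:`xml.sax.make_parser` reports declarations from the internal DTD subset.
``DeclarationRecorder``, the notation identifier, and the entity name are all
made up for the example.

```python
import io
import xml.sax
from xml.sax import handler


class DeclarationRecorder(handler.DTDHandler):
    """Hypothetical handler that stores the declarations it is told about."""

    def __init__(self):
        self.notations = []
        self.entities = []

    def notationDecl(self, name, publicId, systemId):
        self.notations.append((name, publicId, systemId))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities.append((name, publicId, systemId, ndata))


DOCUMENT = b"""<!DOCTYPE doc [
  <!NOTATION GIF PUBLIC "-//Example//NOTATION GIF//EN">
  <!ENTITY img SYSTEM "picture.gif" NDATA GIF>
]>
<doc/>"""

recorder = DeclarationRecorder()
parser = xml.sax.make_parser()
parser.setDTDHandler(recorder)
parser.parse(io.BytesIO(DOCUMENT))
print(recorder.notations)
print(recorder.entities)
```

Identifiers that are absent from a declaration are passed as ``None``, so the
handler should not assume both *publicId* and *systemId* are strings.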

.. _entity-resolver-objects:

EntityResolver Objects
----------------------


.. method:: EntityResolver.resolveEntity(publicId, systemId)

   Resolve the system identifier of an entity and return either the system
   identifier to read from as a string, or an
   :class:`~xml.sax.xmlreader.InputSource` to read from. The default
   implementation returns *systemId*.

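The sketch below shows a resolver that answers every request with canned
content instead of touching the file system or network. It assumes the
expat-based reader and assumes that ``feature_external_ges`` must be enabled
before external general entities are resolved at all. ``CannedResolver`` and
``ElementNames`` are hypothetical names.

```python
import io
import xml.sax
from xml.sax import handler
from xml.sax.xmlreader import InputSource


class CannedResolver(handler.EntityResolver):
    """Hypothetical resolver that never opens the requested identifier."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource()
        source.setByteStream(io.BytesIO(b"<entity/>"))
        return source


class ElementNames(handler.ContentHandler):
    """Hypothetical handler that lists the element names it sees."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


DOCUMENT = b"""<!DOCTYPE doc [
  <!ENTITY test SYSTEM "whatever">
]>
<doc>&test;</doc>"""

names = ElementNames()
parser = xml.sax.make_parser()
parser.setFeature(handler.feature_external_ges, True)
parser.setEntityResolver(CannedResolver())
parser.setContentHandler(names)
parser.parse(io.BytesIO(DOCUMENT))
print(names.names)
```

Returning an in-memory source keeps control over what is read, whereas the
default implementation hands back *systemId* and leaves it to the parser to
open that location.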

.. _sax-error-handler:

ErrorHandler Objects
--------------------

Objects with this interface are used to receive error and warning information
from the :class:`~xml.sax.xmlreader.XMLReader`. If you create an object that
implements this interface and register it with your
:class:`~xml.sax.xmlreader.XMLReader`, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors. All methods take a :exc:`SAXParseException` as the
only parameter. Errors and warnings may be converted to an exception by raising
the passed-in exception object.


.. method:: ErrorHandler.error(exception)

   Called when the parser encounters a recoverable error. If this method does not
   raise an exception, parsing may continue, but further document information
   should not be expected by the application. Allowing the parser to continue may
   allow additional errors to be discovered in the input document.


.. method:: ErrorHandler.fatalError(exception)

   Called when the parser encounters an error it cannot recover from; parsing is
   expected to terminate when this method returns.


.. method:: ErrorHandler.warning(exception)

   Called when the parser presents minor warning information to the application.
   Parsing is expected to continue when this method returns, and document
   information will continue to be passed to the application. Raising an exception
   in this method will cause parsing to end.

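One possible policy, sketched below, is to collect warnings and recoverable
errors for later inspection while still turning unrecoverable errors into
exceptions. ``CollectingErrorHandler`` is a hypothetical class, and the input is
deliberately malformed.

```python
import xml.sax
from xml.sax import handler


class CollectingErrorHandler(handler.ErrorHandler):
    """Hypothetical handler: record minor problems, raise on fatal ones."""

    def __init__(self):
        self.warnings = []
        self.errors = []

    def warning(self, exception):
        self.warnings.append(exception)

    def error(self, exception):
        self.errors.append(exception)

    def fatalError(self, exception):
        # Raising the passed-in object converts the report to an exception.
        raise exception


errors = CollectingErrorHandler()
try:
    xml.sax.parseString(b"<a><b></a>", handler.ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    failure = (exc.getLineNumber(), exc.getMessage())
else:
    failure = None

print(failure)
```

Which level a given problem is reported at depends on the parser, so the lists
may stay empty even for documents that end in a fatal error.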