| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 | :mod:`xml.sax.handler` --- Base classes for SAX handlers | 
|  | 2 | ======================================================== | 
|  | 3 |  | 
|  | 4 | .. module:: xml.sax.handler | 
|  | 5 | :synopsis: Base classes for SAX event handlers. | 
|  | 6 | .. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no> | 
|  | 7 | .. sectionauthor:: Martin v. Löwis <martin@v.loewis.de> | 
|  | 8 |  | 
|  | 9 |  | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 10 | The SAX API defines four kinds of handlers: content handlers, DTD handlers, | 
|  | 11 | error handlers, and entity resolvers. Applications normally only need to | 
|  | 12 | implement those interfaces whose events they are interested in; they can | 
|  | 13 | implement the interfaces in a single object or in multiple objects. Handler | 
|  | 14 | implementations should inherit from the base classes provided in the module | 
|  | 15 | :mod:`xml.sax.handler`, so that all methods get default implementations. | 
|  | 16 |  | 
|  | 17 |  | 
|  | 18 | .. class:: ContentHandler | 
|  | 19 |  | 
|  | 20 | This is the main callback interface in SAX, and the one most important to | 
|  | 21 | applications. The order of events in this interface mirrors the order of the | 
|  | 22 | information in the document. | 
|  | 23 |  | 
|  | 24 |  | 
|  | 25 | .. class:: DTDHandler | 
|  | 26 |  | 
|  | 27 | Handle DTD events. | 
|  | 28 |  | 
|  | 29 | This interface specifies only those DTD events required for basic parsing | 
|  | 30 | (unparsed entities and attributes). | 
|  | 31 |  | 
|  | 32 |  | 
|  | 33 | .. class:: EntityResolver | 
|  | 34 |  | 
|  | 35 | Basic interface for resolving entities. If you create an object implementing | 
|  | 36 | this interface and register it with your parser, the parser will call the | 
|  | 37 | method in your object to resolve all external entities. | 
|  | 38 |  | 
|  | 39 |  | 
|  | 40 | .. class:: ErrorHandler | 
|  | 41 |  | 
|  | 42 | Interface used by the parser to present error and warning messages to the | 
|  | 43 | application.  The methods of this object control whether errors are immediately | 
|  | 44 | converted to exceptions or are handled in some other way. | 
|  | 45 |  | 
|  | 46 | In addition to these classes, :mod:`xml.sax.handler` provides symbolic constants | 
|  | 47 | for the feature and property names. | 
|  | 48 |  | 
|  | 49 |  | 
|  | 50 | .. data:: feature_namespaces | 
|  | 51 |  | 
|  | 52 | Value: ``"http://xml.org/sax/features/namespaces"`` ---  true: Perform Namespace | 
|  | 53 | processing. ---  false: Optionally do not perform Namespace processing (implies | 
|  | 54 | namespace-prefixes; default). ---  access: (parsing) read-only; (not parsing) | 
|  | 55 | read/write | 
|  | 56 |  | 
|  | 57 |  | 
|  | 58 | .. data:: feature_namespace_prefixes | 
|  | 59 |  | 
|  | 60 | Value: ``"http://xml.org/sax/features/namespace-prefixes"`` --- true: Report | 
|  | 61 | the original prefixed names and attributes used for Namespace | 
|  | 62 | declarations. --- false: Do not report attributes used for Namespace | 
|  | 63 | declarations, and optionally do not report original prefixed names | 
|  | 64 | (default). --- access: (parsing) read-only; (not parsing) read/write | 
|  | 65 |  | 
|  | 66 |  | 
|  | 67 | .. data:: feature_string_interning | 
|  | 68 |  | 
|  | 69 | Value: ``"http://xml.org/sax/features/string-interning"`` ---  true: All element | 
|  | 70 | names, prefixes, attribute names, Namespace URIs, and local names are interned | 
|  | 71 | using :func:`sys.intern`. ---  false: Names are not necessarily | 
|  | 72 | interned, although they may be (default). ---  access: (parsing) read-only; (not | 
|  | 73 | parsing) read/write | 
|  | 74 |  | 
|  | 75 |  | 
|  | 76 | .. data:: feature_validation | 
|  | 77 |  | 
|  | 78 | Value: ``"http://xml.org/sax/features/validation"`` --- true: Report all | 
|  | 79 | validation errors (implies external-general-entities and | 
|  | 80 | external-parameter-entities). --- false: Do not report validation errors. --- | 
|  | 81 | access: (parsing) read-only; (not parsing) read/write | 
|  | 82 |  | 
|  | 83 |  | 
|  | 84 | .. data:: feature_external_ges | 
|  | 85 |  | 
|  | 86 | Value: ``"http://xml.org/sax/features/external-general-entities"`` ---  true: | 
|  | 87 | Include all external general (text) entities. ---  false: Do not include | 
|  | 88 | external general entities. ---  access: (parsing) read-only; (not parsing) | 
|  | 89 | read/write | 
|  | 90 |  | 
|  | 91 |  | 
|  | 92 | .. data:: feature_external_pes | 
|  | 93 |  | 
|  | 94 | Value: ``"http://xml.org/sax/features/external-parameter-entities"`` ---  true: | 
|  | 95 | Include all external parameter entities, including the external DTD subset. --- | 
|  | 96 | false: Do not include any external parameter entities, even the external DTD | 
|  | 97 | subset. ---  access: (parsing) read-only; (not parsing) read/write | 
|  | 98 |  | 
|  | 99 |  | 
|  | 100 | .. data:: all_features | 
|  | 101 |  | 
|  | 102 | List of all features. | 
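
As a rough illustration of how the feature constants are used, the sketch below queries and changes a feature before parsing starts. It assumes the parser returned by ``xml.sax.make_parser()`` is the default ``expat`` reader; the variable names are only illustrative.

```python
import xml.sax
from xml.sax.handler import all_features, feature_namespaces

# Create a parser; features can only be changed while it is not parsing.
parser = xml.sax.make_parser()

# Read the current state of the namespaces feature.
default_state = parser.getFeature(feature_namespaces)

# Turn namespace processing on, then read the state back.
parser.setFeature(feature_namespaces, True)
enabled_state = parser.getFeature(feature_namespaces)

# The constant is one of the names listed in all_features.
is_listed = feature_namespaces in all_features
```

A reader that does not recognize or support a feature is expected to raise an exception from ``setFeature()``, so portable code may want to wrap the call in ``try``/``except``.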
|  | 103 |  | 
|  | 104 |  | 
|  | 105 | .. data:: property_lexical_handler | 
|  | 106 |  | 
|  | 107 | Value: ``"http://xml.org/sax/properties/lexical-handler"`` ---  data type: | 
|  | 108 | xml.sax.sax2lib.LexicalHandler (not supported in Python 2) ---  description: An | 
|  | 109 | optional extension handler for lexical events like comments. ---  access: | 
|  | 110 | read/write | 
|  | 111 |  | 
|  | 112 |  | 
|  | 113 | .. data:: property_declaration_handler | 
|  | 114 |  | 
|  | 115 | Value: ``"http://xml.org/sax/properties/declaration-handler"`` ---  data type: | 
|  | 116 | xml.sax.sax2lib.DeclHandler (not supported in Python 2) ---  description: An | 
|  | 117 | optional extension handler for DTD-related events other than notations and | 
|  | 118 | unparsed entities. ---  access: read/write | 
|  | 119 |  | 
|  | 120 |  | 
|  | 121 | .. data:: property_dom_node | 
|  | 122 |  | 
|  | 123 | Value: ``"http://xml.org/sax/properties/dom-node"`` ---  data type: | 
|  | 124 | org.w3c.dom.Node (not supported in Python 2)  ---  description: When parsing, | 
|  | 125 | the current DOM node being visited if this is a DOM iterator; when not parsing, | 
|  | 126 | the root DOM node for iteration. ---  access: (parsing) read-only; (not parsing) | 
|  | 127 | read/write | 
|  | 128 |  | 
|  | 129 |  | 
|  | 130 | .. data:: property_xml_string | 
|  | 131 |  | 
|  | 132 | Value: ``"http://xml.org/sax/properties/xml-string"`` ---  data type: String --- | 
|  | 133 | description: The literal string of characters that was the source for the | 
|  | 134 | current event. ---  access: read-only | 
|  | 135 |  | 
|  | 136 |  | 
|  | 137 | .. data:: all_properties | 
|  | 138 |  | 
|  | 139 | List of all known property names. | 
|  | 140 |  | 
|  | 141 |  | 
|  | 142 | .. _content-handler-objects: | 
|  | 143 |  | 
|  | 144 | ContentHandler Objects | 
|  | 145 | ---------------------- | 
|  | 146 |  | 
|  | 147 | Users are expected to subclass :class:`ContentHandler` to support their | 
|  | 148 | application.  The following methods are called by the parser on the appropriate | 
|  | 149 | events in the input document: | 
|  | 150 |  | 
|  | 151 |  | 
|  | 152 | .. method:: ContentHandler.setDocumentLocator(locator) | 
|  | 153 |  | 
|  | 154 | Called by the parser to give the application a locator for locating the origin | 
|  | 155 | of document events. | 
|  | 156 |  | 
|  | 157 | SAX parsers are strongly encouraged (though not absolutely required) to supply a | 
|  | 158 | locator: if a parser does so, it must supply the locator to the application by | 
|  | 159 | invoking this method before invoking any of the other methods in the | 
|  | 160 | :class:`ContentHandler` interface. | 
|  | 161 |  | 
|  | 162 | The locator allows the application to determine the end position of any | 
|  | 163 | document-related event, even if the parser is not reporting an error. Typically, | 
|  | 164 | the application will use this information for reporting its own errors (such as | 
|  | 165 | character content that does not match an application's business rules). The | 
|  | 166 | information returned by the locator is probably not sufficient for use with a | 
|  | 167 | search engine. | 
|  | 168 |  | 
|  | 169 | Note that the locator will return correct information only during the invocation | 
|  | 170 | of the events in this interface. The application should not attempt to use it at | 
|  | 171 | any other time. | 
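
The following sketch shows one way an application might keep the locator and consult it while handling events. The class name ``PositionRecorder`` and the sample document are hypothetical; the exact line numbers reported depend on the parser, so none are shown here.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PositionRecorder(ContentHandler):
    """Record the line number the locator reports for each start tag."""

    def __init__(self):
        super().__init__()
        self._locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called by the parser (if it supplies a locator) before other events.
        self._locator = locator

    def startElement(self, name, attrs):
        # Only query the locator while an event is being delivered.
        line = self._locator.getLineNumber() if self._locator else None
        self.positions.append((name, line))


recorder = PositionRecorder()
xml.sax.parseString(b"<root>\n  <child/>\n</root>", recorder)
```

After parsing, ``recorder.positions`` holds one ``(name, line)`` pair per element, in document order.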
|  | 172 |  | 
|  | 173 |  | 
|  | 174 | .. method:: ContentHandler.startDocument() | 
|  | 175 |  | 
|  | 176 | Receive notification of the beginning of a document. | 
|  | 177 |  | 
|  | 178 | The SAX parser will invoke this method only once, before any other methods in | 
|  | 179 | this interface or in DTDHandler (except for :meth:`setDocumentLocator`). | 
|  | 180 |  | 
|  | 181 |  | 
|  | 182 | .. method:: ContentHandler.endDocument() | 
|  | 183 |  | 
|  | 184 | Receive notification of the end of a document. | 
|  | 185 |  | 
|  | 186 | The SAX parser will invoke this method only once, and it will be the last method | 
|  | 187 | invoked during the parse. The parser shall not invoke this method until it has | 
|  | 188 | either abandoned parsing (because of an unrecoverable error) or reached the end | 
|  | 189 | of input. | 
|  | 190 |  | 
|  | 191 |  | 
|  | 192 | .. method:: ContentHandler.startPrefixMapping(prefix, uri) | 
|  | 193 |  | 
|  | 194 | Begin the scope of a prefix-URI Namespace mapping. | 
|  | 195 |  | 
|  | 196 | The information from this event is not necessary for normal Namespace | 
|  | 197 | processing: the SAX XML reader will automatically replace prefixes for element | 
|  | 198 | and attribute names when the ``feature_namespaces`` feature is enabled (it is | 
|  | 199 | off by default). | 
|  | 200 |  | 
|  | 201 | There are cases, however, when applications need to use prefixes in character | 
|  | 202 | data or in attribute values, where they cannot safely be expanded automatically; | 
|  | 203 | the :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events supply the | 
|  | 204 | information to the application to expand prefixes in those contexts itself, if | 
|  | 205 | necessary. | 
|  | 206 |  | 
| Christian Heimes | 5b5e81c | 2007-12-31 16:14:33 +0000 | [diff] [blame] | 207 | .. XXX This is not really the default, is it? MvL | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 208 |  | 
|  | 209 | Note that :meth:`startPrefixMapping` and :meth:`endPrefixMapping` events are not | 
|  | 210 | guaranteed to be properly nested relative to each other: all | 
|  | 211 | :meth:`startPrefixMapping` events will occur before the corresponding | 
|  | 212 | :meth:`startElement` event, and all :meth:`endPrefixMapping` events will occur | 
|  | 213 | after the corresponding :meth:`endElement` event, but their order is not | 
|  | 214 | guaranteed. | 
|  | 215 |  | 
|  | 216 |  | 
|  | 217 | .. method:: ContentHandler.endPrefixMapping(prefix) | 
|  | 218 |  | 
|  | 219 | End the scope of a prefix-URI mapping. | 
|  | 220 |  | 
|  | 221 | See :meth:`startPrefixMapping` for details. This event will always occur after | 
|  | 222 | the corresponding :meth:`endElement` event, but the order of | 
|  | 223 | :meth:`endPrefixMapping` events is not otherwise guaranteed. | 
|  | 224 |  | 
|  | 225 |  | 
|  | 226 | .. method:: ContentHandler.startElement(name, attrs) | 
|  | 227 |  | 
|  | 228 | Signals the start of an element in non-namespace mode. | 
|  | 229 |  | 
|  | 230 | The *name* parameter contains the raw XML 1.0 name of the element type as a | 
|  | 231 | string and the *attrs* parameter holds an object of the :class:`Attributes` | 
|  | 232 | interface (see :ref:`attributes-objects`) containing the attributes of | 
|  | 233 | the element.  The object passed as *attrs* may be re-used by the parser; holding | 
|  | 234 | on to a reference to it is not a reliable way to keep a copy of the attributes. | 
|  | 235 | To keep a copy of the attributes, use the :meth:`copy` method of the *attrs* | 
|  | 236 | object. | 
|  | 237 |  | 
|  | 238 |  | 
|  | 239 | .. method:: ContentHandler.endElement(name) | 
|  | 240 |  | 
|  | 241 | Signals the end of an element in non-namespace mode. | 
|  | 242 |  | 
|  | 243 | The *name* parameter contains the name of the element type, just as with the | 
|  | 244 | :meth:`startElement` event. | 
|  | 245 |  | 
|  | 246 |  | 
|  | 247 | .. method:: ContentHandler.startElementNS(name, qname, attrs) | 
|  | 248 |  | 
|  | 249 | Signals the start of an element in namespace mode. | 
|  | 250 |  | 
|  | 251 | The *name* parameter contains the name of the element type as a ``(uri, | 
|  | 252 | localname)`` tuple, the *qname* parameter contains the raw XML 1.0 name used in | 
|  | 253 | the source document, and the *attrs* parameter holds an instance of the | 
|  | 254 | :class:`AttributesNS` interface (see :ref:`attributes-ns-objects`) | 
|  | 255 | containing the attributes of the element.  If no namespace is associated with | 
|  | 256 | the element, the *uri* component of *name* will be ``None``.  The object passed | 
|  | 257 | as *attrs* may be re-used by the parser; holding on to a reference to it is not | 
|  | 258 | a reliable way to keep a copy of the attributes.  To keep a copy of the | 
|  | 259 | attributes, use the :meth:`copy` method of the *attrs* object. | 
|  | 260 |  | 
|  | 261 | Parsers may set the *qname* parameter to ``None``, unless the | 
|  | 262 | ``feature_namespace_prefixes`` feature is activated. | 
|  | 263 |  | 
|  | 264 |  | 
|  | 265 | .. method:: ContentHandler.endElementNS(name, qname) | 
|  | 266 |  | 
|  | 267 | Signals the end of an element in namespace mode. | 
|  | 268 |  | 
|  | 269 | The *name* and *qname* parameters contain the name of the element type, just as | 
|  | 270 | with the :meth:`startElementNS` method. | 
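
A minimal sketch of namespace mode follows, assuming the default ``expat`` reader. It enables ``feature_namespaces`` explicitly and records prefix-mapping and element events; the class name ``NamespaceRecorder`` and the ``urn:example`` namespace are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceRecorder(ContentHandler):
    """Collect prefix-mapping and namespace-aware element events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("map", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("unmap", prefix))

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) tuple; uri is None without a namespace.
        uri, localname = name
        self.events.append(("start", uri, localname))

    def endElementNS(self, name, qname):
        uri, localname = name
        self.events.append(("end", uri, localname))


document = b'<doc xmlns:x="urn:example"><x:item/><plain/></doc>'

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # switch to namespace mode
recorder = NamespaceRecorder()
parser.setContentHandler(recorder)
parser.parse(io.BytesIO(document))
```

With the feature enabled, the parser calls :meth:`startElementNS` and :meth:`endElementNS` instead of :meth:`startElement` and :meth:`endElement`.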
|  | 271 |  | 
|  | 272 |  | 
|  | 273 | .. method:: ContentHandler.characters(content) | 
|  | 274 |  | 
|  | 275 | Receive notification of character data. | 
|  | 276 |  | 
|  | 277 | The parser will call this method to report each chunk of character data. SAX | 
|  | 278 | parsers may return all contiguous character data in a single chunk, or they may | 
|  | 279 | split it into several chunks; however, all of the characters in any single event | 
|  | 280 | must come from the same external entity so that the Locator provides useful | 
|  | 281 | information. | 
|  | 282 |  | 
| Georg Brandl | f694518 | 2008-02-01 11:56:49 +0000 | [diff] [blame] | 283 | *content* may be a string or bytes instance; the ``expat`` reader module | 
|  | 284 | always produces strings. | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 285 |  | 
|  | 286 | .. note:: | 
|  | 287 |  | 
|  | 288 | The earlier SAX 1 interface provided by the Python XML Special Interest Group | 
|  | 289 | used a more Java-like interface for this method.  Since most parsers used from | 
|  | 290 | Python did not take advantage of the older interface, the simpler signature was | 
|  | 291 | chosen to replace it.  To convert old code to the new interface, use *content* | 
|  | 292 | instead of slicing content with the old *offset* and *length* parameters. | 
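
Because character data may arrive in several chunks, a handler usually buffers the chunks and joins them when the element ends. The sketch below illustrates this in non-namespace mode; ``TitleCollector`` and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TitleCollector(ContentHandler):
    """Gather the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._chunks = None  # list of chunks while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._chunks = []

    def characters(self, content):
        # The text of one element may be delivered in more than one call.
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title" and self._chunks is not None:
            self.titles.append("".join(self._chunks))
            self._chunks = None


collector = TitleCollector()
xml.sax.parseString(
    b"<books><title>SAX &amp; Python</title><title>XML</title></books>",
    collector,
)
```

Joining in :meth:`endElement` gives the same result however the parser chooses to split the text.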
|  | 293 |  | 
|  | 294 |  | 
|  | 295 | .. method:: ContentHandler.ignorableWhitespace(whitespace) | 
|  | 296 |  | 
|  | 297 | Receive notification of ignorable whitespace in element content. | 
|  | 298 |  | 
|  | 299 | Validating parsers must use this method to report each chunk of ignorable | 
|  | 300 | whitespace (see the W3C XML 1.0 recommendation, section 2.10); non-validating | 
|  | 301 | parsers may also use this method if they are capable of parsing and using | 
|  | 302 | content models. | 
|  | 303 |  | 
|  | 304 | SAX parsers may return all contiguous whitespace in a single chunk, or they may | 
|  | 305 | split it into several chunks; however, all of the characters in any single event | 
|  | 306 | must come from the same external entity, so that the Locator provides useful | 
|  | 307 | information. | 
|  | 308 |  | 
|  | 309 |  | 
|  | 310 | .. method:: ContentHandler.processingInstruction(target, data) | 
|  | 311 |  | 
|  | 312 | Receive notification of a processing instruction. | 
|  | 313 |  | 
|  | 314 | The parser will invoke this method once for each processing instruction found. | 
|  | 315 | Note that processing instructions may occur before or after the main document | 
|  | 316 | element. | 
|  | 317 |  | 
|  | 318 | A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a | 
|  | 319 | text declaration (XML 1.0, section 4.3.1) using this method. | 
|  | 320 |  | 
|  | 321 |  | 
|  | 322 | .. method:: ContentHandler.skippedEntity(name) | 
|  | 323 |  | 
|  | 324 | Receive notification of a skipped entity. | 
|  | 325 |  | 
|  | 326 | The parser will invoke this method once for each entity skipped. Non-validating | 
|  | 327 | processors may skip entities if they have not seen the declarations (because, | 
|  | 328 | for example, the entity was declared in an external DTD subset). All processors | 
|  | 329 | may skip external entities, depending on the values of the | 
|  | 330 | ``feature_external_ges`` and the ``feature_external_pes`` features. | 
|  | 331 |  | 
|  | 332 |  | 
|  | 333 | .. _dtd-handler-objects: | 
|  | 334 |  | 
|  | 335 | DTDHandler Objects | 
|  | 336 | ------------------ | 
|  | 337 |  | 
|  | 338 | :class:`DTDHandler` instances provide the following methods: | 
|  | 339 |  | 
|  | 340 |  | 
|  | 341 | .. method:: DTDHandler.notationDecl(name, publicId, systemId) | 
|  | 342 |  | 
|  | 343 | Handle a notation declaration event. | 
|  | 344 |  | 
|  | 345 |  | 
|  | 346 | .. method:: DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata) | 
|  | 347 |  | 
|  | 348 | Handle an unparsed entity declaration event. | 
|  | 349 |  | 
|  | 350 |  | 
|  | 351 | .. _entity-resolver-objects: | 
|  | 352 |  | 
|  | 353 | EntityResolver Objects | 
|  | 354 | ---------------------- | 
|  | 355 |  | 
|  | 356 |  | 
|  | 357 | .. method:: EntityResolver.resolveEntity(publicId, systemId) | 
|  | 358 |  | 
|  | 359 | Resolve the system identifier of an entity and return either the system | 
|  | 360 | identifier to read from as a string, or an InputSource to read from. The default | 
|  | 361 | implementation returns *systemId*. | 
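
As a sketch of a custom resolver, the class below serves known system identifiers from memory and falls back to the default behaviour otherwise. ``InMemoryResolver`` and the file names are hypothetical. To take effect, such an object would be registered with ``parser.setEntityResolver(resolver)``; whether the parser consults it at all depends on the ``feature_external_ges`` setting.

```python
import io
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class InMemoryResolver(EntityResolver):
    """Resolve selected system identifiers to in-memory byte strings."""

    def __init__(self, documents):
        self._documents = documents  # maps systemId -> bytes

    def resolveEntity(self, publicId, systemId):
        data = self._documents.get(systemId)
        if data is None:
            # Unknown entity: behave like the default implementation.
            return systemId
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(data))
        return source


resolver = InMemoryResolver({"chapter1.xml": b"<chapter/>"})
known = resolver.resolveEntity(None, "chapter1.xml")
unknown = resolver.resolveEntity(None, "other.xml")
```

Here ``known`` is an ``InputSource`` wrapping the stored bytes, while ``unknown`` is simply the system identifier string.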
|  | 362 |  | 
|  | 363 |  | 
|  | 364 | .. _sax-error-handler: | 
|  | 365 |  | 
|  | 366 | ErrorHandler Objects | 
|  | 367 | -------------------- | 
|  | 368 |  | 
|  | 369 | Objects with this interface are used to receive error and warning information | 
|  | 370 | from the :class:`XMLReader`.  If you create an object that implements this | 
|  | 371 | interface and register it with your :class:`XMLReader`, the parser | 
|  | 372 | will call the methods in your object to report all warnings and errors. There | 
|  | 373 | are three levels of errors available: warnings, (possibly) recoverable errors, | 
|  | 374 | and unrecoverable errors.  All methods take a :exc:`SAXParseException` as the | 
|  | 375 | only parameter.  Errors and warnings may be converted to an exception by raising | 
|  | 376 | the passed-in exception object. | 
|  | 377 |  | 
|  | 378 |  | 
|  | 379 | .. method:: ErrorHandler.error(exception) | 
|  | 380 |  | 
|  | 381 | Called when the parser encounters a recoverable error.  If this method does not | 
|  | 382 | raise an exception, parsing may continue, but further document information | 
|  | 383 | should not be expected by the application.  Allowing the parser to continue may | 
|  | 384 | allow additional errors to be discovered in the input document. | 
|  | 385 |  | 
|  | 386 |  | 
|  | 387 | .. method:: ErrorHandler.fatalError(exception) | 
|  | 388 |  | 
|  | 389 | Called when the parser encounters an error it cannot recover from; parsing is | 
|  | 390 | expected to terminate when this method returns. | 
|  | 391 |  | 
|  | 392 |  | 
|  | 393 | .. method:: ErrorHandler.warning(exception) | 
|  | 394 |  | 
|  | 395 | Called when the parser presents minor warning information to the application. | 
|  | 396 | Parsing is expected to continue when this method returns, and document | 
|  | 397 | information will continue to be passed to the application. Raising an exception | 
|  | 398 | in this method will cause parsing to end. | 
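
The sketch below shows one possible error handler that records every report and converts only fatal errors into exceptions by raising the passed-in object. The class name ``CollectingErrorHandler`` and the ill-formed sample input are illustrative assumptions.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class CollectingErrorHandler(ErrorHandler):
    """Record all reports; stop parsing only on fatal errors."""

    def __init__(self):
        self.warnings = []
        self.errors = []
        self.fatal = []

    def warning(self, exception):
        # Returning normally lets parsing continue.
        self.warnings.append(exception)

    def error(self, exception):
        # Recoverable error: record it and let the parser carry on.
        self.errors.append(exception)

    def fatalError(self, exception):
        # Record the problem, then raise the passed-in exception object.
        self.fatal.append(exception)
        raise exception


error_handler = CollectingErrorHandler()
try:
    # The end tag does not match the open <b> element, so the input is ill-formed.
    xml.sax.parseString(b"<a><b></a>", ContentHandler(), error_handler)
    parse_failed = False
except xml.sax.SAXParseException:
    parse_failed = True
```

Raising from :meth:`fatalError` mirrors what the default implementation does; the recorded exception objects can be inspected afterwards, for example with ``getLineNumber()``.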
|  | 399 |  |