:mod:`xml.parsers.expat` --- Fast XML parsing using Expat
=========================================================

.. module:: xml.parsers.expat
   :synopsis: An interface to the Expat non-validating XML parser.
.. moduleauthor:: Paul Prescod <paul@prescod.net>


.. Markup notes:

   Many of the attributes of the XMLParser objects are callbacks.  Since
   signature information must be presented, these are described using the method
   directive.  Since they are attributes which are set by client code, in-text
   references to these attributes should be marked using the :member: role.

.. versionadded:: 2.0

.. index:: single: Expat

The :mod:`xml.parsers.expat` module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
:class:`xmlparser`, that represents the current state of an XML parser.  After
an :class:`xmlparser` object has been created, various attributes of the object
can be set to handler functions.  When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.

.. index:: module: pyexpat

This module uses the :mod:`pyexpat` module to provide access to the Expat
parser.  Direct use of the :mod:`pyexpat` module is deprecated.

This module provides one exception and one type object:


.. exception:: ExpatError

   The exception raised when Expat reports an error.  See section
   :ref:`expaterror-objects` for more information on interpreting Expat errors.


.. exception:: error

   Alias for :exc:`ExpatError`.


.. data:: XMLParserType

   The type of the return values from the :func:`ParserCreate` function.

The :mod:`xml.parsers.expat` module contains two functions:


.. function:: ErrorString(errno)

   Returns an explanatory string for a given error number *errno*.


.. function:: ParserCreate([encoding[, namespace_separator]])

   Creates and returns a new :class:`xmlparser` object.  *encoding*, if specified,
   must be a string naming the encoding used by the XML data.  Expat doesn't
   support as many encodings as Python does, and its repertoire of encodings can't
   be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.  If
   *encoding* [1]_ is given it will override the implicit or explicit encoding of the
   document.

   Expat can optionally do XML namespace processing for you, enabled by providing a
   value for *namespace_separator*.  The value must be a one-character string; a
   :exc:`ValueError` will be raised if the string has an illegal length (``None``
   is considered the same as omission).  When namespace processing is enabled,
   element type names and attribute names that belong to a namespace will be
   expanded.  The element name passed to the element handlers
   :attr:`StartElementHandler` and :attr:`EndElementHandler` will be the
   concatenation of the namespace URI, the namespace separator character, and the
   local part of the name.  If the namespace separator is a zero byte (``chr(0)``)
   then the namespace URI and the local part will be concatenated without any
   separator.

   For example, if *namespace_separator* is set to a space character (``' '``) and
   the following document is parsed::

      <?xml version="1.0"?>
      <root xmlns    = "http://default-namespace.org/"
            xmlns:py = "http://www.python.org/ns/">
        <py:elem1 />
        <elem2 xmlns="" />
      </root>

   :attr:`StartElementHandler` will receive the following strings for each
   element::

      http://default-namespace.org/ root
      http://www.python.org/ns/ elem1
      elem2

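The namespace example above can be reproduced with a short script.  This is
only a sketch: the names ``DOC``, ``names`` and ``start_element`` are chosen
for illustration, and the code is written so that it should run on both
Python 2.7 and Python 3.

```python
import xml.parsers.expat

# The sample document from the text above.
DOC = """<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

names = []  # element names exactly as passed to StartElementHandler


def start_element(name, attrs):
    names.append(name)


# Passing a one-character separator switches namespace processing on.
parser = xml.parsers.expat.ParserCreate(namespace_separator=' ')
parser.StartElementHandler = start_element
parser.Parse(DOC, True)
```

After the call, ``names`` holds the three expanded names listed above;
``elem2`` has no URI prefix because ``xmlns=""`` removes it from the default
namespace.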

.. seealso::

   `The Expat XML Parser <http://www.libexpat.org/>`_
      Home page of the Expat project.


.. _xmlparser-objects:

XMLParser Objects
-----------------

:class:`xmlparser` objects have the following methods:


.. method:: xmlparser.Parse(data[, isfinal])

   Parses the contents of the string *data*, calling the appropriate handler
   functions to process the parsed data.  *isfinal* must be true on the final call
   to this method.  *data* can be the empty string at any time.

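Because :meth:`Parse` may be called repeatedly, a document can be fed to the
parser in pieces, even when a piece ends in the middle of a tag.  A minimal
sketch, with a made-up document and illustrative names (``chunks``, ``ids``):

```python
import xml.parsers.expat

# A document split at arbitrary points, including inside tags.
chunks = ['<log><ent', 'ry id="1"/><entry ', 'id="2"/></log>']
ids = []


def start_element(name, attrs):
    if name == 'entry':
        ids.append(attrs['id'])


parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element

for chunk in chunks:
    parser.Parse(chunk, False)  # more data may follow
parser.Parse('', True)          # an empty final call marks the end of input
```

Handlers fire as soon as Expat has seen enough input to recognise a complete
construct, so the caller does not need to align chunk boundaries with markup.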

.. method:: xmlparser.ParseFile(file)

   Parse XML data reading from the object *file*.  *file* only needs to provide
   the ``read(nbytes)`` method, returning the empty string when there's no more
   data.

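Any object with a suitable ``read()`` method will do for :meth:`ParseFile`.
The sketch below uses an in-memory :class:`io.BytesIO` object as a stand-in
for a real file; the document and the names ``source`` and ``titles`` are
illustrative only.

```python
import io
import xml.parsers.expat

titles = []


def start_element(name, attrs):
    if name == 'book':
        titles.append(attrs['title'])


# BytesIO stands in for a file opened in binary mode.
source = io.BytesIO(b'<shelf><book title="Dune"/><book title="Emma"/></shelf>')

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = start_element
parser.ParseFile(source)  # reads until read() returns an empty result
```

A byte-oriented source is used here on the assumption that the same code
should also work on Python 3, where the data returned by ``read()`` is
expected to be bytes.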

.. method:: xmlparser.SetBase(base)

   Sets the base to be used for resolving relative URIs in system identifiers in
   declarations.  Resolving relative identifiers is left to the application: this
   value will be passed through as the *base* argument to the
   :func:`ExternalEntityRefHandler`, :func:`NotationDeclHandler`, and
   :func:`UnparsedEntityDeclHandler` functions.


.. method:: xmlparser.GetBase()

   Returns a string containing the base set by a previous call to :meth:`SetBase`,
   or ``None`` if :meth:`SetBase` hasn't been called.


.. method:: xmlparser.GetInputContext()

   Returns the input data that generated the current event as a string. The data is
   in the encoding of the entity which contains the text. When called while an
   event handler is not active, the return value is ``None``.

   .. versionadded:: 2.1


.. method:: xmlparser.ExternalEntityParserCreate(context[, encoding])

   Create a "child" parser which can be used to parse an external parsed entity
   referred to by content parsed by the parent parser.  The *context* parameter
   should be the string passed to the :meth:`ExternalEntityRefHandler` handler
   function, described below. The child parser is created with the
   :attr:`ordered_attributes`, :attr:`returns_unicode` and
   :attr:`specified_attributes` set to the values of this parser.

.. method:: xmlparser.SetParamEntityParsing(flag)

   Control parsing of parameter entities (including the external DTD subset).
   Possible *flag* values are :const:`XML_PARAM_ENTITY_PARSING_NEVER`,
   :const:`XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE` and
   :const:`XML_PARAM_ENTITY_PARSING_ALWAYS`.  Return true if setting the flag
   was successful.

.. method:: xmlparser.UseForeignDTD([flag])

   Calling this with a true value for *flag* (the default) will cause Expat to call
   the :attr:`ExternalEntityRefHandler` with :const:`None` for all arguments to
   allow an alternate DTD to be loaded.  If the document does not contain a
   document type declaration, the :attr:`ExternalEntityRefHandler` will still be
   called, but the :attr:`StartDoctypeDeclHandler` and
   :attr:`EndDoctypeDeclHandler` will not be called.

   Passing a false value for *flag* will cancel a previous call that passed a true
   value, but otherwise has no effect.

   This method can only be called before the :meth:`Parse` or :meth:`ParseFile`
   methods are called; calling it after either of those have been called causes
   :exc:`ExpatError` to be raised with the :attr:`code` attribute set to
   :const:`errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING`.

   .. versionadded:: 2.3

:class:`xmlparser` objects have the following attributes:


.. attribute:: xmlparser.buffer_size

   The size of the buffer used when :attr:`buffer_text` is true.
   A new buffer size can be set by assigning a new integer value
   to this attribute.
   When the size is changed, the buffer will be flushed.

   .. versionadded:: 2.3

   .. versionchanged:: 2.6
      The buffer size can now be changed.

.. attribute:: xmlparser.buffer_text

   Setting this to true causes the :class:`xmlparser` object to buffer textual
   content returned by Expat to avoid multiple calls to the
   :meth:`CharacterDataHandler` callback whenever possible.  This can improve
   performance substantially since Expat normally breaks character data into chunks
   at every line ending.  This attribute is false by default, and may be changed at
   any time.

   .. versionadded:: 2.3

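The effect of :attr:`buffer_text` is easiest to see by parsing the same
multi-line text twice.  This is a sketch under the assumption that the text
fits in the default buffer; ``collect_chunks`` is a helper invented for the
example.

```python
import xml.parsers.expat

DOC = '<msg>first line\nsecond line\nthird line</msg>'


def collect_chunks(buffer_text):
    """Return the list of strings passed to CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(DOC, True)
    return chunks


unbuffered = collect_chunks(False)  # usually several pieces
buffered = collect_chunks(True)     # text delivered in a single call
```

With buffering enabled the handler is called once with the whole text, while
the unbuffered run typically delivers it in several pieces.  Code that does not
enable buffering should therefore concatenate the pieces itself.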

.. attribute:: xmlparser.buffer_used

   If :attr:`buffer_text` is enabled, the number of bytes stored in the buffer.
   These bytes represent UTF-8 encoded text.  This attribute has no meaningful
   interpretation when :attr:`buffer_text` is false.

   .. versionadded:: 2.3


.. attribute:: xmlparser.ordered_attributes

   Setting this attribute to a non-zero integer causes the attributes to be
   reported as a list rather than a dictionary.  The attributes are presented in
   the order found in the document text.  For each attribute, two list entries are
   presented: the attribute name and the attribute value.  (Older versions of this
   module also used this format.)  By default, this attribute is false; it may be
   changed at any time.

   .. versionadded:: 2.1

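As an illustration of the list format, the following sketch parses a single
element whose attributes are deliberately written out of alphabetical order.
The element and the name ``seen`` are made up for the example.

```python
import xml.parsers.expat

seen = []


def start_element(name, attrs):
    # With ordered_attributes set, attrs is a flat list:
    # [name1, value1, name2, value2, ...]
    seen.append((name, attrs))


parser = xml.parsers.expat.ParserCreate()
parser.ordered_attributes = True
parser.StartElementHandler = start_element
parser.Parse('<point y="2" x="1"/>', True)
```

Here ``seen`` ends up as ``[('point', ['y', '2', 'x', '1'])]``, preserving the
document order of the attributes.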

.. attribute:: xmlparser.returns_unicode

   If this attribute is set to a non-zero integer, the handler functions will be
   passed Unicode strings.  If :attr:`returns_unicode` is :const:`False`, 8-bit
   strings containing UTF-8 encoded data will be passed to the handlers.  This is
   :const:`True` by default when Python is built with Unicode support.

   .. versionchanged:: 1.6
      Can be changed at any time to affect the result type.


.. attribute:: xmlparser.specified_attributes

   If set to a non-zero integer, the parser will report only those attributes which
   were specified in the document instance and not those which were derived from
   attribute declarations.  Applications which set this need to be especially
   careful to use what additional information is available from the declarations as
   needed to comply with the standards for the behavior of XML processors.  By
   default, this attribute is false; it may be changed at any time.

   .. versionadded:: 2.1

The following attributes contain values relating to the most recent error
encountered by an :class:`xmlparser` object, and will only have correct values
once a call to :meth:`Parse` or :meth:`ParseFile` has raised a
:exc:`xml.parsers.expat.ExpatError` exception.


.. attribute:: xmlparser.ErrorByteIndex

   Byte index at which an error occurred.


.. attribute:: xmlparser.ErrorCode

   Numeric code specifying the problem.  This value can be passed to the
   :func:`ErrorString` function, or compared to one of the constants defined in the
   ``errors`` object.


.. attribute:: xmlparser.ErrorColumnNumber

   Column number at which an error occurred.


.. attribute:: xmlparser.ErrorLineNumber

   Line number at which an error occurred.

The following attributes contain values relating to the current parse location
in an :class:`xmlparser` object.  During a callback reporting a parse event they
indicate the location of the first of the sequence of characters that generated
the event.  When called outside of a callback, the position indicated will be
just past the last parse event (regardless of whether there was an associated
callback).

.. versionadded:: 2.4


.. attribute:: xmlparser.CurrentByteIndex

   Current byte index in the parser input.


.. attribute:: xmlparser.CurrentColumnNumber

   Current column number in the parser input.


.. attribute:: xmlparser.CurrentLineNumber

   Current line number in the parser input.

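A handler can read these attributes to record where each event started.  The
sketch below assumes the usual Expat conventions of 1-based line numbers and
0-based column numbers; the document and the name ``positions`` are
illustrative.

```python
import xml.parsers.expat

DOC = '<a>\n  <b/>\n</a>'
positions = []

parser = xml.parsers.expat.ParserCreate()


def start_element(name, attrs):
    # Inside a callback the attributes point at the first character
    # of the markup that triggered the event (the '<' of the tag).
    positions.append((name,
                      parser.CurrentLineNumber,
                      parser.CurrentColumnNumber))


parser.StartElementHandler = start_element
parser.Parse(DOC, True)
```

For this input the ``a`` element is reported at line 1, column 0 and the
``b`` element at line 2, column 2.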

Here is the list of handlers that can be set.  To set a handler on an
:class:`xmlparser` object *o*, use ``o.handlername = func``.  *handlername* must
be taken from the following list, and *func* must be a callable object accepting
the correct number of arguments.  The arguments are all strings, unless
otherwise stated.


.. method:: xmlparser.XmlDeclHandler(version, encoding, standalone)

   Called when the XML declaration is parsed.  The XML declaration is the
   (optional) declaration of the applicable version of the XML recommendation, the
   encoding of the document text, and an optional "standalone" declaration.
   *version* and *encoding* will be strings of the type dictated by the
   :attr:`returns_unicode` attribute, and *standalone* will be ``1`` if the
   document is declared standalone, ``0`` if it is declared not to be standalone,
   or ``-1`` if the standalone clause was omitted. This is only available with
   Expat version 1.95.0 or newer.

   .. versionadded:: 2.1

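A small sketch of the values this handler receives, using a made-up one-line
document and the illustrative name ``decl``:

```python
import xml.parsers.expat

decl = []


def xml_decl(version, encoding, standalone):
    decl.append((version, encoding, standalone))


parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = xml_decl
parser.Parse('<?xml version="1.0" encoding="utf-8" standalone="no"?><doc/>',
             True)
```

Because the declaration says ``standalone="no"``, the third value recorded is
``0`` rather than ``-1``.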

.. method:: xmlparser.StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset)

   Called when Expat begins parsing the document type declaration (``<!DOCTYPE
   ...``).  The *doctypeName* is provided exactly as presented.  The *systemId* and
   *publicId* parameters give the system and public identifiers if specified, or
   ``None`` if omitted.  *has_internal_subset* will be true if the document
   contains an internal document declaration subset. This requires Expat version
   1.2 or newer.


.. method:: xmlparser.EndDoctypeDeclHandler()

   Called when Expat is done parsing the document type declaration. This requires
   Expat version 1.2 or newer.


.. method:: xmlparser.ElementDeclHandler(name, model)

   Called once for each element type declaration.  *name* is the name of the
   element type, and *model* is a representation of the content model.


.. method:: xmlparser.AttlistDeclHandler(elname, attname, type, default, required)

   Called for each declared attribute for an element type.  If an attribute list
   declaration declares three attributes, this handler is called three times, once
   for each attribute.  *elname* is the name of the element to which the
   declaration applies and *attname* is the name of the attribute declared.  The
   attribute type is a string passed as *type*; the possible values are
   ``'CDATA'``, ``'ID'``, ``'IDREF'``, ... *default* gives the default value for
   the attribute used when the attribute is not specified by the document instance,
   or ``None`` if there is no default value (``#IMPLIED`` values).  If the
   attribute is required to be given in the document instance, *required* will be
   true. This requires Expat version 1.95.0 or newer.

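The once-per-attribute behaviour of :attr:`AttlistDeclHandler` can be sketched
with an attribute list that declares two attributes.  The DTD and the name
``decls`` are invented for the example.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note
      lang CDATA "en"
      id   ID    #REQUIRED>
]>
<note id="n1">hi</note>"""

decls = []


def attlist_decl(elname, attname, type_, default, required):
    # Called once per declared attribute, not once per <!ATTLIST ...>.
    decls.append((elname, attname, type_, default, bool(required)))


parser = xml.parsers.expat.ParserCreate()
parser.AttlistDeclHandler = attlist_decl
parser.Parse(DOC, True)
```

The handler runs twice here: ``lang`` arrives with the default ``'en'`` and a
false *required* flag, while ``id`` arrives with ``None`` as its default and a
true *required* flag.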

.. method:: xmlparser.StartElementHandler(name, attributes)

   Called for the start of every element.  *name* is a string containing the
   element name, and *attributes* is a dictionary mapping attribute names to their
   values.


.. method:: xmlparser.EndElementHandler(name)

   Called for the end of every element.


.. method:: xmlparser.ProcessingInstructionHandler(target, data)

   Called for every processing instruction.


.. method:: xmlparser.CharacterDataHandler(data)

   Called for character data.  This will be called for normal character data, CDATA
   marked content, and ignorable whitespace.  Applications which must distinguish
   these cases can use the :attr:`StartCdataSectionHandler`,
   :attr:`EndCdataSectionHandler`, and :attr:`ElementDeclHandler` callbacks to
   collect the required information.

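One way to tell CDATA content from ordinary character data is to track a flag
between the two CDATA section callbacks.  The following is a sketch of that
idea, not the only possible design; ``state`` and ``pieces`` are illustrative
names.

```python
import xml.parsers.expat

DOC = '<r>plain <![CDATA[<raw> & unescaped]]> tail</r>'

pieces = []                  # (kind, text) pairs in document order
state = {'in_cdata': False}  # toggled by the CDATA section handlers


def start_cdata():
    state['in_cdata'] = True


def end_cdata():
    state['in_cdata'] = False


def char_data(data):
    kind = 'cdata' if state['in_cdata'] else 'text'
    pieces.append((kind, data))


parser = xml.parsers.expat.ParserCreate()
parser.StartCdataSectionHandler = start_cdata
parser.EndCdataSectionHandler = end_cdata
parser.CharacterDataHandler = char_data
parser.Parse(DOC, True)

# Text may arrive in several calls, so join the pieces of each kind.
cdata_text = ''.join(text for kind, text in pieces if kind == 'cdata')
plain_text = ''.join(text for kind, text in pieces if kind == 'text')
```

The pieces are joined afterwards because nothing guarantees that a run of text
is delivered in a single call.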

.. method:: xmlparser.UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName)

   Called for unparsed (NDATA) entity declarations.  This is only present for
   version 1.2 of the Expat library; for more recent versions, use
   :attr:`EntityDeclHandler` instead.  (The underlying function in the Expat
   library has been declared obsolete.)


.. method:: xmlparser.EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName)

   Called for all entity declarations.  For parameter and internal entities,
   *value* will be a string giving the declared contents of the entity; this will
   be ``None`` for external entities.  The *notationName* parameter will be
   ``None`` for parsed entities, and the name of the notation for unparsed
   entities. *is_parameter_entity* will be true if the entity is a parameter entity
   or false for general entities (most applications only need to be concerned with
   general entities). This is only available starting with version 1.95.0 of the
   Expat library.

   .. versionadded:: 2.1

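The difference between internal and unparsed entities shows up in which
arguments are ``None``.  A sketch with one declaration of each kind follows;
the DTD and the name ``entities`` are made up, and :meth:`SetBase` is not
called, so *base* is expected to be ``None``.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY greeting "hello">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<doc/>"""

entities = {}


def entity_decl(name, is_parameter_entity, value, base,
                system_id, public_id, notation_name):
    entities[name] = (bool(is_parameter_entity), value, base,
                      system_id, public_id, notation_name)


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = entity_decl
parser.Parse(DOC, True)
```

The internal entity ``greeting`` carries its replacement text in *value*,
whereas the unparsed entity ``logo`` has no *value* but does have a system
identifier and a notation name.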

.. method:: xmlparser.NotationDeclHandler(notationName, base, systemId, publicId)

   Called for notation declarations.  *notationName*, *base*, *systemId*, and
   *publicId* are strings if given.  If the public identifier is omitted,
   *publicId* will be ``None``.


.. method:: xmlparser.StartNamespaceDeclHandler(prefix, uri)

   Called when an element contains a namespace declaration.  Namespace declarations
   are processed before the :attr:`StartElementHandler` is called for the element
   on which declarations are placed.


.. method:: xmlparser.EndNamespaceDeclHandler(prefix)

   Called when the closing tag is reached for an element that contained a
   namespace declaration.  This is called once for each namespace declaration on
   the element, in the reverse of the order in which the
   :attr:`StartNamespaceDeclHandler` was called to indicate the start of each
   namespace declaration's scope.  Calls to this handler are made after the
   corresponding :attr:`EndElementHandler` for the end of the element.

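The ordering of the namespace callbacks relative to the element callbacks can
be observed by logging all four.  This sketch assumes namespace processing has
been enabled with a space as the separator; the document and the name
``events`` are illustrative.

```python
import xml.parsers.expat

DOC = '<r xmlns:p="urn:example"><p:c/></r>'
events = []

# Namespace handlers are only called when namespace processing is enabled.
parser = xml.parsers.expat.ParserCreate(namespace_separator=' ')
parser.StartNamespaceDeclHandler = (
    lambda prefix, uri: events.append(('start-ns', prefix, uri)))
parser.EndNamespaceDeclHandler = (
    lambda prefix: events.append(('end-ns', prefix)))
parser.StartElementHandler = (
    lambda name, attrs: events.append(('start', name)))
parser.EndElementHandler = (
    lambda name: events.append(('end', name)))
parser.Parse(DOC, True)
```

The recorded sequence begins with the ``start-ns`` event for prefix ``p``,
before the start of ``r``, and finishes with the matching ``end-ns`` event
after the end of ``r``.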

.. method:: xmlparser.CommentHandler(data)

   Called for comments.  *data* is the text of the comment, excluding the leading
   '``<!-``\ ``-``' and trailing '``-``\ ``->``'.


.. method:: xmlparser.StartCdataSectionHandler()

   Called at the start of a CDATA section.  This and :attr:`EndCdataSectionHandler`
   are needed to be able to identify the syntactical start and end for CDATA
   sections.


.. method:: xmlparser.EndCdataSectionHandler()

   Called at the end of a CDATA section.


.. method:: xmlparser.DefaultHandler(data)

   Called for any characters in the XML document for which no applicable handler
   has been specified.  This means characters that are part of a construct which
   could be reported, but for which no handler has been supplied.

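When no other handler is set, everything in the document falls through to
:attr:`DefaultHandler`.  The sketch below relies on that to echo a small
document; it is an illustration for simple UTF-8 input rather than a promise
about every document, and ``DOC`` and ``chunks`` are made-up names.

```python
import xml.parsers.expat

DOC = '<r a="1">hi<!-- note --></r>'
chunks = []

parser = xml.parsers.expat.ParserCreate()
# No other handlers are installed, so every construct is passed here
# as the raw text that produced it.
parser.DefaultHandler = chunks.append
parser.Parse(DOC, True)

echoed = ''.join(chunks)
```

Installing a more specific handler, such as :attr:`CommentHandler`, would
remove the corresponding text from what the default handler sees.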

.. method:: xmlparser.DefaultHandlerExpand(data)

   This is the same as the :func:`DefaultHandler`, but doesn't inhibit expansion
   of internal entities. The entity reference will not be passed to the default
   handler.


.. method:: xmlparser.NotStandaloneHandler()

   Called if the XML document hasn't been declared as being a standalone document.
   This happens when there is an external subset or a reference to a parameter
   entity, but the XML declaration does not set standalone to ``yes``.  If this
   handler returns ``0``, then the parser will raise an
   :const:`XML_ERROR_NOT_STANDALONE` error.  If this handler is not set, no
   exception is raised by the parser for this condition.


.. method:: xmlparser.ExternalEntityRefHandler(context, base, systemId, publicId)

   Called for references to external entities.  *base* is the current base, as set
   by a previous call to :meth:`SetBase`.  The system and public identifiers,
   *systemId* and *publicId*, are strings if given; if the public identifier is not
   given, *publicId* will be ``None``.  The *context* value is opaque and should
   only be used as described below.

   For external entities to be parsed, this handler must be implemented. It is
   responsible for creating the sub-parser using
   ``ExternalEntityParserCreate(context)``, initializing it with the appropriate
   callbacks, and parsing the entity.  This handler should return an integer; if it
   returns ``0``, the parser will raise an
   :const:`XML_ERROR_EXTERNAL_ENTITY_HANDLING` error, otherwise parsing will
   continue.

   If this handler is not provided, external entities are reported by the
   :attr:`DefaultHandler` callback, if provided.

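The steps described for :attr:`ExternalEntityRefHandler` can be sketched as
follows.  The dictionary ``ENTITY_TEXT`` is a hypothetical stand-in for
whatever the application does to fetch an entity; a real handler would resolve
*systemId* against *base* and read the resource itself, and should treat
untrusted input with care.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE r [
  <!ENTITY ext SYSTEM "ext.xml">
]>
<r>before &ext; after</r>"""

# Hypothetical entity store used instead of real file or network access.
ENTITY_TEXT = {'ext.xml': 'external text'}

text = []       # character data from the parent and the child parser
requested = []  # (base, systemId, publicId) for each entity reference

parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = text.append


def external_entity_ref(context, base, system_id, public_id):
    requested.append((base, system_id, public_id))
    # Create the child parser from the opaque context value,
    # give it its callbacks, and parse the entity's content.
    child = parser.ExternalEntityParserCreate(context)
    child.CharacterDataHandler = text.append
    child.Parse(ENTITY_TEXT[system_id], True)
    return 1  # non-zero: carry on parsing the parent document


parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOC, True)

combined = ''.join(text)
```

Both parsers append to the same list, so the entity's text appears in
``combined`` at the point where ``&ext;`` was referenced.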

.. _expaterror-objects:

ExpatError Exceptions
---------------------

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


:exc:`ExpatError` exceptions have a number of interesting attributes:


.. attribute:: ExpatError.code

   Expat's internal error number for the specific error.  This will match one of
   the constants defined in the ``errors`` object from this module.

   .. versionadded:: 2.1


.. attribute:: ExpatError.lineno

   Line number on which the error was detected.  The first line is numbered ``1``.

   .. versionadded:: 2.1


.. attribute:: ExpatError.offset

   Character offset into the line where the error occurred.  The first column is
   numbered ``0``.

   .. versionadded:: 2.1

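A short sketch of inspecting these attributes after a failed parse.  It
assumes, as appears to be the case in current builds, that the constants in
``errors`` hold message strings; for that reason the numeric code is converted
with :func:`ErrorString` before the comparison.  The malformed input and the
name ``caught`` are illustrative.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, ErrorString, errors

parser = xml.parsers.expat.ParserCreate()
caught = None

try:
    # The <b> element is never closed, so </a> on line 2 is a mismatch.
    parser.Parse('<a>\n<b></a>', True)
except ExpatError as exc:
    caught = exc

if caught is not None:
    message = ErrorString(caught.code)  # human-readable description
    is_mismatch = (message == errors.XML_ERROR_TAG_MISMATCH)
    location = (caught.lineno, caught.offset)
```

The same information is available on the parser itself through
``parser.ErrorCode`` and ``parser.ErrorLineNumber`` once the exception has been
raised.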

.. _expat-example:

Example
-------

The following program defines three handlers that just print out their
arguments. ::

   import xml.parsers.expat

   # 3 handler functions
   def start_element(name, attrs):
       print 'Start element:', name, attrs
   def end_element(name):
       print 'End element:', name
   def char_data(data):
       print 'Character data:', repr(data)

   p = xml.parsers.expat.ParserCreate()

   p.StartElementHandler = start_element
   p.EndElementHandler = end_element
   p.CharacterDataHandler = char_data

   p.Parse("""<?xml version="1.0"?>
   <parent id="top"><child1 name="paul">Text goes here</child1>
   <child2 name="fred">More text</child2>
   </parent>""", 1)

The output from this program is::

   Start element: parent {'id': 'top'}
   Start element: child1 {'name': 'paul'}
   Character data: 'Text goes here'
   End element: child1
   Character data: '\n'
   Start element: child2 {'name': 'fred'}
   Character data: 'More text'
   End element: child2
   Character data: '\n'
   End element: parent


.. _expat-content-models:

Content Model Descriptions
--------------------------

.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


Content models are described using nested tuples.  Each tuple contains four
values: the type, the quantifier, the name, and a tuple of children.  Children
are simply additional content model descriptions.

The values of the first two fields are constants defined in the ``model`` object
of the :mod:`xml.parsers.expat` module.  These constants can be collected in two
groups: the model type group and the quantifier group.

The constants in the model type group are:


.. data:: XML_CTYPE_ANY
   :noindex:

   The element named by the model name was declared to have a content model of
   ``ANY``.


.. data:: XML_CTYPE_CHOICE
   :noindex:

   The named element allows a choice from a number of options; this is used for
   content models such as ``(A | B | C)``.


.. data:: XML_CTYPE_EMPTY
   :noindex:

   Elements which are declared to be ``EMPTY`` have this model type.


.. data:: XML_CTYPE_MIXED
   :noindex:


.. data:: XML_CTYPE_NAME
   :noindex:


.. data:: XML_CTYPE_SEQ
   :noindex:

   Models which represent a series of models which follow one after the other are
   indicated with this model type.  This is used for models such as ``(A, B, C)``.

The constants in the quantifier group are:


.. data:: XML_CQUANT_NONE
   :noindex:

   No modifier is given, so it can appear exactly once, as for ``A``.


.. data:: XML_CQUANT_OPT
   :noindex:

   The model is optional: it can appear once or not at all, as for ``A?``.


.. data:: XML_CQUANT_PLUS
   :noindex:

   The model must occur one or more times (like ``A+``).


.. data:: XML_CQUANT_REP
   :noindex:

   The model must occur zero or more times, as for ``A*``.

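To make the nested-tuple layout concrete, the following sketch captures the
models passed to :attr:`ElementDeclHandler` for a small DTD.  The DTD and the
name ``models`` are invented for the example.

```python
import xml.parsers.expat
from xml.parsers.expat import model

DOC = """<!DOCTYPE list [
  <!ELEMENT list (title, item+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT item EMPTY>
]>
<list><title>t</title><item/></list>"""

models = {}


def element_decl(name, content_model):
    models[name] = content_model


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = element_decl
parser.Parse(DOC, True)

# Each description is (type, quantifier, name, children).
list_type, list_quant, list_name, list_children = models['list']
child_names = [child[2] for child in list_children]
child_quants = [child[1] for child in list_children]
```

For ``list`` the outer tuple has type ``model.XML_CTYPE_SEQ`` and no name; its
two children are ``XML_CTYPE_NAME`` entries for ``title`` and ``item``, the
latter carrying the ``XML_CQUANT_PLUS`` quantifier.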
 | 655 |  | 
.. _expat-errors:

Expat error constants
---------------------

The following constants are provided in the ``errors`` object of the
:mod:`xml.parsers.expat` module.  These constants are useful in interpreting
some of the attributes of the :exc:`ExpatError` exception objects raised when an
error has occurred.

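
As a minimal sketch of how the constants can be used, the example below maps
the numeric :attr:`code` of an :exc:`ExpatError` to a message with
:func:`ErrorString` and compares the result with attributes of ``errors``.
It assumes, as is the case in CPython's implementation, that the constants
hold those message strings; ``error_message`` is a hypothetical helper name::

   from xml.parsers import expat
   from xml.parsers.expat import errors

   def error_message(document):
       """Return Expat's message for *document*, or None if it parses."""
       parser = expat.ParserCreate()
       try:
           parser.Parse(document, True)
       except expat.ExpatError as exc:
           # exc.code is numeric; ErrorString() turns it into the message
           # string that can be compared with the constants.
           return expat.ErrorString(exc.code)
       return None

   mismatch = error_message("<a><b></a>")
   duplicate = error_message('<a x="1" x="2"/>')
   clean = error_message("<a/>")

With these inputs, ``mismatch`` should equal ``errors.XML_ERROR_TAG_MISMATCH``,
``duplicate`` should equal ``errors.XML_ERROR_DUPLICATE_ATTRIBUTE``, and
``clean`` is ``None``.
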
The ``errors`` object has the following attributes:


.. data:: XML_ERROR_ASYNC_ENTITY
   :noindex:


.. data:: XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
   :noindex:

   An entity reference in an attribute value referred to an external entity instead
   of an internal entity.


.. data:: XML_ERROR_BAD_CHAR_REF
   :noindex:

   A character reference referred to a character which is illegal in XML (for
   example, character ``0``, or '``&#0;``').


.. data:: XML_ERROR_BINARY_ENTITY_REF
   :noindex:

   An entity reference referred to an entity which was declared with a notation,
   and so cannot be parsed.


.. data:: XML_ERROR_DUPLICATE_ATTRIBUTE
   :noindex:

   An attribute was used more than once in a start tag.


.. data:: XML_ERROR_INCORRECT_ENCODING
   :noindex:


.. data:: XML_ERROR_INVALID_TOKEN
   :noindex:

   Raised when an input byte could not properly be assigned to a character; for
   example, a NUL byte (value ``0``) in a UTF-8 input stream.


.. data:: XML_ERROR_JUNK_AFTER_DOC_ELEMENT
   :noindex:

   Something other than whitespace occurred after the document element.


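
The :attr:`lineno` and :attr:`offset` attributes of the exception show where
such an error was detected.  The following sketch is illustrative only;
``locate_error`` is a hypothetical helper name::

   from xml.parsers import expat
   from xml.parsers.expat import errors

   def locate_error(document):
       """Return (message, line, column) for the first error, or None."""
       parser = expat.ParserCreate()
       try:
           parser.Parse(document, True)
       except expat.ExpatError as exc:
           return expat.ErrorString(exc.code), exc.lineno, exc.offset
       return None

   # A second top-level element is junk after the document element.
   junk = locate_error("<a/><b/>")
   # Empty input contains no elements at all.
   empty = locate_error("")

For the first document the reported position is line 1, column 4, which is
where the second element starts.
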
.. data:: XML_ERROR_MISPLACED_XML_PI
   :noindex:

   An XML declaration was found somewhere other than the start of the input data.


.. data:: XML_ERROR_NO_ELEMENTS
   :noindex:

   The document contains no elements (XML requires all documents to contain exactly
   one top-level element).


.. data:: XML_ERROR_NO_MEMORY
   :noindex:

   Expat was not able to allocate memory internally.


.. data:: XML_ERROR_PARAM_ENTITY_REF
   :noindex:

   A parameter entity reference was found where it was not allowed.


.. data:: XML_ERROR_PARTIAL_CHAR
   :noindex:

   An incomplete character was found in the input.


.. data:: XML_ERROR_RECURSIVE_ENTITY_REF
   :noindex:

   An entity reference contained another reference to the same entity; possibly via
   a different name, and possibly indirectly.


.. data:: XML_ERROR_SYNTAX
   :noindex:

   Some unspecified syntax error was encountered.


.. data:: XML_ERROR_TAG_MISMATCH
   :noindex:

   An end tag did not match the innermost open start tag.


.. data:: XML_ERROR_UNCLOSED_TOKEN
   :noindex:

   Some token (such as a start tag) was not closed before the end of the stream or
   the next token was encountered.


.. data:: XML_ERROR_UNDEFINED_ENTITY
   :noindex:

   A reference was made to an entity which was not defined.


.. data:: XML_ERROR_UNKNOWN_ENCODING
   :noindex:

   The document encoding is not supported by Expat.


.. data:: XML_ERROR_UNCLOSED_CDATA_SECTION
   :noindex:

   A CDATA marked section was not closed.


.. data:: XML_ERROR_EXTERNAL_ENTITY_HANDLING
   :noindex:


.. data:: XML_ERROR_NOT_STANDALONE
   :noindex:

   The parser determined that the document was not "standalone" though it declared
   itself to be in the XML declaration, and the :attr:`NotStandaloneHandler` was
   set and returned ``0``.


.. data:: XML_ERROR_UNEXPECTED_STATE
   :noindex:


.. data:: XML_ERROR_ENTITY_DECLARED_IN_PE
   :noindex:


.. data:: XML_ERROR_FEATURE_REQUIRES_XML_DTD
   :noindex:

   An operation was requested that requires DTD support to be compiled in, but
   Expat was configured without DTD support.  This should never be reported by a
   standard build of the :mod:`xml.parsers.expat` module.


.. data:: XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING
   :noindex:

   A behavioral change was requested after parsing started that can only be changed
   before parsing has started.  This is (currently) only raised by
   :meth:`UseForeignDTD`.


.. data:: XML_ERROR_UNBOUND_PREFIX
   :noindex:

   An undeclared prefix was found when namespace processing was enabled.


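
Whether an undeclared prefix is an error therefore depends on how the parser
was created.  The sketch below parses the same document with and without
namespace processing; ``parse_error`` is a hypothetical helper name, and the
space used as the namespace separator is an arbitrary choice::

   from xml.parsers import expat
   from xml.parsers.expat import errors

   def parse_error(document, namespace_separator=None):
       """Return Expat's message for *document*, or None if it parses."""
       parser = expat.ParserCreate(namespace_separator=namespace_separator)
       try:
           parser.Parse(document, True)
       except expat.ExpatError as exc:
           return expat.ErrorString(exc.code)
       return None

   # Without namespace processing, "p:a" is just an element name.
   plain = parse_error("<p:a/>")
   # With namespace processing, the prefix "p" must be declared.
   with_namespaces = parse_error("<p:a/>", namespace_separator=" ")

Only the second call is expected to report an error, namely
``errors.XML_ERROR_UNBOUND_PREFIX``.
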
.. data:: XML_ERROR_UNDECLARING_PREFIX
   :noindex:

   The document attempted to remove the namespace declaration associated with a
   prefix.


.. data:: XML_ERROR_INCOMPLETE_PE
   :noindex:

   A parameter entity contained incomplete markup.


.. data:: XML_ERROR_XML_DECL
   :noindex:

   The XML declaration was not well-formed.


.. data:: XML_ERROR_TEXT_DECL
   :noindex:

   There was an error parsing a text declaration in an external entity.


.. data:: XML_ERROR_PUBLICID
   :noindex:

   Characters were found in the public id that are not allowed.


.. data:: XML_ERROR_SUSPENDED
   :noindex:

   An operation that is not allowed on a suspended parser was requested.  This
   includes attempts to provide additional input or to stop the parser.


.. data:: XML_ERROR_NOT_SUSPENDED
   :noindex:

   An attempt to resume the parser was made when the parser had not been suspended.


.. data:: XML_ERROR_ABORTED
   :noindex:

   This should not be reported to Python applications.


.. data:: XML_ERROR_FINISHED
   :noindex:

   An operation that is not allowed once the parser has finished parsing input
   was requested.  This includes attempts to provide additional input or to
   stop the parser.


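
A common way to run into this error is to keep feeding data after the final
call to :meth:`Parse`.  The following minimal sketch shows that situation;
``feed_after_finish`` is a hypothetical name used only for illustration::

   from xml.parsers import expat
   from xml.parsers.expat import errors

   def feed_after_finish():
       """Feed data to a finished parser and return the resulting message."""
       parser = expat.ParserCreate()
       parser.Parse("<a/>", True)       # final call: parsing is now finished
       try:
           parser.Parse("<b/>", True)   # not allowed on a finished parser
       except expat.ExpatError as exc:
           return expat.ErrorString(exc.code)
       return None

   message = feed_after_finish()

The returned message is expected to equal ``errors.XML_ERROR_FINISHED``.  A
new parser object has to be created for each document.
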
.. data:: XML_ERROR_SUSPEND_PE
   :noindex:


.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .
