 | 1 |  | 
 | 2 | :mod:`xml.parsers.expat` --- Fast XML parsing using Expat | 
 | 3 | ========================================================= | 
 | 4 |  | 
 | 5 | .. module:: xml.parsers.expat | 
 | 6 |    :synopsis: An interface to the Expat non-validating XML parser. | 
 | 7 | .. moduleauthor:: Paul Prescod <paul@prescod.net> | 
 | 8 |  | 
 | 9 |  | 
 | 10 | .. Markup notes: | 
 | 11 |  | 
 | 12 |    Many of the attributes of the XMLParser objects are callbacks.  Since | 
 | 13 |    signature information must be presented, these are described using the method | 
 | 14 |    directive.  Since they are attributes which are set by client code, in-text | 
 | 15 |    references to these attributes should be marked using the :member: role. | 
 | 16 |  | 
 | 17 | .. versionadded:: 2.0 | 
 | 18 |  | 
 | 19 | .. index:: single: Expat | 
 | 20 |  | 
 | 21 | The :mod:`xml.parsers.expat` module is a Python interface to the Expat | 
 | 22 | non-validating XML parser. The module provides a single extension type, | 
 | 23 | :class:`xmlparser`, that represents the current state of an XML parser.  After | 
 | 24 | an :class:`xmlparser` object has been created, various attributes of the object | 
 | 25 | can be set to handler functions.  When an XML document is then fed to the | 
 | 26 | parser, the handler functions are called for the character data and markup in | 
 | 27 | the XML document. | 
 | 28 |  | 
 | 29 | .. index:: module: pyexpat | 
 | 30 |  | 
 | 31 | This module uses the :mod:`pyexpat` module to provide access to the Expat | 
 | 32 | parser.  Direct use of the :mod:`pyexpat` module is deprecated. | 
 | 33 |  | 
 | 34 | This module provides one exception and one type object: | 
 | 35 |  | 
 | 36 |  | 
 | 37 | .. exception:: ExpatError | 
 | 38 |  | 
 | 39 |    The exception raised when Expat reports an error.  See section | 
 | 40 |    :ref:`expaterror-objects` for more information on interpreting Expat errors. | 
 | 41 |  | 
 | 42 |  | 
 | 43 | .. exception:: error | 
 | 44 |  | 
 | 45 |    Alias for :exc:`ExpatError`. | 
 | 46 |  | 
 | 47 |  | 
 | 48 | .. data:: XMLParserType | 
 | 49 |  | 
 | 50 |    The type of the return values from the :func:`ParserCreate` function. | 
 | 51 |  | 
 | 52 | The :mod:`xml.parsers.expat` module contains two functions: | 
 | 53 |  | 
 | 54 |  | 
 | 55 | .. function:: ErrorString(errno) | 
 | 56 |  | 
 | 57 |    Returns an explanatory string for a given error number *errno*. | 
 | 58 |  | 
 | 59 |  | 
 | 60 | .. function:: ParserCreate([encoding[, namespace_separator]]) | 
 | 61 |  | 
 | 62 |    Creates and returns a new :class:`xmlparser` object.  *encoding*, if specified, | 
 | 63 |    must be a string naming the encoding used by the XML data.  Expat doesn't | 
 | 64 |    support as many encodings as Python does, and its repertoire of encodings can't | 
 | 65 |    be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.  If | 
 | 66 |    *encoding* [1]_ is given it will override the implicit or explicit encoding of the | 
 | 67 |    document. | 
 | 68 |  | 
 | 69 |    Expat can optionally do XML namespace processing for you, enabled by providing a | 
 | 70 |    value for *namespace_separator*.  The value must be a one-character string; a | 
 | 71 |    :exc:`ValueError` will be raised if the string has an illegal length (``None`` | 
 | 72 |    is considered the same as omission).  When namespace processing is enabled, | 
 | 73 |    element type names and attribute names that belong to a namespace will be | 
 | 74 |    expanded.  The element name passed to the element handlers | 
 | 75 |    :attr:`StartElementHandler` and :attr:`EndElementHandler` will be the | 
 | 76 |    concatenation of the namespace URI, the namespace separator character, and the | 
 | 77 |    local part of the name.  If the namespace separator is a zero byte (``chr(0)``) | 
 | 78 |    then the namespace URI and the local part will be concatenated without any | 
 | 79 |    separator. | 
 | 80 |  | 
 | 81 |    For example, if *namespace_separator* is set to a space character (``' '``) and | 
 | 82 |    the following document is parsed:: | 
 | 83 |  | 
 | 84 |       <?xml version="1.0"?> | 
 | 85 |       <root xmlns    = "http://default-namespace.org/" | 
 | 86 |             xmlns:py = "http://www.python.org/ns/"> | 
 | 87 |         <py:elem1 /> | 
 | 88 |         <elem2 xmlns="" /> | 
 | 89 |       </root> | 
 | 90 |  | 
 | 91 |    :attr:`StartElementHandler` will receive the following strings for each | 
 | 92 |    element:: | 
 | 93 |  | 
 | 94 |       http://default-namespace.org/ root | 
 | 95 |       http://www.python.org/ns/ elem1 | 
 | 96 |       elem2 | 
 | 97 |  | 
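The namespace expansion described above can be reproduced with a short script.
This is a minimal sketch, not part of the original text: the names ``DOC``,
``names`` and ``start_element`` are illustrative, and the code is written so
that it should also run on Python 3.

```python
import xml.parsers.expat

# The sample document from the text, with a default and a prefixed namespace.
DOC = """<?xml version="1.0"?>
<root xmlns="http://default-namespace.org/"
      xmlns:py="http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

names = []

def start_element(name, attrs):
    # With namespace processing enabled, *name* arrives already expanded.
    names.append(name)

# A one-character separator switches namespace processing on.
p = xml.parsers.expat.ParserCreate(namespace_separator=" ")
p.StartElementHandler = start_element
p.Parse(DOC, True)

for name in names:
    print(name)
```

Splitting each reported name on the separator recovers the URI and the local
part; a name without a separator (``elem2`` here) belongs to no namespace.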
 | 98 |  | 
 | 99 | .. seealso:: | 
 | 100 |  | 
 | 101 |    `The Expat XML Parser <http://www.libexpat.org/>`_ | 
 | 102 |       Home page of the Expat project. | 
 | 103 |  | 
 | 104 |  | 
 | 105 | .. _xmlparser-objects: | 
 | 106 |  | 
 | 107 | XMLParser Objects | 
 | 108 | ----------------- | 
 | 109 |  | 
 | 110 | :class:`xmlparser` objects have the following methods: | 
 | 111 |  | 
 | 112 |  | 
 | 113 | .. method:: xmlparser.Parse(data[, isfinal]) | 
 | 114 |  | 
 | 115 |    Parses the contents of the string *data*, calling the appropriate handler | 
 | 116 |    functions to process the parsed data.  *isfinal* must be true on the final call | 
 | 117 |    to this method.  *data* can be the empty string at any time. | 
 | 118 |  | 
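As an illustration of incremental parsing with :meth:`Parse`, the sketch below
feeds a document in arbitrary pieces and finishes with an empty final call.
The chunk boundaries and the ``events`` list are assumptions made for the
example.

```python
import xml.parsers.expat

# Chunks may split the input anywhere, even in the middle of a tag.
chunks = ["<doc><it", "em>te", "xt</item>", "</doc>"]
events = []

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = lambda name, attrs: events.append(("start", name))
p.EndElementHandler = lambda name: events.append(("end", name))

for chunk in chunks:
    p.Parse(chunk, False)   # more data may follow
p.Parse("", True)           # empty final call signals the end of the input

print(events)
```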
 | 119 |  | 
 | 120 | .. method:: xmlparser.ParseFile(file) | 
 | 121 |  | 
 | 122 |    Parse XML data reading from the object *file*.  *file* only needs to provide | 
 | 123 |    the ``read(nbytes)`` method, returning the empty string when there's no more | 
 | 124 |    data. | 
 | 125 |  | 
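A minimal sketch of :meth:`ParseFile` follows.  It uses ``io.BytesIO`` as a
stand-in for a real file, which is an assumption of the example; any object
with a suitable ``read(nbytes)`` method should do.  On Python 3 the object is
expected to return bytes, which is why a bytes buffer is used here.

```python
import io
import xml.parsers.expat

# An in-memory stand-in for an open file; only read(nbytes) is needed.
source = io.BytesIO(b"<doc><item/><item/></doc>")

count = [0]   # a one-element list so the handler can update it

def start_element(name, attrs):
    if name == "item":
        count[0] += 1

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.ParseFile(source)   # reads until read() returns an empty result

print(count[0])
```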
 | 126 |  | 
 | 127 | .. method:: xmlparser.SetBase(base) | 
 | 128 |  | 
 | 129 |    Sets the base to be used for resolving relative URIs in system identifiers in | 
 | 130 |    declarations.  Resolving relative identifiers is left to the application: this | 
 | 131 |    value will be passed through as the *base* argument to the | 
 | 132 |    :func:`ExternalEntityRefHandler`, :func:`NotationDeclHandler`, and | 
 | 133 |    :func:`UnparsedEntityDeclHandler` functions. | 
 | 134 |  | 
 | 135 |  | 
 | 136 | .. method:: xmlparser.GetBase() | 
 | 137 |  | 
 | 138 |    Returns a string containing the base set by a previous call to :meth:`SetBase`, | 
 | 139 |    or ``None`` if :meth:`SetBase` hasn't been called. | 
 | 140 |  | 
 | 141 |  | 
 | 142 | .. method:: xmlparser.GetInputContext() | 
 | 143 |  | 
 | 144 |    Returns the input data that generated the current event as a string. The data is | 
 | 145 |    in the encoding of the entity which contains the text. When called while an | 
 | 146 |    event handler is not active, the return value is ``None``. | 
 | 147 |  | 
 | 148 |    .. versionadded:: 2.1 | 
 | 149 |  | 
 | 150 |  | 
 | 151 | .. method:: xmlparser.ExternalEntityParserCreate(context[, encoding]) | 
 | 152 |  | 
 | 153 |    Create a "child" parser which can be used to parse an external parsed entity | 
 | 154 |    referred to by content parsed by the parent parser.  The *context* parameter | 
 | 155 |    should be the string passed to the :meth:`ExternalEntityRefHandler` handler | 
 | 156 |    function, described below. The child parser is created with the | 
 | 157 |    :attr:`ordered_attributes`, :attr:`returns_unicode` and | 
 | 158 |    :attr:`specified_attributes` set to the values of this parser. | 
 | 159 |  | 
 | 160 |  | 
 | 161 | .. method:: xmlparser.UseForeignDTD([flag]) | 
 | 162 |  | 
 | 163 |    Calling this with a true value for *flag* (the default) will cause Expat to call | 
 | 164 |    the :attr:`ExternalEntityRefHandler` with :const:`None` for all arguments to | 
 | 165 |    allow an alternate DTD to be loaded.  If the document does not contain a | 
 | 166 |    document type declaration, the :attr:`ExternalEntityRefHandler` will still be | 
 | 167 |    called, but the :attr:`StartDoctypeDeclHandler` and | 
 | 168 |    :attr:`EndDoctypeDeclHandler` will not be called. | 
 | 169 |  | 
 | 170 |    Passing a false value for *flag* will cancel a previous call that passed a true | 
 | 171 |    value, but otherwise has no effect. | 
 | 172 |  | 
 | 173 |    This method can only be called before the :meth:`Parse` or :meth:`ParseFile` | 
 | 174 |    methods are called; calling it after either of those has been called causes | 
 | 175 |    :exc:`ExpatError` to be raised with the :attr:`code` attribute set to | 
 | 176 |    :const:`errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING`. | 
 | 177 |  | 
 | 178 |    .. versionadded:: 2.3 | 
 | 179 |  | 
 | 180 | :class:`xmlparser` objects have the following attributes: | 
 | 181 |  | 
 | 182 |  | 
 | 183 | .. attribute:: xmlparser.buffer_size | 
 | 184 |  | 
 | 185 |    The size of the buffer used when :attr:`buffer_text` is true. | 
 | 186 |    A new buffer size can be set by assigning a new integer value | 
 | 187 |    to this attribute. | 
 | 188 |    When the size is changed, the buffer will be flushed. | 
 | 189 |  | 
 | 190 |    .. versionadded:: 2.3 | 
 | 191 |  | 
 | 192 |    .. versionchanged:: 2.6 | 
 | 193 |       The buffer size can now be changed. | 
 | 194 |  | 
 | 195 | .. attribute:: xmlparser.buffer_text | 
 | 196 |  | 
 | 197 |    Setting this to true causes the :class:`xmlparser` object to buffer textual | 
 | 198 |    content returned by Expat to avoid multiple calls to the | 
 | 199 |    :meth:`CharacterDataHandler` callback whenever possible.  This can improve | 
 | 200 |    performance substantially since Expat normally breaks character data into chunks | 
 | 201 |    at every line ending.  This attribute is false by default, and may be changed at | 
 | 202 |    any time. | 
 | 203 |  | 
 | 204 |    .. versionadded:: 2.3 | 
 | 205 |  | 
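The effect of :attr:`buffer_text` can be observed by parsing the same document
twice.  This is a sketch under stated assumptions: the helper ``collect`` and
the sample text are made up for the example, and the exact number of pieces
delivered without buffering is left unspecified because it depends on Expat.

```python
import xml.parsers.expat

TEXT = "line one\nline two\nline three"
DOC = "<doc>" + TEXT + "</doc>"

def collect(buffer_text):
    """Return the pieces passed to CharacterDataHandler for DOC."""
    pieces = []
    p = xml.parsers.expat.ParserCreate()
    p.buffer_text = buffer_text
    p.CharacterDataHandler = pieces.append
    p.Parse(DOC, True)
    return pieces

unbuffered = collect(False)   # possibly several pieces
buffered = collect(True)      # contiguous text delivered together

print(len(unbuffered), len(buffered))
```

Either way the concatenation of the pieces is the same text, so handlers that
accumulate character data do not need to change when buffering is enabled.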
 | 206 |  | 
 | 207 | .. attribute:: xmlparser.buffer_used | 
 | 208 |  | 
 | 209 |    If :attr:`buffer_text` is enabled, the number of bytes stored in the buffer. | 
 | 210 |    These bytes represent UTF-8 encoded text.  This attribute has no meaningful | 
 | 211 |    interpretation when :attr:`buffer_text` is false. | 
 | 212 |  | 
 | 213 |    .. versionadded:: 2.3 | 
 | 214 |  | 
 | 215 |  | 
 | 216 | .. attribute:: xmlparser.ordered_attributes | 
 | 217 |  | 
 | 218 |    Setting this attribute to a non-zero integer causes the attributes to be | 
 | 219 |    reported as a list rather than a dictionary.  The attributes are presented in | 
 | 220 |    the order found in the document text.  For each attribute, two list entries are | 
 | 221 |    presented: the attribute name and the attribute value.  (Older versions of this | 
 | 222 |    module also used this format.)  By default, this attribute is false; it may be | 
 | 223 |    changed at any time. | 
 | 224 |  | 
 | 225 |    .. versionadded:: 2.1 | 
 | 226 |  | 
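The list format produced by :attr:`ordered_attributes` alternates names and
values.  The following sketch (the variable names are illustrative) shows the
raw list and one way to pair it up again.

```python
import xml.parsers.expat

seen = []

def start_element(name, attrs):
    # attrs is a flat list: [name1, value1, name2, value2, ...]
    seen.append(attrs)

p = xml.parsers.expat.ParserCreate()
p.ordered_attributes = True
p.StartElementHandler = start_element
p.Parse('<item b="2" a="1"/>', True)

flat = seen[0]
# Pair names with values while keeping document order.
pairs = list(zip(flat[0::2], flat[1::2]))
print(flat)
print(pairs)
```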
 | 227 |  | 
 | 228 | .. attribute:: xmlparser.returns_unicode | 
 | 229 |  | 
 | 230 |    If this attribute is set to a non-zero integer, the handler functions will be | 
 | 231 |    passed Unicode strings.  If :attr:`returns_unicode` is :const:`False`, 8-bit | 
 | 232 |    strings containing UTF-8 encoded data will be passed to the handlers.  This is | 
 | 233 |    :const:`True` by default when Python is built with Unicode support. | 
 | 234 |  | 
 | 235 |    .. versionchanged:: 1.6 | 
 | 236 |       Can be changed at any time to affect the result type. | 
 | 237 |  | 
 | 238 |  | 
 | 239 | .. attribute:: xmlparser.specified_attributes | 
 | 240 |  | 
 | 241 |    If set to a non-zero integer, the parser will report only those attributes which | 
 | 242 |    were specified in the document instance and not those which were derived from | 
 | 243 |    attribute declarations.  Applications which set this need to be especially | 
 | 244 |    careful to use what additional information is available from the declarations as | 
 | 245 |    needed to comply with the standards for the behavior of XML processors.  By | 
 | 246 |    default, this attribute is false; it may be changed at any time. | 
 | 247 |  | 
 | 248 |    .. versionadded:: 2.1 | 
 | 249 |  | 
 | 250 | The following attributes contain values relating to the most recent error | 
 | 251 | encountered by an :class:`xmlparser` object, and will only have correct values | 
 | 252 | once a call to :meth:`Parse` or :meth:`ParseFile` has raised a | 
 | 253 | :exc:`xml.parsers.expat.ExpatError` exception. | 
 | 254 |  | 
 | 255 |  | 
 | 256 | .. attribute:: xmlparser.ErrorByteIndex | 
 | 257 |  | 
 | 258 |    Byte index at which an error occurred. | 
 | 259 |  | 
 | 260 |  | 
 | 261 | .. attribute:: xmlparser.ErrorCode | 
 | 262 |  | 
 | 263 |    Numeric code specifying the problem.  This value can be passed to the | 
 | 264 |    :func:`ErrorString` function, or compared to one of the constants defined in the | 
 | 265 |    ``errors`` object. | 
 | 266 |  | 
 | 267 |  | 
 | 268 | .. attribute:: xmlparser.ErrorColumnNumber | 
 | 269 |  | 
 | 270 |    Column number at which an error occurred. | 
 | 271 |  | 
 | 272 |  | 
 | 273 | .. attribute:: xmlparser.ErrorLineNumber | 
 | 274 |  | 
 | 275 |    Line number at which an error occurred. | 
 | 276 |  | 
 | 277 | The following attributes contain values relating to the current parse location | 
 | 278 | in an :class:`xmlparser` object.  During a callback reporting a parse event they | 
 | 279 | indicate the location of the first of the sequence of characters that generated | 
 | 280 | the event.  When called outside of a callback, the position indicated will be | 
 | 281 | just past the last parse event (regardless of whether there was an associated | 
 | 282 | callback). | 
 | 283 |  | 
 | 284 | .. versionadded:: 2.4 | 
 | 285 |  | 
 | 286 |  | 
 | 287 | .. attribute:: xmlparser.CurrentByteIndex | 
 | 288 |  | 
 | 289 |    Current byte index in the parser input. | 
 | 290 |  | 
 | 291 |  | 
 | 292 | .. attribute:: xmlparser.CurrentColumnNumber | 
 | 293 |  | 
 | 294 |    Current column number in the parser input. | 
 | 295 |  | 
 | 296 |  | 
 | 297 | .. attribute:: xmlparser.CurrentLineNumber | 
 | 298 |  | 
 | 299 |    Current line number in the parser input. | 
 | 300 |  | 
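The position attributes are most useful inside a callback.  A small sketch,
with an invented three-line document, that records where each start tag
begins:

```python
import xml.parsers.expat

DOC = "<a>\n  <b/>\n</a>"
positions = []

p = xml.parsers.expat.ParserCreate()

def start_element(name, attrs):
    # During a callback these describe the start of the current event.
    positions.append((name, p.CurrentLineNumber, p.CurrentColumnNumber))

p.StartElementHandler = start_element
p.Parse(DOC, True)

for name, line, column in positions:
    print("%s starts at line %d, column %d" % (name, line, column))
```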
 | 301 | Here is the list of handlers that can be set.  To set a handler on an | 
 | 302 | :class:`xmlparser` object *o*, use ``o.handlername = func``.  *handlername* must | 
 | 303 | be taken from the following list, and *func* must be a callable object accepting | 
 | 304 | the correct number of arguments.  The arguments are all strings, unless | 
 | 305 | otherwise stated. | 
 | 306 |  | 
 | 307 |  | 
 | 308 | .. method:: xmlparser.XmlDeclHandler(version, encoding, standalone) | 
 | 309 |  | 
 | 310 |    Called when the XML declaration is parsed.  The XML declaration is the | 
 | 311 |    (optional) declaration of the applicable version of the XML recommendation, the | 
 | 312 |    encoding of the document text, and an optional "standalone" declaration. | 
 | 313 |    *version* and *encoding* will be strings of the type dictated by the | 
 | 314 |    :attr:`returns_unicode` attribute, and *standalone* will be ``1`` if the | 
 | 315 |    document is declared standalone, ``0`` if it is declared not to be standalone, | 
 | 316 |    or ``-1`` if the standalone clause was omitted. This is only available with | 
 | 317 |    Expat version 1.95.0 or newer. | 
 | 318 |  | 
 | 319 |    .. versionadded:: 2.1 | 
 | 320 |  | 
 | 321 |  | 
 | 322 | .. method:: xmlparser.StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset) | 
 | 323 |  | 
 | 324 |    Called when Expat begins parsing the document type declaration (``<!DOCTYPE | 
 | 325 |    ...``).  The *doctypeName* is provided exactly as presented.  The *systemId* and | 
 | 326 |    *publicId* parameters give the system and public identifiers if specified, or | 
 | 327 |    ``None`` if omitted.  *has_internal_subset* will be true if the document | 
 | 328 |    contains an internal document declaration subset. This requires Expat version | 
 | 329 |    1.2 or newer. | 
 | 330 |  | 
 | 331 |  | 
 | 332 | .. method:: xmlparser.EndDoctypeDeclHandler() | 
 | 333 |  | 
 | 334 |    Called when Expat is done parsing the document type declaration. This requires | 
 | 335 |    Expat version 1.2 or newer. | 
 | 336 |  | 
 | 337 |  | 
 | 338 | .. method:: xmlparser.ElementDeclHandler(name, model) | 
 | 339 |  | 
 | 340 |    Called once for each element type declaration.  *name* is the name of the | 
 | 341 |    element type, and *model* is a representation of the content model. | 
 | 342 |  | 
 | 343 |  | 
 | 344 | .. method:: xmlparser.AttlistDeclHandler(elname, attname, type, default, required) | 
 | 345 |  | 
 | 346 |    Called for each declared attribute for an element type.  If an attribute list | 
 | 347 |    declaration declares three attributes, this handler is called three times, once | 
 | 348 |    for each attribute.  *elname* is the name of the element to which the | 
 | 349 |    declaration applies and *attname* is the name of the attribute declared.  The | 
 | 350 |    attribute type is a string passed as *type*; the possible values are | 
 | 351 |    ``'CDATA'``, ``'ID'``, ``'IDREF'``, ... *default* gives the default value for | 
 | 352 |    the attribute used when the attribute is not specified by the document instance, | 
 | 353 |    or ``None`` if there is no default value (``#IMPLIED`` values).  If the | 
 | 354 |    attribute is required to be given in the document instance, *required* will be | 
 | 355 |    true. This requires Expat version 1.95.0 or newer. | 
 | 356 |  | 
 | 357 |  | 
 | 358 | .. method:: xmlparser.StartElementHandler(name, attributes) | 
 | 359 |  | 
 | 360 |    Called for the start of every element.  *name* is a string containing the | 
 | 361 |    element name, and *attributes* is a dictionary mapping attribute names to their | 
 | 362 |    values. | 
 | 363 |  | 
 | 364 |  | 
 | 365 | .. method:: xmlparser.EndElementHandler(name) | 
 | 366 |  | 
 | 367 |    Called for the end of every element. | 
 | 368 |  | 
 | 369 |  | 
 | 370 | .. method:: xmlparser.ProcessingInstructionHandler(target, data) | 
 | 371 |  | 
 | 372 |    Called for every processing instruction. | 
 | 373 |  | 
 | 374 |  | 
 | 375 | .. method:: xmlparser.CharacterDataHandler(data) | 
 | 376 |  | 
 | 377 |    Called for character data.  This will be called for normal character data, CDATA | 
 | 378 |    marked content, and ignorable whitespace.  Applications which must distinguish | 
 | 379 |    these cases can use the :attr:`StartCdataSectionHandler`, | 
 | 380 |    :attr:`EndCdataSectionHandler`, and :attr:`ElementDeclHandler` callbacks to | 
 | 381 |    collect the required information. | 
 | 382 |  | 
 | 383 |  | 
 | 384 | .. method:: xmlparser.UnparsedEntityDeclHandler(entityName, base, systemId, publicId, notationName) | 
 | 385 |  | 
 | 386 |    Called for unparsed (NDATA) entity declarations.  This is only present for | 
 | 387 |    version 1.2 of the Expat library; for more recent versions, use | 
 | 388 |    :attr:`EntityDeclHandler` instead.  (The underlying function in the Expat | 
 | 389 |    library has been declared obsolete.) | 
 | 390 |  | 
 | 391 |  | 
 | 392 | .. method:: xmlparser.EntityDeclHandler(entityName, is_parameter_entity, value, base, systemId, publicId, notationName) | 
 | 393 |  | 
 | 394 |    Called for all entity declarations.  For parameter and internal entities, | 
 | 395 |    *value* will be a string giving the declared contents of the entity; this will | 
 | 396 |    be ``None`` for external entities.  The *notationName* parameter will be | 
 | 397 |    ``None`` for parsed entities, and the name of the notation for unparsed | 
 | 398 |    entities. *is_parameter_entity* will be true if the entity is a parameter entity | 
 | 399 |    or false for general entities (most applications only need to be concerned with | 
 | 400 |    general entities). This is only available starting with version 1.95.0 of the | 
 | 401 |    Expat library. | 
 | 402 |  | 
 | 403 |    .. versionadded:: 2.1 | 
 | 404 |  | 
 | 405 |  | 
 | 406 | .. method:: xmlparser.NotationDeclHandler(notationName, base, systemId, publicId) | 
 | 407 |  | 
 | 408 |    Called for notation declarations.  *notationName*, *base*, *systemId*, and | 
 | 409 |    *publicId* are strings if given.  If the public identifier is omitted, | 
 | 410 |    *publicId* will be ``None``. | 
 | 411 |  | 
 | 412 |  | 
 | 413 | .. method:: xmlparser.StartNamespaceDeclHandler(prefix, uri) | 
 | 414 |  | 
 | 415 |    Called when an element contains a namespace declaration.  Namespace declarations | 
 | 416 |    are processed before the :attr:`StartElementHandler` is called for the element | 
 | 417 |    on which declarations are placed. | 
 | 418 |  | 
 | 419 |  | 
 | 420 | .. method:: xmlparser.EndNamespaceDeclHandler(prefix) | 
 | 421 |  | 
 | 422 |    Called when the closing tag is reached for an element that contained a | 
 | 423 |    namespace declaration.  This is called once for each namespace declaration on | 
 | 424 |    the element in the reverse of the order for which the | 
 | 425 |    :attr:`StartNamespaceDeclHandler` was called to indicate the start of each | 
 | 426 |    namespace declaration's scope.  Calls to this handler are made after the | 
 | 427 |    corresponding :attr:`EndElementHandler` for the end of the element. | 
 | 428 |  | 
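The ordering of namespace declaration events relative to element events can be
checked with a sketch like the one below.  The document and the event labels
are assumptions of the example; namespace processing is enabled by passing a
separator to :func:`ParserCreate`.

```python
import xml.parsers.expat

events = []

p = xml.parsers.expat.ParserCreate(namespace_separator=" ")
p.StartNamespaceDeclHandler = (
    lambda prefix, uri: events.append(("ns-start", prefix, uri)))
p.EndNamespaceDeclHandler = lambda prefix: events.append(("ns-end", prefix))
p.StartElementHandler = lambda name, attrs: events.append(("start", name))
p.EndElementHandler = lambda name: events.append(("end", name))

p.Parse('<r xmlns:p="urn:example"><p:c/></r>', True)

for event in events:
    print(event)
```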
 | 429 |  | 
 | 430 | .. method:: xmlparser.CommentHandler(data) | 
 | 431 |  | 
 | 432 |    Called for comments.  *data* is the text of the comment, excluding the leading | 
 | 433 |    '``<!-``\ ``-``' and trailing '``-``\ ``->``'. | 
 | 434 |  | 
 | 435 |  | 
 | 436 | .. method:: xmlparser.StartCdataSectionHandler() | 
 | 437 |  | 
 | 438 |    Called at the start of a CDATA section.  This and :attr:`EndCdataSectionHandler` | 
 | 439 |    are needed to be able to identify the syntactical start and end for CDATA | 
 | 440 |    sections. | 
 | 441 |  | 
 | 442 |  | 
 | 443 | .. method:: xmlparser.EndCdataSectionHandler() | 
 | 444 |  | 
 | 445 |    Called at the end of a CDATA section. | 
 | 446 |  | 
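One way to tell CDATA content apart from ordinary character data is to track
the section boundaries in a flag.  This is a sketch only; the ``state``
dictionary and the sample document are made up for the example.

```python
import xml.parsers.expat

pieces = []
state = {"in_cdata": False}

def start_cdata():
    state["in_cdata"] = True

def end_cdata():
    state["in_cdata"] = False

def char_data(data):
    # Tag each piece with whether it came from inside a CDATA section.
    pieces.append((state["in_cdata"], data))

p = xml.parsers.expat.ParserCreate()
p.StartCdataSectionHandler = start_cdata
p.EndCdataSectionHandler = end_cdata
p.CharacterDataHandler = char_data
p.Parse("<doc>plain<![CDATA[<raw>]]></doc>", True)

cdata_text = "".join(data for flag, data in pieces if flag)
plain_text = "".join(data for flag, data in pieces if not flag)
print(repr(plain_text), repr(cdata_text))
```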
 | 447 |  | 
 | 448 | .. method:: xmlparser.DefaultHandler(data) | 
 | 449 |  | 
 | 450 |    Called for any characters in the XML document for which no applicable handler | 
 | 451 |    has been specified.  This means characters that are part of a construct which | 
 | 452 |    could be reported, but for which no handler has been supplied. | 
 | 453 |  | 
 | 454 |  | 
 | 455 | .. method:: xmlparser.DefaultHandlerExpand(data) | 
 | 456 |  | 
 | 457 |    This is the same as the :func:`DefaultHandler`, but doesn't inhibit expansion | 
 | 458 |    of internal entities. The entity reference will not be passed to the default | 
 | 459 |    handler. | 
 | 460 |  | 
 | 461 |  | 
 | 462 | .. method:: xmlparser.NotStandaloneHandler() | 
 | 463 |  | 
 | 464 |    Called if the XML document hasn't been declared as being a standalone document. | 
 | 465 |    This happens when there is an external subset or a reference to a parameter | 
 | 466 |    entity, but the XML declaration does not set standalone to ``yes``.  If this | 
 | 467 |    handler returns ``0``, then the parser will throw an | 
 | 468 |    :const:`XML_ERROR_NOT_STANDALONE` error.  If this handler is not set, no | 
 | 469 |    exception is raised by the parser for this condition. | 
 | 470 |  | 
 | 471 |  | 
 | 472 | .. method:: xmlparser.ExternalEntityRefHandler(context, base, systemId, publicId) | 
 | 473 |  | 
 | 474 |    Called for references to external entities.  *base* is the current base, as set | 
 | 475 |    by a previous call to :meth:`SetBase`.  The system and public identifiers, | 
 | 476 |    *systemId* and *publicId*, are strings if given; if the public identifier is not | 
 | 477 |    given, *publicId* will be ``None``.  The *context* value is opaque and should | 
 | 478 |    only be used as described below. | 
 | 479 |  | 
 | 480 |    For external entities to be parsed, this handler must be implemented. It is | 
 | 481 |    responsible for creating the sub-parser using | 
 | 482 |    ``ExternalEntityParserCreate(context)``, initializing it with the appropriate | 
 | 483 |    callbacks, and parsing the entity.  This handler should return an integer; if it | 
 | 484 |    returns ``0``, the parser will throw an | 
 | 485 |    :const:`XML_ERROR_EXTERNAL_ENTITY_HANDLING` error, otherwise parsing will | 
 | 486 |    continue. | 
 | 487 |  | 
 | 488 |    If this handler is not provided, external entities are reported by the | 
 | 489 |    :attr:`DefaultHandler` callback, if provided. | 
 | 490 |  | 
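The division of labour between the handler and the child parser might look as
follows.  This is a hedged sketch: the ``ENTITIES`` dictionary is a
hypothetical stand-in for fetching the entity (a real application would
resolve *systemId* against *base*), and the base URI is invented.

```python
import xml.parsers.expat

DOC = ('<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]>'
       '<doc>&ext;</doc>')

# Hypothetical entity store keyed by system identifier.
ENTITIES = {"ext.xml": "<note>from entity</note>"}

events = []
requests = []

def start_element(name, attrs):
    events.append(name)

def external_entity_ref(context, base, system_id, public_id):
    requests.append((base, system_id, public_id))
    # Create the child parser from the opaque context and parse the entity.
    child = parser.ExternalEntityParserCreate(context)
    child.StartElementHandler = start_element
    child.Parse(ENTITIES[system_id], True)
    return 1   # non-zero: let the parent parser continue

parser = xml.parsers.expat.ParserCreate()
parser.SetBase("http://example.invalid/docs/")
parser.StartElementHandler = start_element
parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOC, True)

print(events)
print(requests)
```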
 | 491 |  | 
 | 492 | .. _expaterror-objects: | 
 | 493 |  | 
 | 494 | ExpatError Exceptions | 
 | 495 | --------------------- | 
 | 496 |  | 
 | 497 | .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> | 
 | 498 |  | 
 | 499 |  | 
 | 500 | :exc:`ExpatError` exceptions have a number of interesting attributes: | 
 | 501 |  | 
 | 502 |  | 
 | 503 | .. attribute:: ExpatError.code | 
 | 504 |  | 
 | 505 |    Expat's internal error number for the specific error.  This will match one of | 
 | 506 |    the constants defined in the ``errors`` object from this module. | 
 | 507 |  | 
 | 508 |    .. versionadded:: 2.1 | 
 | 509 |  | 
 | 510 |  | 
 | 511 | .. attribute:: ExpatError.lineno | 
 | 512 |  | 
 | 513 |    Line number on which the error was detected.  The first line is numbered ``1``. | 
 | 514 |  | 
 | 515 |    .. versionadded:: 2.1 | 
 | 516 |  | 
 | 517 |  | 
 | 518 | .. attribute:: ExpatError.offset | 
 | 519 |  | 
 | 520 |    Character offset into the line where the error occurred.  The first column is | 
 | 521 |    numbered ``0``. | 
 | 522 |  | 
 | 523 |    .. versionadded:: 2.1 | 
 | 524 |  | 
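A sketch of how these attributes might be used when reporting a parse failure.
The malformed document is invented for the example.  The error is identified
by passing :attr:`code` to :func:`ErrorString` and comparing the resulting
message with a constant from ``errors``; this is a deliberately cautious
choice, since it works whether a given Python version stores those constants
as numbers or as message strings.

```python
import xml.parsers.expat

BAD_DOC = "<a>\n<b></a>"   # </a> closes the wrong element on line 2

p = xml.parsers.expat.ParserCreate()
caught = None
try:
    p.Parse(BAD_DOC, True)
except xml.parsers.expat.ExpatError as exc:
    caught = exc
    message = xml.parsers.expat.ErrorString(exc.code)
    print("%s at line %d, offset %d" % (message, exc.lineno, exc.offset))
```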
 | 525 |  | 
 | 526 | .. _expat-example: | 
 | 527 |  | 
 | 528 | Example | 
 | 529 | ------- | 
 | 530 |  | 
 | 531 | The following program defines three handlers that just print out their | 
 | 532 | arguments. :: | 
 | 533 |  | 
 | 534 |    import xml.parsers.expat | 
 | 535 |  | 
 | 536 |    # 3 handler functions | 
 | 537 |    def start_element(name, attrs): | 
 | 538 |        print 'Start element:', name, attrs | 
 | 539 |    def end_element(name): | 
 | 540 |        print 'End element:', name | 
 | 541 |    def char_data(data): | 
 | 542 |        print 'Character data:', repr(data) | 
 | 543 |  | 
 | 544 |    p = xml.parsers.expat.ParserCreate() | 
 | 545 |  | 
 | 546 |    p.StartElementHandler = start_element | 
 | 547 |    p.EndElementHandler = end_element | 
 | 548 |    p.CharacterDataHandler = char_data | 
 | 549 |  | 
 | 550 |    p.Parse("""<?xml version="1.0"?> | 
 | 551 |    <parent id="top"><child1 name="paul">Text goes here</child1> | 
 | 552 |    <child2 name="fred">More text</child2> | 
 | 553 |    </parent>""", 1) | 
 | 554 |  | 
 | 555 | The output from this program is:: | 
 | 556 |  | 
 | 557 |    Start element: parent {'id': 'top'} | 
 | 558 |    Start element: child1 {'name': 'paul'} | 
 | 559 |    Character data: 'Text goes here' | 
 | 560 |    End element: child1 | 
 | 561 |    Character data: '\n' | 
 | 562 |    Start element: child2 {'name': 'fred'} | 
 | 563 |    Character data: 'More text' | 
 | 564 |    End element: child2 | 
 | 565 |    Character data: '\n' | 
 | 566 |    End element: parent | 
 | 567 |  | 
 | 568 |  | 
 | 569 | .. _expat-content-models: | 
 | 570 |  | 
 | 571 | Content Model Descriptions | 
 | 572 | -------------------------- | 
 | 573 |  | 
 | 574 | .. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org> | 
 | 575 |  | 
 | 576 |  | 
 | 577 | Content models are described using nested tuples.  Each tuple contains four | 
 | 578 | values: the type, the quantifier, the name, and a tuple of children.  Children | 
 | 579 | are simply additional content model descriptions. | 
 | 580 |  | 
 | 581 | The values of the first two fields are constants defined in the ``model`` object | 
 | 582 | of the :mod:`xml.parsers.expat` module.  These constants can be collected in two | 
 | 583 | groups: the model type group and the quantifier group. | 
 | 584 |  | 
 | 585 | The constants in the model type group are: | 
 | 586 |  | 
 | 587 |  | 
 | 588 | .. data:: XML_CTYPE_ANY | 
 | 589 |    :noindex: | 
 | 590 |  | 
 | 591 |    The element named by the model name was declared to have a content model of | 
 | 592 |    ``ANY``. | 
 | 593 |  | 
 | 594 |  | 
 | 595 | .. data:: XML_CTYPE_CHOICE | 
 | 596 |    :noindex: | 
 | 597 |  | 
 | 598 |    The named element allows a choice from a number of options; this is used for | 
 | 599 |    content models such as ``(A | B | C)``. | 
 | 600 |  | 
 | 601 |  | 
 | 602 | .. data:: XML_CTYPE_EMPTY | 
 | 603 |    :noindex: | 
 | 604 |  | 
 | 605 |    Elements which are declared to be ``EMPTY`` have this model type. | 
 | 606 |  | 
 | 607 |  | 
 | 608 | .. data:: XML_CTYPE_MIXED | 
 | 609 |    :noindex: | 
 | 610 |  | 
 | 611 |  | 
 | 612 | .. data:: XML_CTYPE_NAME | 
 | 613 |    :noindex: | 
 | 614 |  | 
 | 615 |  | 
 | 616 | .. data:: XML_CTYPE_SEQ | 
 | 617 |    :noindex: | 
 | 618 |  | 
 | 619 |    Models which represent a series of models which follow one after the other are | 
 | 620 |    indicated with this model type.  This is used for models such as ``(A, B, C)``. | 
 | 621 |  | 
 | 622 | The constants in the quantifier group are: | 
 | 623 |  | 
 | 624 |  | 
 | 625 | .. data:: XML_CQUANT_NONE | 
 | 626 |    :noindex: | 
 | 627 |  | 
 | 628 |    No modifier is given, so it can appear exactly once, as for ``A``. | 
 | 629 |  | 
 | 630 |  | 
 | 631 | .. data:: XML_CQUANT_OPT | 
 | 632 |    :noindex: | 
 | 633 |  | 
 | 634 |    The model is optional: it can appear once or not at all, as for ``A?``. | 
 | 635 |  | 
 | 636 |  | 
 | 637 | .. data:: XML_CQUANT_PLUS | 
 | 638 |    :noindex: | 
 | 639 |  | 
 | 640 |    The model must occur one or more times (like ``A+``). | 
 | 641 |  | 
 | 642 |  | 
 | 643 | .. data:: XML_CQUANT_REP | 
 | 644 |    :noindex: | 
 | 645 |  | 
 | 646 |    The model may occur zero or more times, as for ``A*``. | 
 | 647 |  | 
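To see what a content model tuple looks like in practice, the sketch below
captures the model passed to :attr:`ElementDeclHandler` for a small DTD.  The
DTD and the variable names are assumptions of the example.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!ELEMENT doc (a, b?)>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
]>
<doc><a/></doc>"""

models = {}

def element_decl(name, content_model):
    # Remember the content model tuple for each declared element type.
    models[name] = content_model

p = xml.parsers.expat.ParserCreate()
p.ElementDeclHandler = element_decl
p.Parse(DOC, True)

m = xml.parsers.expat.model
# Unpack the four fields: type, quantifier, name, children.
ctype, quant, name, children = models["doc"]
print(ctype == m.XML_CTYPE_SEQ, quant == m.XML_CQUANT_NONE, name)
for child in children:
    print(child)
```

Each entry in ``children`` is itself a four-value tuple, so a recursive
function can walk models of any depth.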
 | 648 |  | 
 | 649 | .. _expat-errors: | 
 | 650 |  | 
 | 651 | Expat error constants | 
 | 652 | --------------------- | 
 | 653 |  | 
 | 654 | The following constants are provided in the ``errors`` object of the | 
 | 655 | :mod:`xml.parsers.expat` module.  These constants are useful in interpreting | 
 | 656 | some of the attributes of the :exc:`ExpatError` exception objects raised when an | 
 | 657 | error has occurred. | 
 | 658 |  | 
 | 659 | The ``errors`` object has the following attributes: | 
 | 660 |  | 
 | 661 |  | 
 | 662 | .. data:: XML_ERROR_ASYNC_ENTITY | 
 | 663 |    :noindex: | 
 | 664 |  | 
 | 665 |  | 
 | 666 | .. data:: XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF | 
 | 667 |    :noindex: | 
 | 668 |  | 
 | 669 |    An entity reference in an attribute value referred to an external entity instead | 
 | 670 |    of an internal entity. | 
 | 671 |  | 
 | 672 |  | 
 | 673 | .. data:: XML_ERROR_BAD_CHAR_REF | 
 | 674 |    :noindex: | 
 | 675 |  | 
 | 676 |    A character reference referred to a character which is illegal in XML (for | 
 | 677 |    example, character ``0``, or '``&#0;``'). | 
 | 678 |  | 
 | 679 |  | 
 | 680 | .. data:: XML_ERROR_BINARY_ENTITY_REF | 
 | 681 |    :noindex: | 
 | 682 |  | 
 | 683 |    An entity reference referred to an entity which was declared with a notation, so | 
 | 684 |    cannot be parsed. | 
 | 685 |  | 
 | 686 |  | 
 | 687 | .. data:: XML_ERROR_DUPLICATE_ATTRIBUTE | 
 | 688 |    :noindex: | 
 | 689 |  | 
 | 690 |    An attribute was used more than once in a start tag. | 
 | 691 |  | 
 | 692 |  | 
 | 693 | .. data:: XML_ERROR_INCORRECT_ENCODING | 
 | 694 |    :noindex: | 
 | 695 |  | 
 | 696 |  | 
 | 697 | .. data:: XML_ERROR_INVALID_TOKEN | 
 | 698 |    :noindex: | 
 | 699 |  | 
 | 700 |    Raised when an input byte could not properly be assigned to a character; for | 
 | 701 |    example, a NUL byte (value ``0``) in a UTF-8 input stream. | 
 | 702 |  | 
 | 703 |  | 
 | 704 | .. data:: XML_ERROR_JUNK_AFTER_DOC_ELEMENT | 
 | 705 |    :noindex: | 
 | 706 |  | 
 | 707 |    Something other than whitespace occurred after the document element. | 
 | 708 |  | 
 | 709 |  | 
 | 710 | .. data:: XML_ERROR_MISPLACED_XML_PI | 
 | 711 |    :noindex: | 
 | 712 |  | 
 | 713 |    An XML declaration was found somewhere other than the start of the input data. | 
 | 714 |  | 
 | 715 |  | 
 | 716 | .. data:: XML_ERROR_NO_ELEMENTS | 
 | 717 |    :noindex: | 
 | 718 |  | 
 | 719 |    The document contains no elements (XML requires all documents to contain exactly | 
 | 720 |    one top-level element). | 
 | 721 |  | 
 | 722 |  | 
 | 723 | .. data:: XML_ERROR_NO_MEMORY | 
 | 724 |    :noindex: | 
 | 725 |  | 
 | 726 |    Expat was not able to allocate memory internally. | 
 | 727 |  | 
 | 728 |  | 
 | 729 | .. data:: XML_ERROR_PARAM_ENTITY_REF | 
 | 730 |    :noindex: | 
 | 731 |  | 
 | 732 |    A parameter entity reference was found where it was not allowed. | 
 | 733 |  | 
 | 734 |  | 
 | 735 | .. data:: XML_ERROR_PARTIAL_CHAR | 
 | 736 |    :noindex: | 
 | 737 |  | 
 | 738 |    An incomplete character was found in the input. | 
 | 739 |  | 
 | 740 |  | 
 | 741 | .. data:: XML_ERROR_RECURSIVE_ENTITY_REF | 
 | 742 |    :noindex: | 
 | 743 |  | 
 | 744 |    An entity reference contained another reference to the same entity; possibly via | 
 | 745 |    a different name, and possibly indirectly. | 
 | 746 |  | 
 | 747 |  | 
 | 748 | .. data:: XML_ERROR_SYNTAX | 
 | 749 |    :noindex: | 
 | 750 |  | 
 | 751 |    Some unspecified syntax error was encountered. | 
 | 752 |  | 
 | 753 |  | 
 | 754 | .. data:: XML_ERROR_TAG_MISMATCH | 
 | 755 |    :noindex: | 
 | 756 |  | 
 | 757 |    An end tag did not match the innermost open start tag. | 
 | 758 |  | 
 | 759 |  | 
 | 760 | .. data:: XML_ERROR_UNCLOSED_TOKEN | 
 | 761 |    :noindex: | 
 | 762 |  | 
 | 763 |    Some token (such as a start tag) was not closed before the end of the stream or | 
 | 764 |    the next token was encountered. | 
 | 765 |  | 
 | 766 |  | 
 | 767 | .. data:: XML_ERROR_UNDEFINED_ENTITY | 
 | 768 |    :noindex: | 
 | 769 |  | 
 | 770 |    A reference was made to an entity which was not defined. | 
 | 771 |  | 
 | 772 |  | 
 | 773 | .. data:: XML_ERROR_UNKNOWN_ENCODING | 
 | 774 |    :noindex: | 
 | 775 |  | 
 | 776 |    The document encoding is not supported by Expat. | 
 | 777 |  | 
 | 778 |  | 
 | 779 | .. data:: XML_ERROR_UNCLOSED_CDATA_SECTION | 
 | 780 |    :noindex: | 
 | 781 |  | 
 | 782 |    A CDATA marked section was not closed. | 
 | 783 |  | 
 | 784 |  | 
 | 785 | .. data:: XML_ERROR_EXTERNAL_ENTITY_HANDLING | 
 | 786 |    :noindex: | 
 | 787 |  | 
 | 788 |  | 
 | 789 | .. data:: XML_ERROR_NOT_STANDALONE | 
 | 790 |    :noindex: | 
 | 791 |  | 
 | 792 |    The parser determined that the document was not "standalone" though it declared | 
 | 793 |    itself to be in the XML declaration, and the :attr:`NotStandaloneHandler` was | 
 | 794 |    set and returned ``0``. | 
 | 795 |  | 
 | 796 |  | 
 | 797 | .. data:: XML_ERROR_UNEXPECTED_STATE | 
 | 798 |    :noindex: | 
 | 799 |  | 
 | 800 |  | 
 | 801 | .. data:: XML_ERROR_ENTITY_DECLARED_IN_PE | 
 | 802 |    :noindex: | 
 | 803 |  | 
 | 804 |  | 
 | 805 | .. data:: XML_ERROR_FEATURE_REQUIRES_XML_DTD | 
 | 806 |    :noindex: | 
 | 807 |  | 
 | 808 |    An operation was requested that requires DTD support to be compiled in, but | 
 | 809 |    Expat was configured without DTD support.  This should never be reported by a | 
 | 810 |    standard build of the :mod:`xml.parsers.expat` module. | 
 | 811 |  | 
 | 812 |  | 
 | 813 | .. data:: XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING | 
 | 814 |    :noindex: | 
 | 815 |  | 
 | 816 |    A behavioral change was requested after parsing started that can only be changed | 
 | 817 |    before parsing has started.  This is (currently) only raised by | 
 | 818 |    :meth:`UseForeignDTD`. | 
 | 819 |  | 
 | 820 |  | 
 | 821 | .. data:: XML_ERROR_UNBOUND_PREFIX | 
 | 822 |    :noindex: | 
 | 823 |  | 
 | 824 |    An undeclared prefix was found when namespace processing was enabled. | 
 | 825 |  | 
 | 826 |  | 
 | 827 | .. data:: XML_ERROR_UNDECLARING_PREFIX | 
 | 828 |    :noindex: | 
 | 829 |  | 
 | 830 |    The document attempted to remove the namespace declaration associated with a | 
 | 831 |    prefix. | 
 | 832 |  | 
 | 833 |  | 
 | 834 | .. data:: XML_ERROR_INCOMPLETE_PE | 
 | 835 |    :noindex: | 
 | 836 |  | 
 | 837 |    A parameter entity contained incomplete markup. | 
 | 838 |  | 
 | 839 |  | 
 | 840 | .. data:: XML_ERROR_XML_DECL | 
 | 841 |    :noindex: | 
 | 842 |  | 
 | 843 |    The XML declaration was not well-formed. | 
 | 844 |  | 
 | 845 |  | 
 | 846 | .. data:: XML_ERROR_TEXT_DECL | 
 | 847 |    :noindex: | 
 | 848 |  | 
 | 849 |    There was an error parsing a text declaration in an external entity. | 
 | 850 |  | 
 | 851 |  | 
 | 852 | .. data:: XML_ERROR_PUBLICID | 
 | 853 |    :noindex: | 
 | 854 |  | 
 | 855 |    Characters were found in the public id that are not allowed. | 
 | 856 |  | 
 | 857 |  | 
 | 858 | .. data:: XML_ERROR_SUSPENDED | 
 | 859 |    :noindex: | 
 | 860 |  | 
 | 861 |    An operation was requested on a suspended parser that is not allowed in that | 
 | 862 |    state.  This includes attempts to provide additional input or to stop the parser. | 
 | 863 |  | 
 | 864 |  | 
 | 865 | .. data:: XML_ERROR_NOT_SUSPENDED | 
 | 866 |    :noindex: | 
 | 867 |  | 
 | 868 |    An attempt to resume the parser was made when the parser had not been suspended. | 
 | 869 |  | 
 | 870 |  | 
 | 871 | .. data:: XML_ERROR_ABORTED | 
 | 872 |    :noindex: | 
 | 873 |  | 
 | 874 |    This should not be reported to Python applications. | 
 | 875 |  | 
 | 876 |  | 
 | 877 | .. data:: XML_ERROR_FINISHED | 
 | 878 |    :noindex: | 
 | 879 |  | 
 | 880 |    An operation was requested on a parser that had finished parsing input, and | 
 | 881 |    the operation is not allowed in that state.  This includes attempts to provide | 
 | 882 |    additional input or to stop the parser. | 
 | 883 |  | 
 | 884 |  | 
 | 885 | .. data:: XML_ERROR_SUSPEND_PE | 
 | 886 |    :noindex: | 
 | 887 |  | 
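To make a few of the constants above concrete, the sketch below feeds small
malformed documents to a fresh parser and checks which constant each one
produces. It is illustrative only: the sample documents and the names
``error_message``, ``SAMPLES`` and ``unexpected`` are hypothetical, and it
assumes the constants hold the strings returned by :func:`ErrorString`.

```python
import xml.parsers.expat as expat


def error_message(document):
    """Return Expat's message for the first error in *document*, or None."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        return expat.ErrorString(exc.code)
    return None


# Each malformed sample is paired with the constant it is expected to yield.
SAMPLES = [
    ("<a><b></a>", expat.errors.XML_ERROR_TAG_MISMATCH),
    ('<a x="1" x="2"/>', expat.errors.XML_ERROR_DUPLICATE_ATTRIBUTE),
    ("<a/><b/>", expat.errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT),
    ("", expat.errors.XML_ERROR_NO_ELEMENTS),
    ("<a>&undefined;</a>", expat.errors.XML_ERROR_UNDEFINED_ENTITY),
    ("<a>&#0;</a>", expat.errors.XML_ERROR_BAD_CHAR_REF),
]

# Collect any sample whose reported error differs from the expected constant.
unexpected = [(document, expected)
              for document, expected in SAMPLES
              if error_message(document) != expected]
```

A new parser is created for every sample because a parser cannot be reused
once it has reported an error or finished parsing.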
 | 888 |  | 
 | 889 | .. rubric:: Footnotes | 
 | 890 |  | 
 | 891 | .. [#] The encoding string included in XML output should conform to the | 
 | 892 |    appropriate standards. For example, "UTF-8" is valid, but "UTF8" is | 
 | 893 |    not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl | 
 | 894 |    and http://www.iana.org/assignments/character-sets . | 
 | 895 |  |