\section{\module{xml.parsers.expat} ---
         Fast XML parsing using Expat}

% Markup notes:
%
% Many of the attributes of the XMLParser objects are callbacks.
% Since signature information must be presented, these are described
% using the methoddesc environment.  Since they are attributes which
% are set by client code, in-text references to these attributes
% should be marked using the \member macro and should not include the
% parentheses used when marking functions and methods.

\declaremodule{standard}{xml.parsers.expat}
\modulesynopsis{An interface to the Expat non-validating XML parser.}
\moduleauthor{Paul Prescod}{paul@prescod.net}

\versionadded{2.0}

The \module{xml.parsers.expat} module is a Python interface to the
Expat\index{Expat} non-validating XML parser.
The module provides a single extension type, \class{xmlparser}, that
represents the current state of an XML parser.  After an
\class{xmlparser} object has been created, various attributes of the object
can be set to handler functions.  When an XML document is then fed to
the parser, the handler functions are called for the character data
and markup in the XML document.

This module uses the \module{pyexpat}\refbimodindex{pyexpat} module to
provide access to the Expat parser.  Direct use of the
\module{pyexpat} module is deprecated.

This module provides one exception and one type object:

\begin{excdesc}{ExpatError}
  The exception raised when Expat reports an error.  See section
  \ref{expaterror-objects}, ``ExpatError Exceptions,'' for more
  information on interpreting Expat errors.
\end{excdesc}

\begin{excdesc}{error}
  Alias for \exception{ExpatError}.
\end{excdesc}

\begin{datadesc}{XMLParserType}
  The type of the return values from the \function{ParserCreate()}
  function.
\end{datadesc}


The \module{xml.parsers.expat} module contains two functions:

\begin{funcdesc}{ErrorString}{errno}
Returns an explanatory string for a given error number \var{errno}.
\end{funcdesc}

\begin{funcdesc}{ParserCreate}{\optional{encoding\optional{,
                               namespace_separator}}}
Creates and returns a new \class{xmlparser} object.
\var{encoding}, if specified, must be a string naming the encoding
used by the XML data.  Expat doesn't support as many encodings as
Python does, and its repertoire of encodings can't be extended; it
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.  If
\var{encoding} is given it will override the implicit or explicit
encoding of the document.

Expat can optionally do XML namespace processing for you, enabled by
providing a value for \var{namespace_separator}.  The value must be a
one-character string; a \exception{ValueError} will be raised if the
string has an illegal length (\code{None} is considered the same as
omission).  When namespace processing is enabled, element type names
and attribute names that belong to a namespace will be expanded.  The
element name passed to the element handlers
\member{StartElementHandler} and \member{EndElementHandler}
will be the concatenation of the namespace URI, the namespace
separator character, and the local part of the name.  If the namespace
separator is a zero byte (\code{chr(0)}) then the namespace URI and
the local part will be concatenated without any separator.

For example, if \var{namespace_separator} is set to a space character
(\character{ }) and the following document is parsed:

\begin{verbatim}
<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>
\end{verbatim}

\member{StartElementHandler} will receive the following strings
for each element:

\begin{verbatim}
http://default-namespace.org/ root
http://www.python.org/ns/ elem1
elem2
\end{verbatim}
\end{funcdesc}
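
The namespace example can be reproduced with a short script.  This is
only a sketch: the handler name \code{start_element} and the
\code{names} list are illustrative and not part of the module.

```python
import xml.parsers.expat as expat

DOCUMENT = """<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

names = []  # expanded element names, in document order

def start_element(name, attrs):
    names.append(name)

# A one-character separator switches namespace processing on.
parser = expat.ParserCreate(namespace_separator=" ")
parser.StartElementHandler = start_element
parser.Parse(DOCUMENT, True)
```

After the call to \method{Parse()}, \code{names} holds the three
strings listed above, in the same order.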


\subsection{XMLParser Objects \label{xmlparser-objects}}

\class{xmlparser} objects have the following methods:

\begin{methoddesc}[xmlparser]{Parse}{data\optional{, isfinal}}
Parses the contents of the string \var{data}, calling the appropriate
handler functions to process the parsed data.  \var{isfinal} must be
true on the final call to this method.  \var{data} can be the empty
string at any time.
\end{methoddesc}
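
Because \method{Parse()} may be called repeatedly, a document can be
fed to the parser in pieces.  The following sketch assumes the data
arrives as a list of string fragments (\code{chunks} is made up for
the illustration) and closes the document with an empty final call.

```python
import xml.parsers.expat as expat

# Fragment boundaries deliberately fall in the middle of tags and text.
chunks = ["<log><entry>fir", "st</entry><ent", "ry>second</entry></log>"]

started = []  # element names seen so far

def start_element(name, attrs):
    started.append(name)

parser = expat.ParserCreate()
parser.StartElementHandler = start_element

for chunk in chunks:
    parser.Parse(chunk, False)   # more data may follow
parser.Parse("", True)           # empty final call marks the end of input
```

Handlers fire as soon as enough input has been seen, so an application
does not need to hold the whole document in memory.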

\begin{methoddesc}[xmlparser]{ParseFile}{file}
Parse XML data reading from the object \var{file}.  \var{file} only
needs to provide the \method{read(\var{nbytes})} method, returning the
empty string when there's no more data.
\end{methoddesc}
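
A minimal sketch of \method{ParseFile()} follows.  An in-memory byte
stream from the standard \module{io} module stands in for a real file;
this assumes a Python version that provides \module{io} and, on
current versions, a \method{read()} method that returns bytes.  The
\code{counts} dictionary is illustrative.

```python
import io
import xml.parsers.expat as expat

# Any object with a suitable read(nbytes) method will do.
source = io.BytesIO(b"<inventory><item/><item/></inventory>")

counts = {}  # element name -> number of start tags seen

def start_element(name, attrs):
    counts[name] = counts.get(name, 0) + 1

parser = expat.ParserCreate()
parser.StartElementHandler = start_element
parser.ParseFile(source)   # reads until read() returns no more data
```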

\begin{methoddesc}[xmlparser]{SetBase}{base}
Sets the base to be used for resolving relative URIs in system
identifiers in declarations.  Resolving relative identifiers is left
to the application: this value will be passed through as the
\var{base} argument to the \member{ExternalEntityRefHandler},
\member{NotationDeclHandler}, and
\member{UnparsedEntityDeclHandler} handlers.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{GetBase}{}
Returns a string containing the base set by a previous call to
\method{SetBase()}, or \code{None} if
\method{SetBase()} hasn't been called.
\end{methoddesc}
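
The round trip between \method{SetBase()}, \method{GetBase()} and a
handler's \var{base} argument can be sketched as follows.  The URL,
the notation and the \code{notations} list are invented for the
example.

```python
import xml.parsers.expat as expat

notations = []  # arguments received by the notation handler

def notation_decl(notationName, base, systemId, publicId):
    notations.append((notationName, base, systemId, publicId))

parser = expat.ParserCreate()
base_before = parser.GetBase()     # None: SetBase() has not been called

parser.SetBase("http://example.org/docs/")
base_after = parser.GetBase()      # the string just passed to SetBase()

parser.NotationDeclHandler = notation_decl
parser.Parse('<!DOCTYPE r [<!NOTATION gif SYSTEM "viewer.exe">]><r/>', True)
```

The parser passes the base through untouched; combining it with
\code{viewer.exe} is left to the application.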

\begin{methoddesc}[xmlparser]{GetInputContext}{}
Returns the input data that generated the current event as a string.
The data is in the encoding of the entity which contains the text.
When called while an event handler is not active, the return value is
\code{None}.
\versionadded{2.1}
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityParserCreate}{context\optional{,
                                                          encoding}}
Create a ``child'' parser which can be used to parse an external
parsed entity referred to by content parsed by the parent parser.  The
\var{context} parameter should be the string passed to the
\member{ExternalEntityRefHandler} handler function, described below.
The child parser is created with its \member{ordered_attributes},
\member{returns_unicode} and \member{specified_attributes} attributes
set to the values of this parser.
\end{methoddesc}


\class{xmlparser} objects have the following attributes:

\begin{memberdesc}[xmlparser]{ordered_attributes}
Setting this attribute to a non-zero integer causes the attributes to
be reported as a list rather than a dictionary.  The attributes are
presented in the order found in the document text.  For each
attribute, two list entries are presented: the attribute name and the
attribute value.  (Older versions of this module also used this
format.)  By default, this attribute is false; it may be changed at
any time.
\versionadded{2.1}
\end{memberdesc}
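
The effect of \member{ordered_attributes} can be seen with a single
start tag.  This is an illustrative sketch; the \code{seen} list and
the sample element are not part of the module.

```python
import xml.parsers.expat as expat

seen = []  # one entry per start tag: the attributes argument as received

def start_element(name, attrs):
    seen.append(attrs)

parser = expat.ParserCreate()
parser.ordered_attributes = 1      # report attributes as a flat list
parser.StartElementHandler = start_element
parser.Parse('<point y="2" x="1"/>', True)
```

Here \code{seen[0]} is the list \code{['y', '2', 'x', '1']}: names and
values alternate, in document order.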

\begin{memberdesc}[xmlparser]{returns_unicode}
If this attribute is set to a non-zero integer, the handler functions
will be passed Unicode strings.  If \member{returns_unicode} is 0,
8-bit strings containing UTF-8 encoded data will be passed to the
handlers.
\versionchanged[Can be changed at any time to affect the result
                type]{1.6}
\end{memberdesc}

\begin{memberdesc}[xmlparser]{specified_attributes}
If set to a non-zero integer, the parser will report only those
attributes which were specified in the document instance and not those
which were derived from attribute declarations.  Applications which
set this need to be especially careful to use what additional
information is available from the declarations as needed to comply
with the standards for the behavior of XML processors.  By default,
this attribute is false; it may be changed at any time.
\versionadded{2.1}
\end{memberdesc}

The following attributes contain values relating to the most recent
error encountered by an \class{xmlparser} object, and will only have
correct values once a call to \method{Parse()} or \method{ParseFile()}
has raised a \exception{xml.parsers.expat.ExpatError} exception.

\begin{memberdesc}[xmlparser]{ErrorByteIndex}
Byte index at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorCode}
Numeric code specifying the problem.  This value can be passed to the
\function{ErrorString()} function, or compared to one of the constants
defined in the \code{errors} object.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorColumnNumber}
Column number at which an error occurred.
\end{memberdesc}

\begin{memberdesc}[xmlparser]{ErrorLineNumber}
Line number at which an error occurred.
\end{memberdesc}
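
As a sketch of how these attributes might be used, the following
fragment parses a deliberately malformed document and reads the error
location from the parser after the exception has been raised.  The
variable names are illustrative.

```python
import xml.parsers.expat as expat

parser = expat.ParserCreate()
failed = False
try:
    # The end tag on the second line does not match the open <b>.
    parser.Parse("<a>\n<b></a>", True)
except expat.ExpatError:
    failed = True
    line = parser.ErrorLineNumber
    column = parser.ErrorColumnNumber
    message = expat.ErrorString(parser.ErrorCode)
```

The attributes are read only inside the \code{except} clause, since
they are meaningful only once an error has been reported.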

Here is the list of handlers that can be set.  To set a handler on an
\class{xmlparser} object \var{o}, use
\code{\var{o}.\var{handlername} = \var{func}}.  \var{handlername} must
be taken from the following list, and \var{func} must be a callable
object accepting the correct number of arguments.  The arguments are
all strings, unless otherwise stated.

\begin{methoddesc}[xmlparser]{XmlDeclHandler}{version, encoding, standalone}
Called when the XML declaration is parsed.  The XML declaration is the
(optional) declaration of the applicable version of the XML
recommendation, the encoding of the document text, and an optional
``standalone'' declaration.  \var{version} and \var{encoding} will be
strings of the type dictated by the \member{returns_unicode}
attribute, and \var{standalone} will be \code{1} if the document is
declared standalone, \code{0} if it is declared not to be standalone,
or \code{-1} if the standalone clause was omitted.
This is only available with Expat version 1.95.0 or newer.
\versionadded{2.1}
\end{methoddesc}
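
The three possible values of \var{standalone} can be observed by
parsing three small documents.  This is a sketch only; the handler
name and the \code{declarations} list are illustrative.

```python
import xml.parsers.expat as expat

declarations = []  # (version, encoding, standalone) per document

def xml_decl(version, encoding, standalone):
    declarations.append((version, encoding, standalone))

for document in ('<?xml version="1.0" standalone="yes"?><r/>',
                 '<?xml version="1.0" standalone="no"?><r/>',
                 '<?xml version="1.0"?><r/>'):
    parser = expat.ParserCreate()   # a parser handles one document only
    parser.XmlDeclHandler = xml_decl
    parser.Parse(document, True)

standalone_flags = [flag for (_, _, flag) in declarations]
```

With these inputs \code{standalone_flags} is \code{[1, 0, -1]}.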

\begin{methoddesc}[xmlparser]{StartDoctypeDeclHandler}{doctypeName,
                                                       systemId, publicId,
                                                       has_internal_subset}
Called when Expat begins parsing the document type declaration
(\code{<!DOCTYPE \ldots}).  The \var{doctypeName} is provided exactly
as presented.  The \var{systemId} and \var{publicId} parameters give
the system and public identifiers if specified, or \code{None} if
omitted.  \var{has_internal_subset} will be true if the document
contains an internal document declaration subset.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndDoctypeDeclHandler}{}
Called when Expat is done parsing the document type declaration.
This requires Expat version 1.2 or newer.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ElementDeclHandler}{name, model}
Called once for each element type declaration.  \var{name} is the name
of the element type, and \var{model} is a representation of the
content model.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{AttlistDeclHandler}{elname, attname,
                                                  type, default, required}
Called for each declared attribute for an element type.  If an
attribute list declaration declares three attributes, this handler is
called three times, once for each attribute.  \var{elname} is the name
of the element to which the declaration applies and \var{attname} is
the name of the attribute declared.  The attribute type is a string
passed as \var{type}; the possible values are \code{'CDATA'},
\code{'ID'}, \code{'IDREF'}, ...
\var{default} gives the default value for the attribute used when the
attribute is not specified by the document instance, or \code{None} if
there is no default value (\code{\#IMPLIED} values).  If the attribute
is required to be given in the document instance, \var{required} will
be true.
This requires Expat version 1.95.0 or newer.
\end{methoddesc}
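
The ``called three times'' behaviour of \member{AttlistDeclHandler}
can be sketched with one attribute list declaration that declares
three attributes.  The document and the \code{declared} list are
invented for the example; \var{required} is converted with
\function{bool()} only to make the result easier to read.

```python
import xml.parsers.expat as expat

DOCUMENT = """<!DOCTYPE item [
  <!ATTLIST item
            id    ID    #REQUIRED
            note  CDATA #IMPLIED
            unit  CDATA "kg">
]>
<item id="i1"/>"""

declared = []  # one tuple per declared attribute

def attlist_decl(elname, attname, type, default, required):
    declared.append((elname, attname, type, default, bool(required)))

parser = expat.ParserCreate()
parser.AttlistDeclHandler = attlist_decl
parser.Parse(DOCUMENT, True)
```

Only \code{unit} carries a default value; \code{id} is the only
attribute reported as required.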

\begin{methoddesc}[xmlparser]{StartElementHandler}{name, attributes}
Called for the start of every element.  \var{name} is a string
containing the element name, and \var{attributes} is a dictionary
mapping attribute names to their values.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndElementHandler}{name}
Called for the end of every element.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ProcessingInstructionHandler}{target, data}
Called for every processing instruction.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{CharacterDataHandler}{data}
Called for character data.  This will be called for normal character
data, CDATA marked content, and ignorable whitespace.  Applications
which must distinguish these cases can use the
\member{StartCdataSectionHandler}, \member{EndCdataSectionHandler},
and \member{ElementDeclHandler} callbacks to collect the required
information.
\end{methoddesc}
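
One way to tell CDATA marked content from ordinary character data is
to keep a flag that the CDATA section handlers switch on and off.  The
following is a sketch under that assumption; the \code{state}
dictionary and the two lists are illustrative.

```python
import xml.parsers.expat as expat

state = {"in_cdata": False}
plain, marked = [], []   # text outside / inside CDATA sections

def start_cdata():
    state["in_cdata"] = True

def end_cdata():
    state["in_cdata"] = False

def char_data(data):
    # Character data may arrive in several pieces, so collect and join.
    if state["in_cdata"]:
        marked.append(data)
    else:
        plain.append(data)

parser = expat.ParserCreate()
parser.StartCdataSectionHandler = start_cdata
parser.EndCdataSectionHandler = end_cdata
parser.CharacterDataHandler = char_data
parser.Parse("<r>plain text<![CDATA[<b>not markup</b>]]></r>", True)

plain_text = "".join(plain)
cdata_text = "".join(marked)
```

Joining the pieces afterwards avoids relying on how the parser happens
to split the character data between calls.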

\begin{methoddesc}[xmlparser]{UnparsedEntityDeclHandler}{entityName, base,
                                                         systemId, publicId,
                                                         notationName}
Called for unparsed (NDATA) entity declarations.  This is only present
for version 1.2 of the Expat library; for more recent versions, use
\member{EntityDeclHandler} instead.  (The underlying function in the
Expat library has been declared obsolete.)
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EntityDeclHandler}{entityName,
                                                 is_parameter_entity, value,
                                                 base, systemId,
                                                 publicId,
                                                 notationName}
Called for all entity declarations.  For parameter and internal
entities, \var{value} will be a string giving the declared contents
of the entity; this will be \code{None} for external entities.  The
\var{notationName} parameter will be \code{None} for parsed entities,
and the name of the notation for unparsed entities.
\var{is_parameter_entity} will be true if the entity is a parameter
entity or false for general entities (most applications only need to
be concerned with general entities).
This is only available starting with version 1.95.0 of the Expat
library.
\versionadded{2.1}
\end{methoddesc}
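
The difference between an internal entity and an unparsed external
entity shows up in the \var{value}, \var{systemId} and
\var{notationName} arguments.  The declarations below and the
\code{entities} dictionary are made up for this sketch.

```python
import xml.parsers.expat as expat

DOCUMENT = """<!DOCTYPE r [
  <!ENTITY greeting "hello">
  <!NOTATION png SYSTEM "viewer">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<r/>"""

entities = {}  # entity name -> selected handler arguments

def entity_decl(entityName, is_parameter_entity, value, base,
                systemId, publicId, notationName):
    entities[entityName] = (bool(is_parameter_entity), value,
                            systemId, notationName)

parser = expat.ParserCreate()
parser.EntityDeclHandler = entity_decl
parser.Parse(DOCUMENT, True)
```

The internal entity \code{greeting} is reported with its replacement
text and no system identifier; the unparsed entity \code{logo} has no
value but carries its system identifier and notation name.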

\begin{methoddesc}[xmlparser]{NotationDeclHandler}{notationName, base,
                                                   systemId, publicId}
Called for notation declarations.  \var{notationName}, \var{base},
\var{systemId}, and \var{publicId} are strings if given.  If the
public identifier is omitted, \var{publicId} will be \code{None}.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartNamespaceDeclHandler}{prefix, uri}
Called when an element contains a namespace declaration.  Namespace
declarations are processed before the \member{StartElementHandler} is
called for the element on which declarations are placed.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndNamespaceDeclHandler}{prefix}
Called when the closing tag is reached for an element
that contained a namespace declaration.  This is called once for each
namespace declaration on the element, in the reverse of the order in
which the \member{StartNamespaceDeclHandler} was called to indicate
the start of each namespace declaration's scope.  Calls to this
handler are made after the corresponding \member{EndElementHandler}
for the end of the element.
\end{methoddesc}
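
The ordering of namespace and element events can be checked by
recording every call in one list.  This sketch assumes namespace
processing has been enabled through \var{namespace_separator}; the
\code{events} list and its tags are illustrative.

```python
import xml.parsers.expat as expat

events = []  # (kind, value) tuples in the order the handlers fire

def start_ns(prefix, uri):
    events.append(("start-ns", prefix))

def end_ns(prefix):
    events.append(("end-ns", prefix))

def start_element(name, attrs):
    events.append(("start", name))

def end_element(name):
    events.append(("end", name))

parser = expat.ParserCreate(namespace_separator=" ")
parser.StartNamespaceDeclHandler = start_ns
parser.EndNamespaceDeclHandler = end_ns
parser.StartElementHandler = start_element
parser.EndElementHandler = end_element
parser.Parse('<r xmlns:p="urn:example"><p:c/></r>', True)
```

The namespace declaration is reported before the start of \code{r},
and its end is reported only after the end of \code{r}.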

\begin{methoddesc}[xmlparser]{CommentHandler}{data}
Called for comments.  \var{data} is the text of the comment, excluding
the leading `\code{<!-}\code{-}' and trailing `\code{-}\code{->}'.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{StartCdataSectionHandler}{}
Called at the start of a CDATA section.  This and
\member{EndCdataSectionHandler} are needed to be able to identify
the syntactical start and end for CDATA sections.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{EndCdataSectionHandler}{}
Called at the end of a CDATA section.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandler}{data}
Called for any characters in the XML document for
which no applicable handler has been specified.  This means
characters that are part of a construct which could be reported, but
for which no handler has been supplied.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{DefaultHandlerExpand}{data}
This is the same as the \member{DefaultHandler},
but doesn't inhibit expansion of internal entities.
The entity reference will not be passed to the default handler.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{NotStandaloneHandler}{}
Called if the
XML document hasn't been declared as being a standalone document.
This happens when there is an external subset or a reference to a
parameter entity, but the XML declaration does not set standalone to
\code{yes}.  If this handler returns \code{0},
then the parser will throw an \constant{XML_ERROR_NOT_STANDALONE}
error.  If this handler is not set, no exception is raised by the
parser for this condition.
\end{methoddesc}

\begin{methoddesc}[xmlparser]{ExternalEntityRefHandler}{context, base,
                                                        systemId, publicId}
Called for references to external entities.  \var{base} is the current
base, as set by a previous call to \method{SetBase()}.  The system and
public identifiers, \var{systemId} and \var{publicId}, are strings if
given; if the public identifier is not given, \var{publicId} will be
\code{None}.  The \var{context} value is opaque and should only be
used as described below.

For external entities to be parsed, this handler must be implemented.
It is responsible for creating the sub-parser using
\code{ExternalEntityParserCreate(\var{context})}, initializing it with
the appropriate callbacks, and parsing the entity.  This handler
should return an integer; if it returns \code{0}, the parser will
throw an \constant{XML_ERROR_EXTERNAL_ENTITY_HANDLING} error,
otherwise parsing will continue.

If this handler is not provided, external entities are reported by the
\member{DefaultHandler} callback, if provided.
\end{methoddesc}
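
The steps an \member{ExternalEntityRefHandler} is responsible for can
be sketched as follows.  The \code{ENTITY_TEXT} dictionary is a
hypothetical stand-in for whatever the application does to fetch the
entity; it, the sample document and the other names are assumptions
made for the illustration.

```python
import xml.parsers.expat as expat

DOCUMENT = """<!DOCTYPE book [
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>
<book>&chapter;</book>"""

# Hypothetical lookup table; a real application would resolve systemId
# against base and read the resource itself.
ENTITY_TEXT = {"chapter1.xml": "<title>Introduction</title>"}

requested = []   # system identifiers the parser asked for
text = []        # character data from the parent and child parsers

def char_data(data):
    text.append(data)

def external_entity_ref(context, base, systemId, publicId):
    requested.append(systemId)
    child = parser.ExternalEntityParserCreate(context)
    child.CharacterDataHandler = char_data
    child.Parse(ENTITY_TEXT[systemId], True)
    return 1     # non-zero: the parent parser continues

parser = expat.ParserCreate()
parser.CharacterDataHandler = char_data
parser.ExternalEntityRefHandler = external_entity_ref
parser.Parse(DOCUMENT, True)
```

Returning \code{0} from the handler instead would make the parent
parser report \constant{XML_ERROR_EXTERNAL_ENTITY_HANDLING}.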


\subsection{ExpatError Exceptions \label{expaterror-objects}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

\exception{ExpatError} exceptions have a number of interesting
attributes:

\begin{memberdesc}[ExpatError]{code}
  Expat's internal error number for the specific error.  This will
  match one of the constants defined in the \code{errors} object from
  this module.
  \versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{lineno}
  Line number on which the error was detected.  The first line is
  numbered \code{1}.
  \versionadded{2.1}
\end{memberdesc}

\begin{memberdesc}[ExpatError]{offset}
  Character offset into the line where the error occurred.  The first
  column is numbered \code{0}.
  \versionadded{2.1}
\end{memberdesc}
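
A short sketch of reading these attributes from a caught exception
follows.  The malformed document and the variable names are
illustrative, and the \code{except \ldots\ as} form assumes a Python
version that supports that syntax.

```python
import xml.parsers.expat as expat

parser = expat.ParserCreate()
caught = None
try:
    # </c> on the second line does not close the open <b>.
    parser.Parse("<a>\n  <b>text</c>\n</a>", True)
except expat.ExpatError as exc:
    caught = exc

description = expat.ErrorString(caught.code)   # explanatory text
location = (caught.lineno, caught.offset)      # (line, column offset)
```

The exception carries the same error number that the parser exposes
as \member{ErrorCode}.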


\subsection{Example \label{expat-example}}

The following program defines three handlers that just print out their
arguments.

\begin{verbatim}
import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

# The second argument marks this as the final call to Parse().
p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", 1)
\end{verbatim}

The output from this program is:

\begin{verbatim}
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
\end{verbatim}


\subsection{Content Model Descriptions \label{expat-content-models}}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}

Content models are described using nested tuples.  Each tuple
contains four values: the type, the quantifier, the name, and a tuple
of children.  Children are simply additional content model
descriptions.

The values of the first two fields are constants defined in the
\code{model} object of the \module{xml.parsers.expat} module.  These
constants can be collected in two groups: the model type group and the
quantifier group.

The constants in the model type group are:

\begin{datadescni}{XML_CTYPE_ANY}
The element named by the model name was declared to have a content
model of \code{ANY}.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_CHOICE}
The named element allows a choice from a number of options; this is
used for content models such as \code{(A | B | C)}.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_EMPTY}
Elements which are declared to be \code{EMPTY} have this model type.
\end{datadescni}

\begin{datadescni}{XML_CTYPE_MIXED}
\end{datadescni}

\begin{datadescni}{XML_CTYPE_NAME}
\end{datadescni}

\begin{datadescni}{XML_CTYPE_SEQ}
Models which represent a series of models which follow one after the
other are indicated with this model type.  This is used for models
such as \code{(A, B, C)}.
\end{datadescni}


The constants in the quantifier group are:

\begin{datadescni}{XML_CQUANT_NONE}
No modifier is given, so it can appear exactly once, as for \code{A}.
\end{datadescni}

\begin{datadescni}{XML_CQUANT_OPT}
The model is optional: it can appear once or not at all, as for
\code{A?}.
\end{datadescni}

\begin{datadescni}{XML_CQUANT_PLUS}
The model must occur one or more times (like \code{A+}).
\end{datadescni}

\begin{datadescni}{XML_CQUANT_REP}
The model must occur zero or more times, as for \code{A*}.
\end{datadescni}
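
The nested tuples can be inspected by installing an
\member{ElementDeclHandler}.  The following sketch declares a small
sequence model; the document, the \code{models} dictionary and the
variable names are invented for the example.

```python
import xml.parsers.expat as expat

DOCUMENT = """<!DOCTYPE r [
  <!ELEMENT r (a, b?)>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
]>
<r><a/></r>"""

models = {}  # element type name -> content model tuple

def element_decl(name, content_model):
    models[name] = content_model

parser = expat.ParserCreate()
parser.ElementDeclHandler = element_decl
parser.Parse(DOCUMENT, True)

# Unpack the four fields of the model declared for <r>.
ctype, quant, name, children = models["r"]
first, second = children
```

For \code{(a, b?)} the outer tuple has type \code{XML_CTYPE_SEQ} and
two children; the second child carries the \code{XML_CQUANT_OPT}
quantifier and the name \code{b}.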


\subsection{Expat error constants \label{expat-errors}}

The following constants are provided in the \code{errors} object of
the \refmodule{xml.parsers.expat} module.  These constants are useful
in interpreting some of the attributes of the \exception{ExpatError}
exception objects raised when an error has occurred.

The \code{errors} object has the following attributes:

\begin{datadescni}{XML_ERROR_ASYNC_ENTITY}
\end{datadescni}

\begin{datadescni}{XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
An entity reference in an attribute value referred to an external
entity instead of an internal entity.
\end{datadescni}

\begin{datadescni}{XML_ERROR_BAD_CHAR_REF}
A character reference referred to a character which is illegal in XML
(for example, character \code{0}, or `\code{\&\#0;}').
\end{datadescni}

\begin{datadescni}{XML_ERROR_BINARY_ENTITY_REF}
An entity reference referred to an entity which was declared with a
notation, so cannot be parsed.
\end{datadescni}

\begin{datadescni}{XML_ERROR_DUPLICATE_ATTRIBUTE}
An attribute was used more than once in a start tag.
\end{datadescni}

\begin{datadescni}{XML_ERROR_INCORRECT_ENCODING}
\end{datadescni}

\begin{datadescni}{XML_ERROR_INVALID_TOKEN}
Raised when an input byte could not properly be assigned to a
character; for example, a NUL byte (value \code{0}) in a UTF-8 input
stream.
\end{datadescni}

\begin{datadescni}{XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
Something other than whitespace occurred after the document element.
\end{datadescni}

\begin{datadescni}{XML_ERROR_MISPLACED_XML_PI}
An XML declaration was found somewhere other than the start of the
input data.
\end{datadescni}

\begin{datadescni}{XML_ERROR_NO_ELEMENTS}
The document contains no elements (XML requires all documents to
contain exactly one top-level element).
\end{datadescni}

\begin{datadescni}{XML_ERROR_NO_MEMORY}
Expat was not able to allocate memory internally.
\end{datadescni}

\begin{datadescni}{XML_ERROR_PARAM_ENTITY_REF}
A parameter entity reference was found where it was not allowed.
\end{datadescni}

\begin{datadescni}{XML_ERROR_PARTIAL_CHAR}

\end{datadescni}

\begin{datadescni}{XML_ERROR_RECURSIVE_ENTITY_REF}
An entity reference contained another reference to the same entity,
possibly via a different name, and possibly indirectly.
\end{datadescni}

\begin{datadescni}{XML_ERROR_SYNTAX}
Some unspecified syntax error was encountered.
\end{datadescni}

\begin{datadescni}{XML_ERROR_TAG_MISMATCH}
An end tag did not match the innermost open start tag.
\end{datadescni}

\begin{datadescni}{XML_ERROR_UNCLOSED_TOKEN}
Some token (such as a start tag) was not closed before the end of the
stream or the next token was encountered.
\end{datadescni}

\begin{datadescni}{XML_ERROR_UNDEFINED_ENTITY}
A reference was made to an entity which was not defined.
\end{datadescni}

\begin{datadescni}{XML_ERROR_UNKNOWN_ENCODING}
The document encoding is not supported by Expat.
\end{datadescni}