\section{\module{pyexpat} ---
         Fast XML parsing using the Expat C library}

\declaremodule{builtin}{pyexpat}
\modulesynopsis{An interface to the Expat XML parser.}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}

The \module{pyexpat} module is a Python interface to the Expat
non-validating XML parser.
The module provides a single extension type, \class{xmlparser}, that
represents the current state of an XML parser. After an
\class{xmlparser} object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to
the parser, the handler functions are called for the character data
and markup in the XML document.

The \module{pyexpat} module contains two functions:

\begin{funcdesc}{ErrorString}{errno}
Returns an explanatory string for a given error number \var{errno}.
\end{funcdesc}

\begin{funcdesc}{ParserCreate}{\optional{encoding\optional{, namespace_separator}}}
Creates and returns a new \class{xmlparser} object.
\var{encoding}, if specified, must be a string naming the encoding
used by the XML data. Expat doesn't support as many encodings as
Python does, and its repertoire of encodings can't be extended; it
supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII.

% XXX pyexpat.c should only allow a 1-char string for this parameter
Expat can optionally do XML namespace processing for you, enabled by
providing a value for \var{namespace_separator}. When namespace
processing is enabled, element type names and attribute names that
belong to a namespace will be expanded. The element name
passed to the element handlers
\function{StartElementHandler()} and \function{EndElementHandler()}
will be the concatenation of the namespace URI, the namespace
separator character, and the local part of the name. If the namespace
separator is a zero byte (\code{chr(0)}),
then the namespace URI and the local part will be
concatenated without any separator.

For example, if \var{namespace_separator} is set to
\samp{ }, and the following document is parsed:

\begin{verbatim}
<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>
\end{verbatim}

\function{StartElementHandler()} will receive the following strings
for each element:

\begin{verbatim}
http://default-namespace.org/ root
http://www.python.org/ns/ elem1
elem2
\end{verbatim}

\end{funcdesc}
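The namespace example can be reproduced with a short script. This is
only a sketch: the \code{names} list and the
\function{start_element()} function are illustrative names, not part of
the module, and the call assumes that \function{ParserCreate()} accepts
\code{None} for \var{encoding} when only the separator is wanted.

\begin{verbatim}
import pyexpat

document = """<?xml version="1.0"?>
<root xmlns    = "http://default-namespace.org/"
      xmlns:py = "http://www.python.org/ns/">
  <py:elem1 />
  <elem2 xmlns="" />
</root>"""

# Collect the expanded element names in document order.
names = []
def start_element(name, attrs):
    names.append(name)

# Assumption: None selects the default encoding; ' ' is the separator.
parser = pyexpat.ParserCreate(None, ' ')
parser.StartElementHandler = start_element
parser.Parse(document, 1)

for name in names:
    print(name)
\end{verbatim}

Choosing a separator that cannot occur in a URI, such as a space, makes
it possible to split an expanded name back into its namespace URI and
local part.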

\class{xmlparser} objects have the following methods:

\begin{methoddesc}{Parse}{data \optional{, isfinal}}
Parses the contents of the string \var{data}, calling the appropriate
handler functions to process the parsed data. \var{isfinal} must be
true on the final call to this method. \var{data} can be the empty
string at any time.
\end{methoddesc}
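Because \method{Parse()} may be called repeatedly, a document can be
fed to the parser in pieces. The following sketch uses a hypothetical
list of chunks (\code{chunks}) that splits the document at arbitrary
points and ends the parse with an empty string. Character data may
reach the handler in several pieces, so the pieces are joined
afterwards.

\begin{verbatim}
import pyexpat

# Hypothetical input, split at arbitrary points.
chunks = ['<doc><item>fi', 'rst</item><item>sec', 'ond</item></doc>']

texts = []
parser = pyexpat.ParserCreate()
parser.CharacterDataHandler = texts.append

for chunk in chunks:
    parser.Parse(chunk, 0)      # more data will follow
parser.Parse('', 1)             # final call: no more data

joined = ''.join(texts)
print(joined)
\end{verbatim}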

\begin{methoddesc}{ParseFile}{file}
Parses XML data read from the object \var{file}. \var{file} only
needs to provide the \method{read(\var{nbytes})} method, returning the
empty string when there's no more data.
\end{methoddesc}
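Any object with a suitable \method{read()} method will do for
\method{ParseFile()}. The sketch below uses an in-memory
\class{io.BytesIO} object, on the assumption that the running Python
version expects \method{read()} to return byte strings; the
\code{names} list and the handler name are illustrative only.

\begin{verbatim}
import io
import pyexpat

names = []
def start_element(name, attrs):
    names.append(name)

parser = pyexpat.ParserCreate()
parser.StartElementHandler = start_element

# Assumption: read() must return bytes in this Python version.
source = io.BytesIO(b'<?xml version="1.0"?><a><b/><c/></a>')
parser.ParseFile(source)

print(names)
\end{verbatim}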

\begin{methoddesc}{SetBase}{base}
Sets the base to be used for resolving relative URIs in system
identifiers in declarations. Resolving relative identifiers is left to
the application: this value will be passed through as the base argument
to the \function{ExternalEntityRefHandler},
\function{NotationDeclHandler}, and
\function{UnparsedEntityDeclHandler} functions.
\end{methoddesc}

\begin{methoddesc}{GetBase}{}
Returns a string containing the base set by a previous call to
\method{SetBase()}, or \code{None} if
\method{SetBase()} hasn't been called.
\end{methoddesc}
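A small sketch of how the base travels from \method{SetBase()} to a
handler. The URL, the notation declaration, and the names
\code{base}, \code{seen} and \function{notation_decl()} are made up for
illustration; the parser does not resolve anything itself.

\begin{verbatim}
import pyexpat

base = 'http://www.example.org/dtds/'    # hypothetical base URI
seen = []

def notation_decl(notationName, base, systemId, publicId):
    # The base is passed through unchanged; resolving systemId
    # against it is left to the application.
    seen.append((notationName, base, systemId, publicId))

parser = pyexpat.ParserCreate()
before = parser.GetBase()                # nothing set yet
parser.SetBase(base)
after = parser.GetBase()

parser.NotationDeclHandler = notation_decl
parser.Parse('<!DOCTYPE doc [<!NOTATION gif SYSTEM "viewer.exe">]>'
             '<doc/>', 1)
print(seen)
\end{verbatim}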

\class{xmlparser} objects have the following attributes, containing
values relating to the most recent error encountered by an
\class{xmlparser} object. These attributes will only have correct
values once a call to \method{Parse()} or \method{ParseFile()}
has raised a \exception{pyexpat.error} exception.

\begin{datadesc}{ErrorByteIndex}
Byte index at which an error occurred.
\end{datadesc}

\begin{datadesc}{ErrorCode}
Numeric code specifying the problem. This value can be passed to the
\function{ErrorString()} function, or compared to one of the constants
defined in the \module{pyexpat.errors} submodule.
\end{datadesc}

\begin{datadesc}{ErrorColumnNumber}
Column number at which an error occurred.
\end{datadesc}

\begin{datadesc}{ErrorLineNumber}
Line number at which an error occurred.
\end{datadesc}
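The error attributes are typically read inside an \keyword{except}
clause. The following sketch parses a deliberately malformed document
and builds a one-line report; the variable names and the report format
are illustrative choices, not something the module prescribes.

\begin{verbatim}
import pyexpat

report = 'no error'
message = None
line = None

parser = pyexpat.ParserCreate()
try:
    # The <b> element is never closed, so </a> does not match.
    parser.Parse('<a><b></a>', 1)
except pyexpat.error:
    # The attributes are only meaningful after the exception.
    message = pyexpat.ErrorString(parser.ErrorCode)
    line = parser.ErrorLineNumber
    report = '%s at line %d, column %d (byte %d)' % (
        message, line, parser.ErrorColumnNumber, parser.ErrorByteIndex)

print(report)
\end{verbatim}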

Here is the list of handlers that can be set. To set a handler on an
\class{xmlparser} object \var{o}, use
\code{\var{o}.\var{handlername} = \var{func}}. \var{handlername} must
be taken from the following list, and \var{func} must be a callable
object accepting the correct number of arguments. The arguments are
all strings, unless otherwise stated.

\begin{methoddesc}{StartElementHandler}{name, attributes}
Called for the start of every element. \var{name} is a string
containing the element name, and \var{attributes} is a dictionary
mapping attribute names to their values.
\end{methoddesc}

\begin{methoddesc}{EndElementHandler}{name}
Called for the end of every element.
\end{methoddesc}

\begin{methoddesc}{ProcessingInstructionHandler}{target, data}
Called for every processing instruction.
\end{methoddesc}

\begin{methoddesc}{CharacterDataHandler}{data}
Called for character data.
\end{methoddesc}

\begin{methoddesc}{UnparsedEntityDeclHandler}{entityName, base,
                                              systemId, publicId,
                                              notationName}
Called for unparsed (NDATA) entity declarations.
\end{methoddesc}

\begin{methoddesc}{NotationDeclHandler}{notationName, base,
                                        systemId, publicId}
Called for notation declarations.
\end{methoddesc}

\begin{methoddesc}{StartNamespaceDeclHandler}{prefix, uri}
Called when an element contains a namespace declaration.
\end{methoddesc}

\begin{methoddesc}{EndNamespaceDeclHandler}{prefix}
Called when the closing tag is reached for an element
that contained a namespace declaration.
\end{methoddesc}

\begin{methoddesc}{CommentHandler}{data}
Called for comments.
\end{methoddesc}

\begin{methoddesc}{StartCdataSectionHandler}{}
Called at the start of a CDATA section.
\end{methoddesc}

\begin{methoddesc}{EndCdataSectionHandler}{}
Called at the end of a CDATA section.
\end{methoddesc}

\begin{methoddesc}{DefaultHandler}{data}
Called for any characters in the XML document for
which no applicable handler has been specified. This means
characters that are part of a construct which could be reported, but
for which no handler has been supplied.
\end{methoddesc}

\begin{methoddesc}{DefaultHandlerExpand}{data}
This is the same as the \function{DefaultHandler},
but doesn't inhibit expansion of internal entities.
The entity reference will not be passed to the default handler.
\end{methoddesc}

\begin{methoddesc}{NotStandaloneHandler}{}
Called if the XML document hasn't been declared as being a standalone
document.
\end{methoddesc}

\begin{methoddesc}{ExternalEntityRefHandler}{context, base,
                                             systemId, publicId}
Called for references to external entities.
\end{methoddesc}


\subsection{Example \label{pyexpat-example}}

The following program defines three handlers that just print out their
arguments.

\begin{verbatim}
import pyexpat

# 3 handler functions
def start_element(name, attrs):
    print('Start element: %s %r' % (name, attrs))
def end_element(name):
    print('End element: %s' % name)
def char_data(data):
    print('Character data: %r' % data)

p = pyexpat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

# The second argument marks this as the final call to Parse().
p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", 1)
\end{verbatim}

The output from this program is:

\begin{verbatim}
Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
\end{verbatim}
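Other handlers are installed in the same way. As a further sketch,
the following program records processing instructions, comments and
CDATA sections in a list instead of printing them. The \code{events}
list, the tuple layout and the sample document are illustrative
choices only.

\begin{verbatim}
import pyexpat

events = []

def processing_instruction(target, data):
    events.append(('pi', target, data))
def comment(data):
    events.append(('comment', data))
def start_cdata():
    events.append(('start-cdata',))
def end_cdata():
    events.append(('end-cdata',))
def char_data(data):
    events.append(('text', data))

p = pyexpat.ParserCreate()
p.ProcessingInstructionHandler = processing_instruction
p.CommentHandler = comment
p.StartCdataSectionHandler = start_cdata
p.EndCdataSectionHandler = end_cdata
p.CharacterDataHandler = char_data

p.Parse('<doc><?render mode="fast"?><!-- note -->'
        '<![CDATA[a < b]]></doc>', 1)

for event in events:
    print(event)
\end{verbatim}

The text inside the CDATA section is delivered through
\function{CharacterDataHandler()}, bracketed by the two CDATA handlers.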


\section{\module{pyexpat.errors} --- Error constants}

\declaremodule{builtin}{pyexpat.errors}
\modulesynopsis{Error constants defined for the Expat parser}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{A.M. Kuchling}{amk1@bigfoot.com}

The \module{pyexpat.errors} submodule holds the error constants for
the Expat parser. It becomes available once the \refmodule{pyexpat}
module has been imported, and it cannot be imported directly before
that.

The following constants are defined:

\begin{tableii}{l|l}{code}{Constants}{}
  \lineii{XML_ERROR_ASYNC_ENTITY}
         {XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF}
  \lineii{XML_ERROR_BAD_CHAR_REF}
         {XML_ERROR_BINARY_ENTITY_REF}
  \lineii{XML_ERROR_DUPLICATE_ATTRIBUTE}
         {XML_ERROR_INCORRECT_ENCODING}
  \lineii{XML_ERROR_INVALID_TOKEN}
         {XML_ERROR_JUNK_AFTER_DOC_ELEMENT}
  \lineii{XML_ERROR_MISPLACED_XML_PI}
         {XML_ERROR_NO_ELEMENTS}
  \lineii{XML_ERROR_NO_MEMORY}
         {XML_ERROR_PARAM_ENTITY_REF}
  \lineii{XML_ERROR_PARTIAL_CHAR}
         {XML_ERROR_RECURSIVE_ENTITY_REF}
  \lineii{XML_ERROR_SYNTAX}
         {XML_ERROR_TAG_MISMATCH}
  \lineii{XML_ERROR_UNCLOSED_TOKEN}
         {XML_ERROR_UNDEFINED_ENTITY}
  \lineii{XML_ERROR_UNKNOWN_ENCODING}{}
\end{tableii}
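A sketch of how one of these constants might be used to recognise a
specific problem. It assumes that the constants hold the explanatory
strings returned by \function{ErrorString()}, as they do in the CPython
versions this was checked against; if they hold numeric codes in your
version, compare them with \member{ErrorCode} directly instead. The
helper name \function{is_tag_mismatch()} is hypothetical.

\begin{verbatim}
import pyexpat

def is_tag_mismatch(document):
    """Return true if parsing fails because of a mismatched tag."""
    parser = pyexpat.ParserCreate()
    try:
        parser.Parse(document, 1)
    except pyexpat.error:
        # Assumption: the constant is the message string for the error.
        message = pyexpat.ErrorString(parser.ErrorCode)
        return message == pyexpat.errors.XML_ERROR_TAG_MISMATCH
    return False

print(is_tag_mismatch('<a><b></a>'))
print(is_tag_mismatch('<a/>'))
\end{verbatim}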