:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

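The difference in return types can be checked directly. The following is a
minimal sketch, not part of the original tutorial: the one-country document and
the variable names are made up for illustration, and an in-memory
``io.StringIO`` object stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-country document, used only for this sketch.
xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or a file object and returns an ElementTree;
# getroot() is needed to reach the root Element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(tree).__name__)                       # ElementTree
print(root_from_string.tag, root_from_file.tag)  # data data
```
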
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

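The contrast between the recursive :meth:`Element.iter` and the
direct-children-only :meth:`Element.findall` is easiest to see on a document
that has the same tag at two depths. This is a small sketch; the ``list``,
``group`` and ``item`` tags are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with 'item' elements at two depths.
root = ET.fromstring(
    '<list>'
    '<item id="1"/>'
    '<group><item id="2"/><item id="3"/></group>'
    '</list>'
)

# findall() with a plain tag name only looks at direct children.
direct = [e.get('id') for e in root.findall('item')]
# iter() walks the whole sub-tree in document order.
everywhere = [e.get('id') for e in root.iter('item')]
print(direct)      # ['1']
print(everywhere)  # ['1', '2', '3']

# find() returns None when nothing matches, so check before using .text.
missing = root.find('no-such-tag')
print(missing is None)  # True
```
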
Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

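The removal loop iterates over the list returned by :meth:`Element.findall`
rather than over ``root`` itself. Looping over the element while removing its
children could skip some of them, so collecting the matches first is the safer
pattern. Here is a self-contained sketch of that pattern; the three-country
document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two countries ranked above 50, one below.
root = ET.fromstring(
    '<data>'
    '<country name="A"><rank>60</rank></country>'
    '<country name="B"><rank>70</rank></country>'
    '<country name="C"><rank>5</rank></country>'
    '</data>'
)

# findall() returns a list, so the loop walks a snapshot of the children
# while remove() changes the live element.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [c.get('name') for c in root]
print(remaining)  # ['C']
```
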
Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

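Attributes and text can be attached while building. The sketch below is an
illustration rather than part of the original tutorial: it reuses tag names from
the sample data, passes attributes both as keyword arguments and as a
dictionary, and serializes the result with :func:`tostring` so that it can be
parsed back.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')
# Attributes given as keyword arguments ...
country = ET.SubElement(data, 'country', name='Singapore')
rank = ET.SubElement(country, 'rank')
rank.text = '4'
# ... or as a dictionary.
ET.SubElement(country, 'neighbor', {'name': 'Malaysia', 'direction': 'N'})

# encoding='unicode' asks for a str instead of bytes.
xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)

# Parsing the output again recovers the same structure.
reparsed = ET.fromstring(xml_text)
print(reparsed.findtext('country/rank'))  # 4
```
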
Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

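To see what these queries return, here is a runnable sketch that defines a
shortened ``countrydata`` string inline (two countries only, an assumption made
to keep the example short) and prints the matches. Note that in this data
``neighbor[2]`` matches the second ``neighbor`` among its ``neighbor``
siblings, even though that element is the third child of its parent.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the tutorial's sample document.
countrydata = """
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)

grandchildren = [n.get('name') for n in root.findall("./country/neighbor")]
parents = [c.get('name') for c in root.findall(".//year/..[@name='Singapore']")]
years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]
second = [n.get('name') for n in root.findall(".//neighbor[2]")]

print(grandchildren)  # ['Austria', 'Switzerland', 'Malaysia']
print(parents)        # ['Singapore']
print(years)          # ['2011']
print(second)         # ['Switzerland']
```
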
Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, ``spam/egg`` selects all             |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

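The predicate forms in the table can be tried out on a small tree. This is a
sketch under stated assumptions: the document below, including the
``territory`` element, and the ``names`` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria"/><neighbor name="Switzerland"/></country>'
    '<country name="Singapore"><neighbor name="Malaysia"/></country>'
    '<territory/>'
    '</data>'
)

def names(path):
    """Return the 'name' attribute of every element matched by *path*."""
    return [elem.get('name') for elem in root.findall(path)]

print(names("*[@name]"))               # ['Liechtenstein', 'Singapore']
print(names("country[rank]"))          # ['Liechtenstein']
print(names("country[1]"))             # ['Liechtenstein']
print(names("country[last()]"))        # ['Singapore']
print(names(".//neighbor[last()-1]"))  # ['Austria']
```
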
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


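As a brief illustration of :func:`fromstringlist`, the fragments do not have to
be split at tag boundaries. The fragment contents below are made up for this
sketch.

```python
import xml.etree.ElementTree as ET

# The first fragment ends in the middle of a start tag on purpose.
fragments = ['<root><it', 'em>one</item>', '<item>two</item></root>']
root = ET.fromstringlist(fragments)

texts = [item.text for item in root]
print(root.tag)  # root
print(texts)     # ['one', 'two']
```
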
.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a list of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"``
   and ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.


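A short sketch of :func:`iterparse` in use follows. It reads from an in-memory
``io.BytesIO`` object instead of a file on disk (an assumption made so the
example is self-contained) and only reads ``elem.text`` on ``"end"`` events, as
the note above recommends.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document held in memory instead of in a file.
source = io.BytesIO(
    b'<data><country name="Panama"><rank>68</rank></country></data>'
)

seen = []
rank_text = None
for event, elem in ET.iterparse(source, events=('start', 'end')):
    seen.append((event, elem.tag))
    if event == 'end' and elem.tag == 'rank':
        # The element is fully populated once its "end" event arrives.
        rank_text = elem.text

print(seen)
print(rank_text)  # 68
```
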
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.


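The two special factories, :func:`Comment` and :func:`ProcessingInstruction`,
return elements that are appended like any other. The sketch below shows how
they come out of the serializer; the comment text and the ``app`` target are
invented for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('data')
root.append(ET.Comment('generated for illustration'))
root.append(ET.ProcessingInstruction('app', 'mode="demo"'))
ET.SubElement(root, 'item')

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```
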
.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


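The effect of :func:`register_namespace` shows up at serialization time. This
is a minimal sketch; the ``geo`` prefix and the ``example.com`` URI are
hypothetical. Because the registry is global, the registration affects every
later serialization in the same process.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for this sketch.
GEO_NS = 'http://example.com/ns/geo'
ET.register_namespace('geo', GEO_NS)

# Namespaced tags are written as '{uri}localname'.
point = ET.Element('{%s}point' % GEO_NS)
xml_text = ET.tostring(point, encoding='unicode')
print(xml_text)
```
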
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   Returns an (optionally) encoded string containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2


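The relationship between the two functions, and the effect of
``encoding="unicode"``, can be sketched as follows. The element content is
arbitrary and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<a><b>text</b></a>')

as_bytes = ET.tostring(elem)                     # default encoding: bytes
as_text = ET.tostring(elem, encoding='unicode')  # str instead of bytes
chunks = ET.tostringlist(elem)                   # list of bytes fragments

print(type(as_bytes).__name__, type(as_text).__name__)  # bytes str
print(b''.join(chunks) == as_bytes)                     # True
```
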
.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.


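A small sketch of :func:`XMLID` follows. The document is made up for
illustration; the dictionary is keyed on each element's ``id`` attribute, and
elements without one are left out.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<data><item id="first"/><item id="second"/><item/></data>'
)

print(root.tag)           # data
print(sorted(ids))        # ['first', 'second']
print(ids['second'].tag)  # item
```
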
.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies, this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file, the attribute will contain any text found between the
      element tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file,
      the attribute will contain any text found after the element's end tag and
      before the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


Eli Bendersky737b1732012-05-29 06:02:56 +0300625 .. method:: iterfind(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000626
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200627 Finds all matching subelements, by tag name or
628 :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
Eli Bendersky737b1732012-05-29 06:02:56 +0300629 matching elements in document order. *namespaces* is an optional mapping
630 from namespace prefix to full name.
631
Georg Brandl116aa622007-08-15 14:28:22 +0000632
Ezio Melottif8754a62010-03-21 07:16:43 +0000633 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000634
Georg Brandl116aa622007-08-15 14:28:22 +0000635
   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
   :meth:`__setitem__`, :meth:`__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")
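The traversal and modification methods described above can be sketched as
follows. This is a minimal illustration, not part of the original reference:
the sample document and the namespace URI ``http://example.com/ns`` are
assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
root = ET.fromstring(
    '<doc xmlns:x="http://example.com/ns">'
    '<p>Hello <b>big</b> world</p>'
    '<x:note>first</x:note>'
    '<x:note>second</x:note>'
    '</doc>'
)

# iter() walks this element and everything below it in document order.
tags = [elem.tag for elem in root.iter()]

# itertext() yields the inner text pieces in document order.
text = "".join(root.find("p").itertext())

# iterfind() accepts a prefix-to-URI mapping for namespaced paths.
ns = {"x": "http://example.com/ns"}
notes = [elem.text for elem in root.iterfind("x:note", namespaces=ns)]

# insert() places a new child at the given index.
intro = ET.Element("intro")
root.insert(0, intro)

# remove() matches by identity: an equal-looking but distinct element
# is not a child of root, so trying to remove it raises ValueError.
lookalike = ET.Element("intro")
try:
    root.remove(lookalike)
    removed_lookalike = True
except ValueError:
    removed_lookalike = False

root.remove(intro)  # the actual child can be removed
```

Note that the namespaced children show up in ``tags`` in the expanded
``{uri}local`` form, which is why a prefix mapping is needed for the path
lookup.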


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      document root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     method="xml")

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration is added to the
      file. Use ``False`` for never, ``True`` for always, and ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")
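The relationship between *encoding* and the type of *file* described for
:meth:`ElementTree.write` can be sketched with in-memory streams. This is a
minimal illustration; the ``<greeting>`` document is an assumption made up
for the example.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<greeting lang='en'>hello</greeting>"))

# encoding="unicode" produces str, so the target must be a text stream.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode")

# Any other encoding produces bytes, so the target must be a binary stream.
# xml_declaration=True forces the declaration even for UTF-8.
binary_out = io.BytesIO()
tree.write(binary_out, encoding="utf-8", xml_declaration=True)

text_result = text_out.getvalue()      # a str
binary_result = binary_out.getvalue()  # a bytes object
```

Swapping the two streams (text output into ``io.BytesIO`` or binary output
into ``io.StringIO``) is the mismatch the note above warns about.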

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and *tag* is interpreted as a local name.
   :class:`QName` instances are opaque.
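The two ways of constructing a :class:`QName`, and its use as an attribute
value, can be sketched as follows. This is a minimal illustration; the
namespace URI and the element and attribute names are assumptions made up
for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for illustration.
URI = "http://example.com/ns"

# The one-argument and two-argument forms describe the same name.
combined = ET.QName("{%s}target" % URI)
split = ET.QName(URI, "target")
same_name = combined.text == split.text

# Wrapping an attribute value in QName lets the serializer replace the
# URI with a generated prefix and declare that prefix on output.
elem = ET.Element("item")
elem.set("ref", split)
serialized = ET.tostring(elem, encoding="unicode")
```

Without the wrapper, the attribute value would be written out as the literal
string and no namespace declaration would be emitted for it.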


.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes. It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string, either a
      bytestring or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2
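The sequence of start, data, and end calls described above can be sketched by
driving a :class:`TreeBuilder` by hand, the way a custom parser might. This is
a minimal illustration; the tag names and attribute values are assumptions
made up for the example.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

# Each start() must be balanced by an end() with the same tag.
builder.start("root", {"version": "1"})
builder.start("child", {})
builder.data("some ")
builder.data("text")  # consecutive data() calls are joined together
builder.end("child")
builder.end("root")

root = builder.close()  # returns the top-level Element
child_text = root[0].text
```

The result is an ordinary :class:`Element` tree, so it can be searched or
serialized like a tree produced by the standard parser.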


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser. *html* are predefined HTML entities. This flag is not supported by
   the current implementation. *target* is the target object. If omitted, the
   builder uses an instance of the standard :class:`TreeBuilder` class.
   *encoding* [1]_ is optional. If given, the value overrides the encoding
   specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data. :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` is therefore not limited to building a tree structure.
This is an example of computing the maximum depth of an XML file::

   >>> from xml.etree.ElementTree import XMLParser
   >>> class MaxDepth:                     # The target object of the parser
   ...     maxDepth = 0
   ...     depth = 0
   ...     def start(self, tag, attrib):   # Called for each opening tag.
   ...         self.depth += 1
   ...         if self.depth > self.maxDepth:
   ...             self.maxDepth = self.depth
   ...     def end(self, tag):             # Called for each closing tag.
   ...         self.depth -= 1
   ...     def data(self, data):
   ...         pass            # We do not need to do anything with data.
   ...     def close(self):    # Called when all data has been parsed.
   ...         return self.maxDepth
   ...
   >>> target = MaxDepth()
   >>> parser = XMLParser(target=target)
   >>> exampleXml = """
   ... <a>
   ...   <b>
   ...   </b>
   ...   <b>
   ...     <c>
   ...       <d>
   ...       </d>
   ...     </c>
   ...   </b>
   ... </a>"""
   >>> parser.feed(exampleXml)
   >>> parser.close()
   4
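When no *target* is given, the same :meth:`feed` and :meth:`close` calls build
an ordinary element tree, and the input may arrive in pieces. The following is
a minimal sketch of that incremental use; the chunk boundaries and the
``<log>`` document are assumptions made up for the example.

```python
from xml.etree.ElementTree import XMLParser

parser = XMLParser()  # no target given, so a standard TreeBuilder is used

# Hypothetical chunks, e.g. as they might arrive from a socket or a file.
# Chunk boundaries may fall anywhere, even in the middle of text.
chunks = ("<log><entry id='1'>sta", "rted</entry>", "<entry id='2'/></log>")
for chunk in chunks:
    parser.feed(chunk)

root = parser.close()  # with the default target, this is the root Element
entry_ids = [entry.get("id") for entry in root]
first_text = root[0].text
```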

Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.
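Reading the attributes of a :class:`ParseError` can be sketched as follows.
This is a minimal illustration; the malformed input string is an assumption
made up for the example, and ``ErrorString`` comes from
:mod:`xml.parsers.expat`, not from this module.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString

broken = "<root>\n  <item>\n</root>"  # <item> is never closed

try:
    ET.fromstring(broken)
except ET.ParseError as err:
    error_code = err.code                 # numeric expat error code
    line, column = err.position           # where the parser gave up
    description = ErrorString(err.code)   # expat's text for that code
    message = str(err)                    # user-friendly message
```

The line number reported here is the line of the mismatched closing tag, which
is where expat detects the problem, not the line of the unclosed element.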

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.