:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions, such as
:func:`parse`, return an :class:`ElementTree` instead. Check each function's
documentation for its return type.

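As a rough illustration of the difference in return types, the sketch below
parses the same document both ways. The shortened two-country string and the
use of :class:`io.StringIO` in place of a file on disk are assumptions made
for this example only::

   import io
   import xml.etree.ElementTree as ET

   # Hypothetical two-country excerpt of the sample document.
   sample = """<data>
       <country name="Liechtenstein"><rank>1</rank></country>
       <country name="Singapore"><rank>4</rank></country>
   </data>"""

   # parse() accepts a filename or file object and returns an ElementTree.
   tree = ET.parse(io.StringIO(sample))
   root_from_file = tree.getroot()

   # fromstring() returns the root Element directly.
   root_from_string = ET.fromstring(sample)

   print(isinstance(tree, ET.ElementTree))          # True
   print(root_from_file.tag, root_from_string.tag)  # data data
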
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a given tag that are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

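The difference between the recursive and the direct-child searches can be seen
in a small sketch. The one-country document below is a shortened, hypothetical
variant of the sample data::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(
       "<data>"
       "<country name='Panama'>"
       "<rank>68</rank>"
       "<neighbor name='Costa Rica'/>"
       "<neighbor name='Colombia'/>"
       "</country>"
       "</data>"
   )

   # findall() with a plain tag looks only at direct children of root,
   # so the nested 'neighbor' elements are not found.
   direct = root.findall('neighbor')

   # iter() walks the whole subtree and finds both.
   nested = [n.get('name') for n in root.iter('neighbor')]

   # find() returns None when no direct child matches.
   missing = root.find('rank')

   print(direct)   # []
   print(nested)   # ['Costa Rica', 'Colombia']
   print(missing)  # None

Because :meth:`Element.find` returns ``None`` when nothing matches, an
expression such as ``country.find('rank').text`` raises :exc:`AttributeError`
for a country without a ``rank`` child.
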
Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

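Attributes and text can be supplied while building a tree in the same way.
The following is a minimal sketch rather than the only way to do it. The
element names are borrowed from the sample data, and the round trip through
:func:`fromstring` is there only to check the result::

   import xml.etree.ElementTree as ET

   data = ET.Element('data')

   # Attributes may be given as keyword arguments ...
   country = ET.SubElement(data, 'country', name='Singapore')
   rank = ET.SubElement(country, 'rank')
   rank.text = '4'

   # ... or as a dictionary.
   ET.SubElement(country, 'neighbor', {'name': 'Malaysia', 'direction': 'N'})

   # encoding="unicode" asks for a str instead of a bytestring.
   xml_text = ET.tostring(data, encoding='unicode')

   # Parse the text again to confirm what was written.
   parsed = ET.fromstring(xml_text)
   print(parsed.find('country').get('name'))  # Singapore
   print(parsed.findtext('country/rank'))     # 4
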
Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

250
251Example
252^^^^^^^
253
254Here's an example that demonstrates some of the XPath capabilities of the
255module. We'll be using the ``countrydata`` XML document from the
256:ref:`Parsing XML <elementtree-parsing-xml>` section::
257
258 import xml.etree.ElementTree as ET
259
260 root = ET.fromstring(countrydata)
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200261
262 # Top-level elements
Eli Benderskyc1d98692012-03-30 11:44:15 +0300263 root.findall(".")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200264
265 # All 'neighbor' grand-children of 'country' children of the top-level
266 # elements
Eli Benderskyc1d98692012-03-30 11:44:15 +0300267 root.findall("./country/neighbor")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200268
269 # Nodes with name='Singapore' that have a 'year' child
Eli Benderskyc1d98692012-03-30 11:44:15 +0300270 root.findall(".//year/..[@name='Singapore']")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200271
272 # 'year' nodes that are children of nodes with name='Singapore'
Eli Benderskyc1d98692012-03-30 11:44:15 +0300273 root.findall(".//*[@name='Singapore']/year")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200274
275 # All 'neighbor' nodes that are the second child of their parent
Eli Benderskyc1d98692012-03-30 11:44:15 +0300276 root.findall(".//neighbor[2]")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200277
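For reference, here is a self-contained variant of the queries above together
with the values they select. The ``countrydata`` string is a shortened, assumed
version of the sample document (two countries, without the ``rank`` and
``gdppc`` elements)::

   import xml.etree.ElementTree as ET

   countrydata = """<data>
       <country name="Liechtenstein">
           <year>2008</year>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <year>2011</year>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>"""
   root = ET.fromstring(countrydata)

   top = [e.tag for e in root.findall(".")]
   neighbors = [e.get("name") for e in root.findall("./country/neighbor")]
   parents = [e.get("name")
              for e in root.findall(".//year/..[@name='Singapore']")]
   years = [e.text for e in root.findall(".//*[@name='Singapore']/year")]
   second = [e.get("name") for e in root.findall(".//neighbor[2]")]

   print(top)        # ['data']
   print(neighbors)  # ['Austria', 'Switzerland', 'Malaysia']
   print(parents)    # ['Singapore']
   print(years)      # ['2011']
   print(second)     # ['Switzerland']
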
Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, ``spam/egg`` selects all             |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

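A short sketch of how the predicates combine. The ``menu`` document and its
element names are invented for this example::

   import xml.etree.ElementTree as ET

   menu = ET.fromstring(
       "<menu>"
       "<dish name='soup'><spam/></dish>"
       "<dish name='salad'/>"
       "<dish><spam/><egg/></dish>"
       "</menu>"
   )

   def names(elems):
       """Return the 'name' attribute (or None) of each element."""
       return [e.get("name") for e in elems]

   with_name = names(menu.findall("dish[@name]"))       # attribute present
   with_spam = names(menu.findall("dish[spam]"))        # has a 'spam' child
   chained = names(menu.findall("dish[@name][spam]"))   # both predicates
   first = menu.find("dish[1]").get("name")             # positions start at 1
   second_last = menu.find("dish[last()-1]").get("name")
   last_children = len(menu.find("dish[last()]"))       # children of last dish

   print(with_name)      # ['soup', 'salad']
   print(with_spam)      # ['soup', None]
   print(chained)        # ['soup']
   print(first)          # soup
   print(second_last)    # salad
   print(last_children)  # 2
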
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a list of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"``
   and ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for asynchronous applications where blocking reads can't be made. For fully
   asynchronous parsing, see :class:`IncrementalParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

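   A typical use, sketched under the assumption that the document is a flat
   list of ``country`` records, handles each record on its ``"end"`` event and
   then clears it. The in-memory :class:`io.BytesIO` source stands in for a
   large file::

      import io
      import xml.etree.ElementTree as ET

      # Hypothetical in-memory document standing in for a large file.
      source = io.BytesIO(
          b"<data>"
          b"<country name='Liechtenstein'><rank>1</rank></country>"
          b"<country name='Panama'><rank>68</rank></country>"
          b"</data>"
      )

      ranks = {}
      for event, elem in ET.iterparse(source, events=("start", "end")):
          # Only "end" events guarantee that text and children are complete.
          if event == "end" and elem.tag == "country":
              ranks[elem.get("name")] = int(elem.findtext("rank"))
              elem.clear()  # drop the processed subtree to save memory

      print(ranks)  # {'Liechtenstein': 1, 'Panama': 68}
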
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.


.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


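   A minimal sketch. The ``ex`` prefix and the ``http://example.com/ns`` URI
   are placeholders chosen for illustration::

      import xml.etree.ElementTree as ET

      # Hypothetical prefix and URI, used only in this example.
      ET.register_namespace("ex", "http://example.com/ns")

      # Namespaced tags are written in {uri}local form.
      item = ET.Element("{http://example.com/ns}item")
      serialized = ET.tostring(item, encoding="unicode")

      print(serialized)  # <ex:item xmlns:ex="http://example.com/ns" />

   Because the registry is global, the mapping affects every later
   serialization in the same process.
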
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


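   The sketch below contrasts the byte and text forms and checks the join
   relationship with :func:`tostring`. The ``city`` element is a made-up
   example::

      import xml.etree.ElementTree as ET

      elem = ET.Element("city", name="Zürich")

      as_bytes = ET.tostring(elem)                     # default: US-ASCII bytes
      as_text = ET.tostring(elem, encoding="unicode")  # str
      chunks = ET.tostringlist(elem)                   # list of byte fragments

      # Characters outside US-ASCII become character references in the bytes.
      print(as_bytes)  # b'<city name="Z&#252;rich" />'
      print(as_text)   # <city name="Zürich" />
      print(b"".join(chunks) == as_bytes)  # True
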
.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.


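   For example (the ``doc`` and ``section`` names are invented for
   illustration)::

      import xml.etree.ElementTree as ET

      root, ids = ET.XMLID(
          "<doc>"
          "<section id='intro'>Hello</section>"
          "<section id='usage'>World</section>"
          "<section>No id</section>"
          "</doc>"
      )

      print(root.tag)           # doc
      print(sorted(ids))        # ['intro', 'usage']
      print(ids["usage"].text)  # World
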
.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file the attribute will contain any text found between the element
      tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file
      the attribute will contain any text found after the element's end tag and
      before the next tag.


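   How *text* and *tail* divide mixed content can be seen in a small sketch.
   The document string is an assumed example::

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<a>hello<b>inner</b>world<c/></a>")
      b = root.find("b")
      c = root.find("c")

      print(root.text)  # hello  (text before the first child)
      print(b.text)     # inner  (text inside <b>)
      print(b.tail)     # world  (text after </b>, before <c/>)
      print(c.text)     # None   (empty element)
      print(c.tail)     # None   (nothing between <c/> and </a>)
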
   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name.


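   The sketch below exercises :meth:`find`, :meth:`findall` and
   :meth:`findtext` with a *namespaces* mapping. The ``g`` prefix, its URI and
   the document itself are hypothetical::

      import xml.etree.ElementTree as ET

      root = ET.fromstring(
          "<data xmlns:g='http://example.com/geo'>"
          "<g:country name='Panama'><g:rank>68</g:rank><g:note/></g:country>"
          "</data>"
      )
      ns = {"g": "http://example.com/geo"}  # prefix -> full namespace name

      country = root.find("g:country", ns)
      rank = country.findtext("g:rank", namespaces=ns)
      # The element exists but has no text, so an empty string comes back.
      note = country.findtext("g:note", namespaces=ns)
      # No such element, so the default is returned.
      year = country.findtext("g:year", default="n/a", namespaces=ns)
      all_ranks = root.findall(".//g:rank", ns)

      print(country.get("name"))  # Panama
      print(rank)                 # 68
      print(repr(note))           # ''
      print(year)                 # n/a
      print(len(all_ranks))       # 1
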
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000619 .. method:: getchildren()
Georg Brandl116aa622007-08-15 14:28:22 +0000620
Georg Brandl67b21b72010-08-17 15:07:14 +0000621 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000622 Use ``list(elem)`` or iteration.
Georg Brandl116aa622007-08-15 14:28:22 +0000623
Georg Brandl116aa622007-08-15 14:28:22 +0000624
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000625 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000626
Georg Brandl67b21b72010-08-17 15:07:14 +0000627 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000628 Use method :meth:`Element.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000629
Georg Brandl116aa622007-08-15 14:28:22 +0000630
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200631 .. method:: insert(index, subelement)
Georg Brandl116aa622007-08-15 14:28:22 +0000632
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200633 Inserts *subelement* at the given position in this element. Raises
634 :exc:`TypeError` if *subelement* is not an :class:`Element`.
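
A minimal sketch of the behavior described above, using made-up tag names: a valid element is inserted at position 0, and a plain string is rejected.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "b")
root.insert(0, ET.Element("a"))       # "a" now precedes "b"
tags = [child.tag for child in root]

try:
    root.insert(0, "not an element")  # a plain string is not an Element
    rejected = False
except TypeError:
    rejected = True
```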
Georg Brandl116aa622007-08-15 14:28:22 +0000635
Georg Brandl116aa622007-08-15 14:28:22 +0000636
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000637 .. method:: iter(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000638
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000639 Creates a tree :term:`iterator` with the current element as the root.
640 The iterator iterates over this element and all elements below it, in
641 document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
642 elements whose tag equals *tag* are returned from the iterator. If the
643 tree structure is modified during iteration, the result is undefined.
Georg Brandl116aa622007-08-15 14:28:22 +0000644
Ezio Melotti138fc892011-10-10 00:02:03 +0300645 .. versionadded:: 3.2
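
The following sketch, built on an invented one-line document, shows the traversal order and the effect of the *tag* filter.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><b/><d><b/></d></a>")

# Without a tag, the iterator yields the root first, then every
# descendant in document (depth first) order.
all_tags = [elem.tag for elem in root.iter()]

# With a tag, only elements whose tag equals it are yielded,
# at any depth below (and including) the starting element.
b_count = sum(1 for _ in root.iter("b"))
```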
646
Georg Brandl116aa622007-08-15 14:28:22 +0000647
Eli Bendersky737b1732012-05-29 06:02:56 +0300648 .. method:: iterfind(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000649
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200650 Finds all matching subelements, by tag name or
651 :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
Eli Bendersky737b1732012-05-29 06:02:56 +0300652 matching elements in document order. *namespaces* is an optional mapping
653 from namespace prefix to full name.
654
Georg Brandl116aa622007-08-15 14:28:22 +0000655
Ezio Melottif8754a62010-03-21 07:16:43 +0000656 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000657
Georg Brandl116aa622007-08-15 14:28:22 +0000658
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000659 .. method:: itertext()
Georg Brandl116aa622007-08-15 14:28:22 +0000660
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000661 Creates a text iterator. The iterator loops over this element and all
662 subelements, in document order, and returns all inner text.
Georg Brandl116aa622007-08-15 14:28:22 +0000663
Ezio Melottif8754a62010-03-21 07:16:43 +0000664 .. versionadded:: 3.2
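
As an illustration, the sketch below collects the text pieces of a paragraph similar to the one used in the example later in this section; joining them gives the paragraph's plain text.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<p>Moved to <a>example.org</a> or <a>example.com</a>.</p>")

# Each piece is either an element's text or the tail that follows it.
pieces = list(p.itertext())
plain = "".join(pieces)
```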
Georg Brandl116aa622007-08-15 14:28:22 +0000665
666
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000667 .. method:: makeelement(tag, attrib)
Georg Brandl116aa622007-08-15 14:28:22 +0000668
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000669 Creates a new element object of the same type as this element. Do not
670 call this method; use the :func:`SubElement` factory function instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000671
672
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000673 .. method:: remove(subelement)
Georg Brandl116aa622007-08-15 14:28:22 +0000674
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000675 Removes *subelement* from the element. Unlike the find\* methods, this
676 method compares elements based on instance identity, not on tag value
677 or contents.
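
The identity comparison can be seen in a short sketch (tag names are arbitrary): an element that merely looks like a child is not removed. The sketch assumes that a failed removal raises :exc:`ValueError`, which is what the current implementations do.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<r><item/><item/></r>")
first, second = list(root)
lookalike = ET.Element("item")   # same tag, but a different object

root.remove(second)              # removes exactly this child

try:
    root.remove(lookalike)       # not a child of root, despite the same tag
    removed_lookalike = True
except ValueError:
    removed_lookalike = False
```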
Georg Brandl116aa622007-08-15 14:28:22 +0000678
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000679 :class:`Element` objects also support the following sequence type methods
680 for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
681 :meth:`__setitem__`, :meth:`__len__`.
Georg Brandl116aa622007-08-15 14:28:22 +0000682
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000683 Caution: Elements with no subelements will test as ``False``. This behavior
684 will change in future versions. Use an explicit ``len(elem)`` or ``elem is
685 None`` test instead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000686
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000687 element = root.find('foo')
Georg Brandl116aa622007-08-15 14:28:22 +0000688
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000689 if not element: # careful!
690 print("element not found, or element has no subelements")
Georg Brandl116aa622007-08-15 14:28:22 +0000691
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000692 if element is None:
693 print("element not found")
Georg Brandl116aa622007-08-15 14:28:22 +0000694
695
696.. _elementtree-elementtree-objects:
697
698ElementTree Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200699^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000700
701
Georg Brandl7f01a132009-09-16 15:58:14 +0000702.. class:: ElementTree(element=None, file=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000703
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000704 ElementTree wrapper class. This class represents an entire element
705 hierarchy, and adds some extra support for serialization to and from
706 standard XML.
Georg Brandl116aa622007-08-15 14:28:22 +0000707
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000708 *element* is the root element. The tree is initialized with the contents
709 of the XML *file* if given.
Georg Brandl116aa622007-08-15 14:28:22 +0000710
711
Benjamin Petersone41251e2008-04-25 01:59:09 +0000712 .. method:: _setroot(element)
Georg Brandl116aa622007-08-15 14:28:22 +0000713
Benjamin Petersone41251e2008-04-25 01:59:09 +0000714 Replaces the root element for this tree. This discards the current
715 contents of the tree, and replaces it with the given element. Use with
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000716 care. *element* is an element instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000717
718
Eli Bendersky737b1732012-05-29 06:02:56 +0300719 .. method:: find(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000720
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200721 Same as :meth:`Element.find`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000722
723
Eli Bendersky737b1732012-05-29 06:02:56 +0300724 .. method:: findall(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000725
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200726 Same as :meth:`Element.findall`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000727
728
Eli Bendersky737b1732012-05-29 06:02:56 +0300729 .. method:: findtext(match, default=None, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000730
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200731 Same as :meth:`Element.findtext`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000732
733
Georg Brandl7f01a132009-09-16 15:58:14 +0000734 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000735
Georg Brandl67b21b72010-08-17 15:07:14 +0000736 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000737 Use method :meth:`ElementTree.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000738
739
Benjamin Petersone41251e2008-04-25 01:59:09 +0000740 .. method:: getroot()
Florent Xiclunac17f1722010-08-08 19:48:29 +0000741
Benjamin Petersone41251e2008-04-25 01:59:09 +0000742 Returns the root element for this tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000743
744
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000745 .. method:: iter(tag=None)
746
747 Creates and returns a tree iterator for the root element. The iterator
748 loops over all elements in this tree, in document order. *tag* is the tag
749 to look for (default is to return all elements).
750
751
Eli Bendersky737b1732012-05-29 06:02:56 +0300752 .. method:: iterfind(match, namespaces=None)
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000753
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200754 Same as :meth:`Element.iterfind`, starting at the root of the tree.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000755
Ezio Melottif8754a62010-03-21 07:16:43 +0000756 .. versionadded:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000757
758
Georg Brandl7f01a132009-09-16 15:58:14 +0000759 .. method:: parse(source, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000760
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000761 Loads an external XML document into this element tree. *source* is a file
Antoine Pitrou11cb9612010-09-15 11:11:28 +0000762 name or :term:`file object`. *parser* is an optional parser instance.
Eli Bendersky52467b12012-06-01 07:13:08 +0300763 If not given, the standard :class:`XMLParser` parser is used. Returns the
764 document root element.
Georg Brandl116aa622007-08-15 14:28:22 +0000765
766
Eli Benderskyf96cf912012-07-15 06:19:44 +0300767 .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
Serhiy Storchaka9e189f02013-01-13 22:24:27 +0200768 default_namespace=None, method="xml", *, \
Eli Benderskye9af8272013-01-13 06:27:51 -0800769 short_empty_elements=True)
Georg Brandl116aa622007-08-15 14:28:22 +0000770
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000771 Writes the element tree to a file, as XML. *file* is a file name, or a
Eli Benderskyf96cf912012-07-15 06:19:44 +0300772 :term:`file object` opened for writing. *encoding* [1]_ is the output
773 encoding (default is US-ASCII).
774 *xml_declaration* controls whether an XML declaration is added to the
775 file. Use ``False`` for never, ``True`` for always, and ``None`` to add one
776 only if the encoding is not US-ASCII, UTF-8 or Unicode (default is ``None``).
Serhiy Storchaka03530b92013-01-13 21:58:04 +0200777 *default_namespace* sets the default XML namespace (for "xmlns").
Eli Benderskyf96cf912012-07-15 06:19:44 +0300778 *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
779 ``"xml"``).
Eli Benderskya9a2ef52013-01-13 06:04:43 -0800780 The keyword-only *short_empty_elements* parameter controls the formatting
781 of elements that contain no content. If ``True`` (the default), they are
782 emitted as a single self-closed tag, otherwise they are emitted as a pair
783 of start/end tags.
Eli Benderskyf96cf912012-07-15 06:19:44 +0300784
785 The output is either a string (:class:`str`) or binary (:class:`bytes`).
786 This is controlled by the *encoding* argument. If *encoding* is
787 ``"unicode"``, the output is a string; otherwise, it's binary. Note that
788 this may conflict with the type of *file* if it's an open
789 :term:`file object`; make sure you do not try to write a string to a
790 binary stream and vice versa.
791
Eli Benderskya9a2ef52013-01-13 06:04:43 -0800792 .. versionadded:: 3.4
793 The *short_empty_elements* parameter.
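
The relationship between *encoding* and the type of *file* can be sketched with in-memory streams. The element names are invented; ``io.StringIO`` and ``io.BytesIO`` stand in for a text file and a binary file respectively.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "empty")
tree = ET.ElementTree(root)

# encoding="unicode" produces str, so the target must be a text stream.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode")

# Any other encoding produces bytes, so the target must be a binary stream.
bytes_out = io.BytesIO()
tree.write(bytes_out, encoding="utf-8", xml_declaration=True,
           short_empty_elements=False)

as_text = text_out.getvalue()
as_bytes = bytes_out.getvalue()
```

With the default ``short_empty_elements=True`` the empty element is written as a single self-closed tag in ``as_text``; in ``as_bytes`` it appears as a start/end pair preceded by the XML declaration.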
794
Georg Brandl116aa622007-08-15 14:28:22 +0000795
Christian Heimesd8654cf2007-12-02 15:22:16 +0000796This is the XML file that is going to be manipulated::
797
798 <html>
799 <head>
800 <title>Example page</title>
801 </head>
802 <body>
Georg Brandl48310cd2009-01-03 21:18:54 +0000803 <p>Moved to <a href="http://example.org/">example.org</a>
Christian Heimesd8654cf2007-12-02 15:22:16 +0000804 or <a href="http://example.com/">example.com</a>.</p>
805 </body>
806 </html>
807
808Example of changing the attribute "target" of every link in the first paragraph::
809
810 >>> from xml.etree.ElementTree import ElementTree
811 >>> tree = ElementTree()
812 >>> tree.parse("index.xhtml")
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000813 <Element 'html' at 0xb77e6fac>
Christian Heimesd8654cf2007-12-02 15:22:16 +0000814 >>> p = tree.find("body/p") # Finds first occurrence of tag p in body
815 >>> p
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000816 <Element 'p' at 0xb77ec26c>
817 >>> links = list(p.iter("a")) # Returns list of all links
Christian Heimesd8654cf2007-12-02 15:22:16 +0000818 >>> links
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000819 [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
Christian Heimesd8654cf2007-12-02 15:22:16 +0000820 >>> for i in links: # Iterates through all found links
821 ... i.attrib["target"] = "blank"
822 >>> tree.write("output.xhtml")
Georg Brandl116aa622007-08-15 14:28:22 +0000823
824.. _elementtree-qname-objects:
825
826QName Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200827^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000828
829
Georg Brandl7f01a132009-09-16 15:58:14 +0000830.. class:: QName(text_or_uri, tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000831
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000832 QName wrapper. This can be used to wrap a QName attribute value, in order
833 to get proper namespace handling on output. *text_or_uri* is a string
834 containing the QName value, in the form {uri}local, or, if the tag argument
835 is given, the URI part of a QName. If *tag* is given, the first argument is
836 interpreted as a URI, and *tag* is interpreted as the local name.
837 :class:`QName` instances are opaque.
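
A brief sketch of the two constructor forms, using a made-up namespace URI. Both spellings describe the same qualified name, and using one as an element tag lets the serializer emit a namespace declaration.

```python
from xml.etree.ElementTree import Element, QName, tostring

# One-argument form: the full name in {uri}local notation.
q1 = QName("{http://example.com/ns}item")

# Two-argument form: URI and local name given separately.
q2 = QName("http://example.com/ns", "item")

same_name = str(q1) == str(q2)

# On output, the serializer assigns a prefix and declares the namespace.
xml = tostring(Element(q2), encoding="unicode")
```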
Georg Brandl116aa622007-08-15 14:28:22 +0000838
839
Antoine Pitrou5b235d02013-04-18 19:37:06 +0200840IncrementalParser Objects
841^^^^^^^^^^^^^^^^^^^^^^^^^
842
843
844.. class:: IncrementalParser(events=None, parser=None)
845
846 An incremental, event-driven parser suitable for non-blocking applications.
847 *events* is a list of events to report back. The supported events are the
848 strings ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"`` (the "ns"
849 events are used to get detailed namespace information). If *events* is
850 omitted, only ``"end"`` events are reported. *parser* is an optional
851 parser instance. If not given, the standard :class:`XMLParser` parser is
852 used.
853
854 .. method:: data_received(data)
855
856 Feed the given bytes data to the incremental parser.
857
858 .. method:: eof_received()
859
860 Signal the incremental parser that the data stream is terminated.
861
862 .. method:: events()
863
864 Iterate over the events which have been encountered in the data fed
865 to the parser. This method yields ``(event, elem)`` pairs, where
866 *event* is a string representing the type of event (e.g. ``"end"``)
867 and *elem* is the encountered :class:`Element` object.
868
869 .. note::
870
871 :class:`IncrementalParser` only guarantees that it has seen the ">"
872 character of a starting tag when it emits a "start" event, so the
873 attributes are defined, but the contents of the text and tail attributes
874 are undefined at that point. The same applies to the element children;
875 they may or may not be present.
876
877 If you need a fully populated element, look for "end" events instead.
878
879 .. versionadded:: 3.4
880
881
Georg Brandl116aa622007-08-15 14:28:22 +0000882.. _elementtree-treebuilder-objects:
883
884TreeBuilder Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200885^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000886
887
Georg Brandl7f01a132009-09-16 15:58:14 +0000888.. class:: TreeBuilder(element_factory=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000889
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000890 Generic element structure builder. This builder converts a sequence of
891 start, data, and end method calls to a well-formed element structure. You
892 can use this class to build an element structure using a custom XML parser,
Eli Bendersky48d358b2012-05-30 17:57:50 +0300893 or a parser for some other XML-like format. *element_factory*, when given,
894 must be a callable accepting two positional arguments: a tag and
895 a dict of attributes. It is expected to return a new element instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000896
Benjamin Petersone41251e2008-04-25 01:59:09 +0000897 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +0000898
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000899 Flushes the builder buffers, and returns the toplevel document
900 element. Returns an :class:`Element` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000901
902
Benjamin Petersone41251e2008-04-25 01:59:09 +0000903 .. method:: data(data)
Georg Brandl116aa622007-08-15 14:28:22 +0000904
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000905 Adds text to the current element. *data* is a string; it should be
906 either a bytestring or a Unicode string.
Georg Brandl116aa622007-08-15 14:28:22 +0000907
908
Benjamin Petersone41251e2008-04-25 01:59:09 +0000909 .. method:: end(tag)
Georg Brandl116aa622007-08-15 14:28:22 +0000910
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000911 Closes the current element. *tag* is the element name. Returns the
912 closed element.
Georg Brandl116aa622007-08-15 14:28:22 +0000913
914
Benjamin Petersone41251e2008-04-25 01:59:09 +0000915 .. method:: start(tag, attrs)
Georg Brandl116aa622007-08-15 14:28:22 +0000916
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000917 Opens a new element. *tag* is the element name. *attrs* is a dictionary
918 containing element attributes. Returns the opened element.
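
The call sequence expected by the builder can be sketched by driving it by hand, without any parser. The tag names and text are arbitrary.

```python
from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()
builder.start("greeting", {"lang": "en"})   # open <greeting lang="en">
builder.data("Hello, ")                     # text of <greeting>
builder.start("b", {})                      # open nested <b>
builder.data("world")
builder.end("b")                            # close <b>
builder.end("greeting")                     # close <greeting>

root = builder.close()                      # the toplevel Element
xml = tostring(root, encoding="unicode")
```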
Georg Brandl116aa622007-08-15 14:28:22 +0000919
920
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000921 In addition, a custom :class:`TreeBuilder` object can provide the
922 following method:
Georg Brandl116aa622007-08-15 14:28:22 +0000923
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000924 .. method:: doctype(name, pubid, system)
925
926 Handles a doctype declaration. *name* is the doctype name. *pubid* is
927 the public identifier. *system* is the system identifier. This method
928 does not exist on the default :class:`TreeBuilder` class.
929
Ezio Melottif8754a62010-03-21 07:16:43 +0000930 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000931
932
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000933.. _elementtree-xmlparser-objects:
Georg Brandl116aa622007-08-15 14:28:22 +0000934
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000935XMLParser Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200936^^^^^^^^^^^^^^^^^
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000937
938
939.. class:: XMLParser(html=0, target=None, encoding=None)
940
941 :class:`Element` structure builder for XML source data, based on the expat
942 parser. *html* is a flag for predefined HTML entities. This flag is not supported by
943 the current implementation. *target* is the target object. If omitted, the
Eli Bendersky1bf23942012-06-01 07:15:00 +0300944 builder uses an instance of the standard :class:`TreeBuilder` class.
Eli Bendersky52467b12012-06-01 07:13:08 +0300945 *encoding* [1]_ is optional. If given, the value overrides the encoding
946 specified in the XML file.
Georg Brandl116aa622007-08-15 14:28:22 +0000947
948
Benjamin Petersone41251e2008-04-25 01:59:09 +0000949 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +0000950
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000951 Finishes feeding data to the parser. Returns an element structure.
Georg Brandl116aa622007-08-15 14:28:22 +0000952
953
Benjamin Petersone41251e2008-04-25 01:59:09 +0000954 .. method:: doctype(name, pubid, system)
Georg Brandl116aa622007-08-15 14:28:22 +0000955
Georg Brandl67b21b72010-08-17 15:07:14 +0000956 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000957 Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
958 target.
Georg Brandl116aa622007-08-15 14:28:22 +0000959
960
Benjamin Petersone41251e2008-04-25 01:59:09 +0000961 .. method:: feed(data)
Georg Brandl116aa622007-08-15 14:28:22 +0000962
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000963 Feeds data to the parser. *data* is encoded data.
Georg Brandl116aa622007-08-15 14:28:22 +0000964
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000965:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
Christian Heimesd8654cf2007-12-02 15:22:16 +0000966for each opening tag, its :meth:`end` method for each closing tag,
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000967and its :meth:`data` method for character data. :meth:`XMLParser.close`
Georg Brandl48310cd2009-01-03 21:18:54 +0000968calls *target*\'s :meth:`close` method.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000969:class:`XMLParser` is therefore not limited to building a tree structure.
Christian Heimesd8654cf2007-12-02 15:22:16 +0000970This is an example of computing the maximum depth of an XML file::
971
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000972 >>> from xml.etree.ElementTree import XMLParser
Christian Heimesd8654cf2007-12-02 15:22:16 +0000973 >>> class MaxDepth: # The target object of the parser
974 ... maxDepth = 0
975 ... depth = 0
976 ... def start(self, tag, attrib): # Called for each opening tag.
Georg Brandl48310cd2009-01-03 21:18:54 +0000977 ... self.depth += 1
Christian Heimesd8654cf2007-12-02 15:22:16 +0000978 ... if self.depth > self.maxDepth:
979 ... self.maxDepth = self.depth
980 ... def end(self, tag): # Called for each closing tag.
981 ... self.depth -= 1
Georg Brandl48310cd2009-01-03 21:18:54 +0000982 ... def data(self, data):
Christian Heimesd8654cf2007-12-02 15:22:16 +0000983 ... pass # We do not need to do anything with data.
984 ... def close(self): # Called when all data has been parsed.
985 ... return self.maxDepth
Georg Brandl48310cd2009-01-03 21:18:54 +0000986 ...
Christian Heimesd8654cf2007-12-02 15:22:16 +0000987 >>> target = MaxDepth()
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000988 >>> parser = XMLParser(target=target)
Christian Heimesd8654cf2007-12-02 15:22:16 +0000989 >>> exampleXml = """
990 ... <a>
991 ... <b>
992 ... </b>
993 ... <b>
994 ... <c>
995 ... <d>
996 ... </d>
997 ... </c>
998 ... </b>
999 ... </a>"""
1000 >>> parser.feed(exampleXml)
1001 >>> parser.close()
1002 4
Christian Heimesb186d002008-03-18 15:15:01 +00001003
Eli Bendersky5b77d812012-03-16 08:20:05 +02001004Exceptions
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001005^^^^^^^^^^
Eli Bendersky5b77d812012-03-16 08:20:05 +02001006
1007.. class:: ParseError
1008
1009 XML parse error, raised by the various parsing methods in this module when
1010 parsing fails. The string representation of an instance of this exception
1011 will contain a user-friendly error message. In addition, it will have
1012 the following attributes available:
1013
1014 .. attribute:: code
1015
1016 A numeric error code from the expat parser. See the documentation of
1017 :mod:`xml.parsers.expat` for the list of error codes and their meanings.
1018
1019 .. attribute:: position
1020
1021 A tuple of *line*, *column* numbers, specifying where the error occurred.
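
A minimal sketch of inspecting these attributes, assuming a deliberately malformed one-line input. ``xml.parsers.expat.ErrorString`` is used to turn the numeric code into the parser's message.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

try:
    ET.fromstring("<a><b></a>")       # <b> is never closed
except ET.ParseError as exc:
    code = exc.code                   # numeric expat error code
    line, column = exc.position       # where the error was detected
    message = expat.ErrorString(code) # expat's description of the code
    summary = str(exc)                # user-friendly message
```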
Christian Heimesb186d002008-03-18 15:15:01 +00001022
1023.. rubric:: Footnotes
1024
1025.. [#] The encoding string included in XML output should conform to the
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001026 appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
1027 not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
Benjamin Petersonad3d5c22009-02-26 03:38:59 +00001028 and http://www.iana.org/assignments/character-sets.