:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
return an :class:`ElementTree` instead (:func:`parse` does, for example), so
check each function's documentation to be sure.
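
As a minimal sketch of that difference, the following uses a shortened,
hypothetical version of the sample data and an in-memory :class:`io.StringIO`
object standing in for a file on disk. :func:`parse` yields an
:class:`ElementTree`, while :func:`fromstring` yields the root
:class:`Element` directly:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample data (not the full document above).
xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

# parse() takes a filename or file object and returns an ElementTree,
# so one more step is needed to reach the root element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# fromstring() returns the root Element itself.
root_from_string = ET.fromstring(xml_text)

print(isinstance(tree, ET.ElementTree))
print(ET.iselement(root_from_string))
print(root_from_file.tag, root_from_string.tag)
```

Either way, the rest of the tutorial only needs ``root``.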

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input. Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.
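
The following is a small sketch of that :func:`iterparse` pattern, under the
assumption that only the rank of each country is needed. The input is a
hypothetical, shortened document held in an :class:`io.BytesIO` object; a real
program would more likely pass a filename. Calling :meth:`Element.clear` once
an element has been handled is one way to release what is no longer needed:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large file on disk.
source = io.BytesIO(
    b'<data>'
    b'<country name="Liechtenstein"><rank>1</rank></country>'
    b'<country name="Singapore"><rank>4</rank></country>'
    b'</data>'
)

ranks = {}
# Only "end" events are requested, so each element is complete when seen.
for event, elem in ET.iterparse(source, events=('end',)):
    if elem.tag == 'country':
        ranks[elem.get('name')] = int(elem.find('rank').text)
        # Drop the children, text and attributes we have finished with.
        elem.clear()

print(ranks)
```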

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with the given tag that are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.
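
One detail worth keeping in mind: :meth:`Element.find` returns ``None`` when
nothing matches, so chaining ``.text`` onto it fails for incomplete data. The
sketch below, on a hypothetical two-country document where one country has no
``rank`` child, shows :meth:`Element.findtext` with a default as one way to
handle that case:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second country has no <rank> child.
root = ET.fromstring(
    '<data>'
    '<country name="A"><rank>1</rank></country>'
    '<country name="B"/>'
    '</data>'
)

summary = []
for country in root.findall('country'):
    rank_elem = country.find('rank')  # None when there is no match
    # findtext() returns the default instead of raising on a missing child.
    rank_text = country.findtext('rank', default='unranked')
    summary.append((country.get('name'), rank_elem is not None, rank_text))

print(summary)
```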

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>
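
A slightly fuller sketch, assuming we want to rebuild one ``country`` entry of
the sample data from scratch: attributes can be passed to :func:`SubElement`
as a dictionary, as keyword arguments, or both, and text is assigned through
:attr:`Element.text`. The exact attribute order in the serialized output may
differ between Python versions, so the example checks the result by parsing it
back rather than by comparing strings:

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')

# Attributes given as keyword arguments.
country = ET.SubElement(data, 'country', name='Singapore')

# Text content is set after the element is created.
rank = ET.SubElement(country, 'rank')
rank.text = '4'

# Attributes given as a dictionary plus a keyword argument.
ET.SubElement(country, 'neighbor', {'name': 'Malaysia'}, direction='N')

# encoding="unicode" asks for a str instead of bytes.
xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)

# Round-trip to confirm the structure that was built.
reparsed = ET.fromstring(xml_text)
print(reparsed.find('country/neighbor').attrib)
```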

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
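
To make the predicate forms concrete, here is a short sketch that runs a few
of them against a hypothetical, cut-down version of the country data. The
variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the sample data.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein">'
    '<rank>1</rank>'
    '<neighbor name="Austria"/><neighbor name="Switzerland"/>'
    '</country>'
    '<country name="Singapore">'
    '<neighbor name="Malaysia"/>'
    '</country>'
    '</data>'
)

# [tag]: countries that have a direct <rank> child.
ranked = [c.get('name') for c in root.findall("./country[rank]")]

# [@attrib='value']: the country whose name attribute is 'Singapore'.
singapore = root.findall("./country[@name='Singapore']")

# [position]: the last and the second <neighbor> of each country.
last_neighbors = [n.get('name')
                  for n in root.findall("./country/neighbor[last()]")]
second_neighbors = [n.get('name') for n in root.findall(".//neighbor[2]")]

print(ranked)
print(len(singapore))
print(last_neighbors)
print(second_neighbors)
```

Note that each predicate follows a tag name, as the rule above requires for
``position`` predicates.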

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.
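
A brief sketch of both halves of that description, using hypothetical element
names: a comment appended through :meth:`Element.append` is serialized, and
parsing the result back with the default parser drops it again:

```python
import xml.etree.ElementTree as ET

root = ET.Element('data')
root.append(ET.Comment('generated by a script'))  # inserted via the API
ET.SubElement(root, 'item')

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)

# The default parser skips the comment, so only <item> comes back.
reparsed = ET.fromstring(xml_text)
print(len(root), len(reparsed))
```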

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2
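
As an illustration, the fragments do not need to line up with element
boundaries. In this sketch (the fragment contents are made up) a tag name is
split across two list items and the document still parses as a whole:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; note the tag name split across the first two.
fragments = [
    '<data><cou',
    'ntry name="Panama">',
    '<rank>68</rank></country>',
    '</data>',
]

root = ET.fromstringlist(fragments)
print(root.tag, root.find('country').get('name'))
```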


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.
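
The sketch below, on a small made-up document, records the order in which
``"start"`` and ``"end"`` events arrive and reads :attr:`Element.text` only on
``"end"`` events, in line with the note above:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input held in memory instead of a file on disk.
source = io.BytesIO(b'<root><item>first</item><item>second</item></root>')

seen = []
texts = []
for event, elem in ET.iterparse(source, events=('start', 'end')):
    seen.append((event, elem.tag))
    # Text is only guaranteed to be populated once the end tag was seen.
    if event == 'end' and elem.tag == 'item':
        texts.append(elem.text)

print(seen)
print(texts)
```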

.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating processing instruction objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2
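
For illustration, assume a made-up namespace URI ``http://example.com/ns``
that we would like to see serialized with the prefix ``ex``. Namespaced tags
are written in the ``{uri}local`` form when building elements:

```python
import xml.etree.ElementTree as ET

NS = 'http://example.com/ns'  # hypothetical namespace URI

# The registry is global, so this affects all later serialization.
ET.register_namespace('ex', NS)

root = ET.Element('{%s}data' % NS)
ET.SubElement(root, '{%s}item' % NS)

xml_text = ET.tostring(root, encoding='unicode')
print(xml_text)
```

Without the registration, the serializer would pick a generated prefix for the
namespace instead.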


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.
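
A short sketch of how the *encoding* argument changes the return type, and of
the one guarantee :func:`tostringlist` makes, using a tiny made-up element:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<a><b>text</b></a>')

as_bytes = ET.tostring(elem)                     # default: US-ASCII bytes
as_text = ET.tostring(elem, encoding='unicode')  # str
chunks = ET.tostringlist(elem)                   # list of bytes fragments

print(type(as_bytes).__name__, type(as_text).__name__)
# How the output is split into chunks is unspecified; only the join is.
print(b"".join(chunks) == as_bytes)
```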


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element ids to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.
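
As a quick sketch with a made-up document, the returned dictionary is keyed by
the value of each element's ``id`` attribute:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two elements carrying an id attribute.
root, ids = ET.XMLID(
    '<doc>'
    '<section id="intro">Hello</section>'
    '<section id="end"/>'
    '</doc>'
)

print(root.tag)
print(sorted(ids))
print(ids['intro'].text)
```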
Georg Brandl116aa622007-08-15 14:28:22 +0000559
560
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000561.. _elementtree-element-objects:
Georg Brandl116aa622007-08-15 14:28:22 +0000562
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000563Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element.  As the name implies, this attribute is usually a string but
      may be any application-specific object.  If the element is created from
      an XML file, the attribute will contain any text found between the
      element's start tag and its first child or end tag.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element.  This attribute is usually a string but may be any
      application-specific object.  If the element is created from an XML file,
      the attribute will contain any text found after the element's end tag and
      before the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes.  Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it.  To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element.  This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs.  The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
      or ``None``.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
      elements in document order.  *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element.  Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
      matching elements in document order.  *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")


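The attribute and child methods above can be combined as in the following minimal sketch. The tag names, attribute names and values (``order``, ``item``, ``status`` and so on) are invented for illustration and are not part of the module.

```python
import xml.etree.ElementTree as ET

# Build a small tree by hand; attributes come from the dict plus **extra.
order = ET.Element("order", {"id": "42"}, status="open")

first = ET.Element("item")
first.text = "apple"      # text between <item> and </item>
first.tail = " and "      # text after </item>, before the next tag
order.append(first)
order.extend([ET.Element("item"), ET.Element("note")])

# Dictionary-like access to the attributes.
order.set("status", "closed")
missing = order.get("currency", "n/a")   # default returned for absent keys

# Searching direct children by tag name.
items = order.findall("item")
first_text = order.findtext("item")      # text of the first <item>
empty_text = order.findtext("note")      # empty string: <note> has no text
absent_text = order.findtext("price")    # None: no such element

# itertext() walks text and tail values in document order.
all_text = "".join(order.itertext())

# remove() compares by identity, so pass the exact object to drop.
order.remove(items[1])
remaining = [child.tag for child in order]
```

Note the difference between ``empty_text`` and ``absent_text``: an element that exists but has no text yields an empty string, while a missing element yields the *default* value.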
.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree.  *source* is a file
      name or :term:`file object`.  *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used.  Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      :term:`file object` opened for writing.  *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration is added to the
      file.  Use ``False`` for never, ``True`` for always, and ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content.  If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument.  If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

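The interaction between the *encoding* argument of :meth:`ElementTree.write` and the type of the target stream can be sketched as follows. The in-memory streams stand in for real files, and the element names are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("page")
ET.SubElement(root, "title").text = "Example"
ET.SubElement(root, "body")              # an element with no content
tree = ET.ElementTree(root)

# encoding="unicode" produces str, so the target must be a text stream.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode")

# Any other encoding produces bytes, so the target must be a binary stream.
# Here the declaration is forced on and empty elements get explicit end tags.
bytes_out = io.BytesIO()
tree.write(bytes_out, encoding="utf-8", xml_declaration=True,
           short_empty_elements=False)

as_text = text_out.getvalue()
as_bytes = bytes_out.getvalue()
```

Swapping the two streams would raise :exc:`TypeError`, which is the conflict the description of *encoding* warns about.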
.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName.  If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.



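A short sketch of both constructor forms, and of a :class:`QName` used as an attribute value, is shown below. The namespace URI and the names ``root``, ``kind`` and ``item`` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"     # hypothetical namespace URI

# Both forms describe the same qualified name.
one_arg = ET.QName("{%s}item" % NS)
two_arg = ET.QName(NS, "item")

elem = ET.Element("root")
elem.set("kind", two_arg)        # QName wrapped as an attribute value

# On output the {uri}local form is replaced by a prefixed name, and a
# matching xmlns declaration is emitted on the element.
xml_text = ET.tostring(elem, encoding="unicode")
```

The prefix itself is generated by the serializer, so code should not rely on its exact spelling.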
.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder.  This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure.  You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.  *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes.  It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element.  Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element.  *data* is a string.  This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element.  *tag* is the element name.  Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
      containing element attributes.  Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
      the public identifier.  *system* is the system identifier.  This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2


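The start/data/end protocol can be driven by hand, which is roughly what a parser does on the builder's behalf. The following is only a sketch; the tag and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

builder.start("list", {"name": "todo"})
builder.start("entry", {})
builder.data("write ")
builder.data("docs")          # consecutive data() calls are joined
builder.end("entry")
builder.end("list")

root = builder.close()        # the toplevel Element
entry = root[0]
```

After ``close()``, ``root`` is an ordinary :class:`Element` and can be searched or serialized like any parsed tree.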
.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   This class is the low-level building block of the module.  It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML.  It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing events
   are translated to a push API - by invoking callbacks on the *target* object.
   If *target* is omitted, the standard :class:`TreeBuilder` is used.  The
   *html* argument was historically used for backwards compatibility and is now
   deprecated.  If *encoding* [1]_ is given, the value overrides the encoding
   specified in the XML file.

   .. deprecated:: 3.4
      The *html* argument.  The remaining arguments should be passed via
      keyword to prepare for the removal of the *html* argument.

   .. method:: close()

      Finishes feeding data to the parser.  Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser.  *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``.  :meth:`XMLParser.close` calls
   *target*\'s method ``close()``.  :class:`XMLParser` is not limited to
   building a tree structure.  This is an example of counting the maximum depth
   of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4


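When no *target* is given, the same feed/close cycle builds an element tree. The sketch below feeds a document in arbitrary pieces, as might arrive from a socket; the chunk boundaries and element names are made up, and deliberately fall in the middle of tags and text.

```python
import xml.etree.ElementTree as ET

# Chunks may split the document anywhere, even inside a tag.
chunks = [
    "<log><entry id='1'>sta",
    "rted</entry><entry id",
    "='2'>done</entry></log>",
]

parser = ET.XMLParser()          # default target: TreeBuilder
for chunk in chunks:
    parser.feed(chunk)

root = parser.close()            # toplevel Element from the default target
texts = [entry.text for entry in root.findall("entry")]
ids = [entry.get("id") for entry in root.findall("entry")]
```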
.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications.  Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it.  *events* is a sequence of events to
   report back.  The supported events are the strings ``"start"``, ``"end"``,
   ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
   namespace information).  If *events* is omitted, only ``"end"`` events are
   reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated.  Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser.  The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again.  Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

      .. note::

         :class:`XMLPullParser` only guarantees that it has seen the ">"
         character of a starting tag when it emits a "start" event, so the
         attributes are defined, but the contents of the text and tail attributes
         are undefined at that point.  The same applies to the element children;
         they may or may not be present.

         If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

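A typical pull-parsing loop feeds whatever data is available, drains the pending events, and repeats. The following sketch uses invented element names and chunk boundaries. How many events are ready after any single ``feed()`` call can depend on the underlying expat build, so the code relies only on the overall order of events.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))
seen = []      # (event, tag) pairs in the order they were reported
texts = []     # text of each completed <post> element

def drain(pull_parser):
    # Consume every event that is currently queued.
    for event, elem in pull_parser.read_events():
        seen.append((event, elem.tag))
        if event == "end" and elem.tag == "post":
            texts.append(elem.text)   # fully populated at "end"

for chunk in ["<feed><post>fi", "rst</post>", "<post>second</post></feed>"]:
    parser.feed(chunk)
    drain(parser)

parser.close()
drain(parser)   # events still queued at close() remain readable
```

Reading ``elem.text`` only on ``"end"`` events follows the note above: at ``"start"`` the text is not yet guaranteed to be present.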
Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails.  The string representation of an instance of this exception
   will contain a user-friendly error message.  In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser.  See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

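As a rough illustration of these attributes, the sketch below parses a deliberately malformed document (made up for the example) and inspects the resulting exception. ``expat.ErrorString`` is used to turn the numeric code back into a description.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

bad_xml = "<a>\n  <b>\n</a>"     # <b> is never closed

message = reason = None
line = column = None

try:
    ET.fromstring(bad_xml)
except ET.ParseError as err:
    message = str(err)                    # user-friendly description
    line, column = err.position           # where the parser gave up
    reason = expat.ErrorString(err.code)  # text for the numeric code
```

The error is reported at the closing tag that exposed the problem, not at the unclosed ``<b>`` itself.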
.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.