:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Eli Benderskyc1d98692012-03-30 11:44:15 +030047Parsing XML
48^^^^^^^^^^^
Eli Bendersky3a4875e2012-03-26 20:43:32 +020049
Eli Bendersky0f4e9342012-08-14 07:19:33 +030050We'll be using the following XML document as the sample data for this section:
Eli Bendersky3a4875e2012-03-26 20:43:32 +020051
Eli Bendersky0f4e9342012-08-14 07:19:33 +030052.. code-block:: xml
53
54 <?xml version="1.0"?>
Eli Bendersky3a4875e2012-03-26 20:43:32 +020055 <data>
Eli Bendersky3115f0d2012-08-15 14:26:30 +030056 <country name="Liechtenstein">
Eli Bendersky3a4875e2012-03-26 20:43:32 +020057 <rank>1</rank>
58 <year>2008</year>
59 <gdppc>141100</gdppc>
60 <neighbor name="Austria" direction="E"/>
61 <neighbor name="Switzerland" direction="W"/>
62 </country>
63 <country name="Singapore">
64 <rank>4</rank>
65 <year>2011</year>
66 <gdppc>59900</gdppc>
67 <neighbor name="Malaysia" direction="N"/>
68 </country>
69 <country name="Panama">
70 <rank>68</rank>
71 <year>2011</year>
72 <gdppc>13600</gdppc>
73 <neighbor name="Costa Rica" direction="W"/>
74 <neighbor name="Colombia" direction="E"/>
75 </country>
76 </data>
Eli Bendersky3a4875e2012-03-26 20:43:32 +020077
We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions, such
as :func:`parse`, return an :class:`ElementTree` instead; check the reference
documentation of each function to see which type it returns.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

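The difference between the two entry points can be checked directly. This is a minimal sketch that assumes a tiny inline document in place of ``country_data.xml`` and uses :class:`io.StringIO` as a stand-in for a real file:

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for country_data.xml (illustrative data only).
xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

# parse() takes a filename or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
# fromstring() returns the root Element directly.
root = ET.fromstring(xml_text)

print(type(tree).__name__)              # ElementTree
print(type(root).__name__)              # Element
print(tree.getroot().tag == root.tag)   # True: both roots are <data>
```
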

.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input. Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

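The following sketch simulates that situation: a hypothetical list of chunks, split at arbitrary points as network reads might be, is fed to an :class:`XMLPullParser`, and each ``<item>`` is collected as soon as its end tag has been parsed. The ``<feed>``/``<item>`` document is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical chunks, split at arbitrary points as socket reads might be.
chunks = ['<feed><item id="1">al', 'pha</item><item id', '="2">beta</item></fe', 'ed>']

parser = ET.XMLPullParser(events=('end',))
items = []
for chunk in chunks:
    parser.feed(chunk)  # never blocks: parses whatever is complete so far
    for event, elem in parser.read_events():
        if elem.tag == 'item':
            items.append((elem.get('id'), elem.text))
parser.close()

print(items)  # [('1', 'alpha'), ('2', 'beta')]
```
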
Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over the
entire sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

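When a child or attribute may be missing, note that :meth:`Element.find` returns ``None`` instead of raising an exception, while :meth:`Element.findtext` and :meth:`Element.get` accept a default value. A minimal sketch on a one-country document (``population`` and ``capital`` are hypothetical names that do not occur in the data):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<data><country name="Panama"><rank>68</rank></country></data>')
country = root.find('country')

print(country.find('population'))           # None: there is no such child
print(country.findtext('population', '?'))  # ? (the default is returned)
print(country.findtext('rank'))             # 68
print(country.get('capital', 'unknown'))    # unknown (attribute default)
```
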
More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

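:meth:`ElementTree.write` accepts a file object as well as a file name, and its *encoding* and *xml_declaration* arguments control the output encoding and the XML declaration. As a sketch, the rank update can be repeated on a one-country document (illustrative data) and written to an in-memory :class:`io.BytesIO` buffer instead of ``output.xml``:

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring('<data><country name="Panama"><rank>68</rank></country></data>')
for rank in root.iter('rank'):
    rank.text = str(int(rank.text) + 1)
    rank.set('updated', 'yes')

# write() is a method of ElementTree, so wrap the root element first.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding='utf-8', xml_declaration=True)
print(buf.getvalue().decode('utf-8'))
```
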
We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that modifying an element's children while iterating over them can lead
to problems, just as when modifying a Python list while iterating over it.
That is why the example first collects all matching elements with
:meth:`Element.findall`, which returns a list, and only then iterates over
that list.

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

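:meth:`Element.remove` only removes *direct* children, so deeper elements have to be removed through their parent. One way to do this, sketched here on an illustrative two-country document, is to loop over the parents and let an XPath predicate (see :ref:`elementtree-xpath`) select the children to drop, again collecting the matches with :meth:`Element.findall` before removing anything:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein">'
    '<neighbor name="Austria" direction="E"/>'
    '<neighbor name="Switzerland" direction="W"/>'
    '</country>'
    '<country name="Panama">'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/>'
    '</country>'
    '</data>'
)

# remove() works on direct children, so iterate over each parent <country>.
for country in root.findall('country'):
    # findall() returns a list, so removing inside this loop is safe.
    for neighbor in country.findall("neighbor[@direction='W']"):
        country.remove(neighbor)

remaining = [n.get('name') for n in root.iter('neighbor')]
print(remaining)  # ['Austria', 'Colombia']
```
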
Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

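Text and attributes can be set on the new elements afterwards or passed to :func:`SubElement` directly. As a sketch, part of the sample document can be rebuilt and serialized with :func:`tostring`, where ``encoding='unicode'`` makes it return a :class:`str` rather than :class:`bytes`:

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')
country = ET.SubElement(data, 'country', name='Panama')  # attribute via keyword
rank = ET.SubElement(country, 'rank')
rank.text = '68'                                          # text set afterwards
ET.SubElement(country, 'neighbor', {'name': 'Colombia', 'direction': 'E'})

xml_text = ET.tostring(data, encoding='unicode')
print(xml_text)
```
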
Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<https://www.w3.org/TR/xml-names/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the XPath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(xml_text)  # xml_text holds the document above
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement

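Since Python 3.8, the search functions also accept the star wildcards described in :ref:`elementtree-xpath`, so ``{*}name`` matches ``name`` in any namespace (or none). A minimal sketch on a trimmed copy of the example document, useful when the local names alone are unambiguous:

```python
import xml.etree.ElementTree as ET

xml_text = '''<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
        <fictional:character>Archie Leach</fictional:character>
    </actor>
</actors>'''

root = ET.fromstring(xml_text)
print(root.tag)  # the expanded name: {http://people.example.com}actors

# '{*}tag' ignores the namespace part; requires Python 3.8 or later.
names = [n.text for n in root.findall('.//{*}name')]
characters = [c.text for c in root.findall('.//{*}character')]
print(names, characters)
```
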

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   # countrydata is a string holding the sample document
   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grandchildren of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::

   # All dublin-core "title" tags in the document
   root.findall(".//{http://purl.org/dc/elements/1.1/}title")


Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``. ``{namespace}*`` selects all tags in the   |
|                       | given namespace, ``{*}spam`` selects tags named      |
|                       | ``spam`` in any (or no) namespace, and ``{}*``       |
|                       | only selects tags that are not in a namespace.       |
|                       |                                                      |
|                       | .. versionchanged:: 3.8                              |
|                       |    Support for star-wildcards was added.             |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements, including comments and   |
|                       | processing instructions. For example, ``*/egg``      |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[.='text']``        | Selects all elements whose complete text content,    |
|                       | including descendants, equals the given ``text``.    |
|                       |                                                      |
|                       | .. versionadded:: 3.7                                |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

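A sketch that exercises several of these predicates on a trimmed, illustrative copy of the country data (the ``names`` helper is hypothetical). Note that a ``[position]`` predicate counts only siblings with the same tag, so ``neighbor[last()]`` picks the last ``neighbor`` of each country:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><year>2011</year>'
    '<neighbor name="Malaysia" direction="N"/></country>'
    '<country name="Panama"><year>2011</year>'
    '<neighbor name="Costa Rica" direction="W"/>'
    '<neighbor name="Colombia" direction="E"/></country>'
    '</data>'
)

def names(path):
    """Return the 'name' attribute of every element matching *path*."""
    return [e.get('name') for e in root.findall(path)]

print(names(".//neighbor[@direction='E']"))  # [@attrib='value']
print(names("country[year='2011']"))         # [tag='text']
print(names("country/neighbor[last()]"))     # [position] using last()
print(names("country/neighbor[1]"))          # [position] using an integer
```
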
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

   `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

   Canonicalization is a way to normalise XML output in a way that allows
   byte-by-byte comparisons and digital signatures. It reduces the freedom
   that XML serializers have and instead generates a more constrained XML
   representation. The main restrictions regard the placement of namespace
   declarations, the ordering of attributes, and ignorable whitespace.

   This function takes an XML data string (*xml_data*) or a file path or
   file-like object (*from_file*) as input, converts it to the canonical
   form, and writes it out using the *out* file(-like) object, if provided,
   or returns it as a text string if not. The output file receives text,
   not bytes. It should therefore be opened in text mode with ``utf-8``
   encoding.

   Typical uses::

      from xml.etree.ElementTree import canonicalize

      xml_data = "<root>...</root>"
      print(canonicalize(xml_data))

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(xml_data, out=out_file)

      with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
          canonicalize(from_file="inputfile.xml", out=out_file)

   The configuration *options* are as follows:

   - *with_comments*: set to true to include comments (default: false)
   - *strip_text*: set to true to strip whitespace before and after text content
     (default: false)
   - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
     (default: false)
   - *qname_aware_tags*: a set of qname aware tag names in which prefixes
     should be replaced in text content (default: empty)
   - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
     should be replaced in text content (default: empty)
   - *exclude_attrs*: a set of attribute names that should not be serialised
   - *exclude_tags*: a set of tag names that should not be serialised

   In the option list above, "a set" refers to any collection or iterable of
   strings; no ordering is expected.

   .. versionadded:: 3.8


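To illustrate what the normalization buys, this sketch compares two illustrative serializations that differ only in attribute order, quoting style, whitespace between attributes, and the empty-element form. After canonicalization they compare equal:

```python
from xml.etree.ElementTree import canonicalize

a = '<root b="2"  a="1"><child/></root>'
b = "<root a='1' b='2'><child></child></root>"

ca = canonicalize(a)
cb = canonicalize(b)
print(ca)        # <root a="1" b="2"><child></child></root>
print(ca == cb)  # True
```
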
.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

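A short sketch of inserting a comment with :meth:`Element.append` and serializing it; the element names and comment text are illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.Element('config')
root.append(ET.Comment(' generated file, do not edit '))
ET.SubElement(root, 'option', name='debug').text = 'off'

out = ET.tostring(root, encoding='unicode')
print(out)
# <config><!-- generated file, do not edit --><option name="debug">off</option></config>
```
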
.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.

   .. versionchanged:: 3.8
      The :func:`dump` function now preserves the attribute order specified
      by the user.


.. function:: fromstring(text, parser=None)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. *parser* is an optional parser instance.
   If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


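The fragments are fed to the parser one after another, so they do not need to break at element boundaries. A sketch with illustrative fragments:

```python
import xml.etree.ElementTree as ET

# Split points are arbitrary, even in the middle of a tag name.
fragments = ['<data><coun', 'try name="Panama">', '<rank>68</rank></country>', '</data>']
root = ET.fromstringlist(fragments)

print(root.tag, root.find('country').get('name'), root.findtext('country/rank'))
# data Panama 68
```
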
.. function:: iselement(element)

   Check if an object appears to be a valid element object. *element* is an
   element instance. Return ``True`` if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
   ``"pi"``, ``"start-ns"`` and ``"end-ns"``
   (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


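For ``"start-ns"`` events, the second item of each pair is not an element but a ``(prefix, uri)`` tuple, with ``''`` as the prefix of a default namespace. A minimal sketch that collects the declarations of an illustrative document (the ``urn:example:*`` URIs are placeholders):

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b'<root xmlns="urn:example:a" xmlns:x="urn:example:x">'
                    b'<x:item/></root>')

namespaces = {}
ends = []
for event, value in ET.iterparse(source, events=('start-ns', 'end')):
    if event == 'start-ns':
        prefix, uri = value        # a (prefix, uri) tuple, not an Element
        namespaces[prefix] = uri
    else:
        ends.append(value.tag)     # an Element with an expanded {uri}tag name

print(sorted(namespaces.items()))  # [('', 'urn:example:a'), ('x', 'urn:example:x')]
print(ends)                        # ['{urn:example:x}item', '{urn:example:a}root']
```
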
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


Georg Brandl7f01a132009-09-16 15:58:14 +0000625.. function:: ProcessingInstruction(target, text=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000626
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000627 PI element factory. This factory function creates a special element that
628 will be serialized as an XML processing instruction. *target* is a string
629 containing the PI target. *text* is a string containing the PI contents, if
630 given. Returns an element instance, representing a processing instruction.
631
Eli Bendersky0bd22d42014-04-03 06:14:38 -0700632 Note that :class:`XMLParser` skips over processing instructions
633 in the input instead of creating comment objects for them. An
634 :class:`ElementTree` will only contain processing instruction nodes if
635 they have been inserted into to the tree using one of the
636 :class:`Element` methods.
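
   For example, a processing instruction can be created and attached to a tree
   by hand. The ``xml-stylesheet`` target and its contents below are only
   illustrative:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element("doc")
      pi = ET.ProcessingInstruction("xml-stylesheet", 'type="text/xsl" href="style.xsl"')
      root.append(pi)  # PI nodes are added with ordinary Element methods

      print(ET.tostring(root, encoding="unicode"))
      # <doc><?xml-stylesheet type="text/xsl" href="style.xsl"?></doc>
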


.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2
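
   A short sketch of the effect, using a made-up namespace URI. Without the
   registration, the serializer would fall back to a generated prefix such as
   ``ns0``:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      NS = "http://example.com/ns"      # hypothetical namespace URI
      ET.register_namespace("ex", NS)   # global: affects later serialization

      root = ET.Element("{%s}root" % NS)
      ET.SubElement(root, "{%s}child" % NS)
      print(ET.tostring(root, encoding="unicode"))
      # <ex:root xmlns:ex="http://example.com/ns"><ex:child /></ex:root>
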


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.
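
   For example, attributes can be passed both as the *attrib* dictionary and
   as keyword arguments. The element and attribute names here are
   illustrative:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element("catalog")
      book = ET.SubElement(root, "book", {"id": "bk101"}, lang="en")
      book.text = "XML Basics"

      print(ET.tostring(root, encoding="unicode"))
      # <catalog><book id="bk101" lang="en">XML Basics</book></catalog>
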


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the
   same meaning as in :meth:`ElementTree.write`. Returns an (optionally)
   encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.
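
   A brief sketch of how *encoding*, *method* and *short_empty_elements*
   change the result for a small made-up document:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<p>Hi <b>there</b><br/></p>")

      print(ET.tostring(root))                      # bytes (US-ASCII)
      print(ET.tostring(root, encoding="unicode"))  # str
      print(ET.tostring(root, encoding="unicode", short_empty_elements=False))
      print(ET.tostring(root, encoding="unicode", method="text"))  # Hi there
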


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the
   same meaning as in :meth:`ElementTree.write`. Returns a list of
   (optionally) encoded strings containing the XML data. How the output is
   split into list items is not specified, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.
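
   A quick check of the documented relationship between the two functions,
   using a made-up document:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<a><b>text</b><c/></a>")
      chunks = ET.tostringlist(root)  # a list of bytes objects
      print(b"".join(chunks) == ET.tostring(root))  # True

      # With encoding="unicode" the chunks are str objects instead.
      text_chunks = ET.tostringlist(root, encoding="unicode")
      print("".join(text_chunks))  # <a><b>text</b><c /></a>
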


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.
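
   For example, with an illustrative literal:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.XML("<greeting lang='en'>Hello</greeting>")
      print(root.tag, root.get("lang"), root.text)  # greeting en Hello
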


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps the values of ``id`` attributes to the elements that carry them.
   *text* is a string containing XML data. *parser* is an optional parser
   instance. If not given, the standard :class:`XMLParser` parser is used.
   Returns a tuple containing an :class:`Element` instance and a dictionary.
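
   A minimal sketch with made-up content. Elements without an ``id`` attribute
   do not appear in the dictionary:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root, ids = ET.XMLID(
          "<doc><p id='intro'>Hi</p><p id='body'>Text</p><p>No id</p></doc>")
      print(root.tag)           # doc
      print(sorted(ids))        # ['body', 'intro']
      print(ids["intro"].text)  # Hi
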


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element, set the **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include.

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="source.xml" parse="xml" />
   </document>

By default, the **href** attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the :func:`~xml.etree.ElementInclude.include` function:

.. code-block:: python

   from xml.etree import ElementTree, ElementInclude

   tree = ElementTree.parse("document.xml")
   root = tree.getroot()

   ElementInclude.include(root)

The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document. The result might look something like this:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     <para>This is a paragraph.</para>
   </document>

If the **parse** attribute is omitted, it defaults to ``"xml"``. The **href** attribute is required.

To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to ``"text"``:

.. code-block:: xml

   <?xml version="1.0"?>
   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) <xi:include href="year.txt" parse="text" />.
   </document>

The result might look something like:

.. code-block:: xml

   <document xmlns:xi="http://www.w3.org/2001/XInclude">
     Copyright (c) 2003.
   </document>
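
A custom loader can replace the file-based default. The following minimal sketch assumes a hypothetical in-memory mapping (``RESOURCES``) in place of files on disk, and ``dict_loader`` is likewise only illustrative. The loader takes the same arguments as the default loader, returns an :class:`Element` for ``parse="xml"`` and a string for ``parse="text"``, and reproduces both results shown above:

.. code-block:: python

   import xml.etree.ElementTree as ET
   from xml.etree import ElementInclude

   # Hypothetical in-memory "files" used instead of documents on disk.
   RESOURCES = {
       "source.xml": "<para>This is a paragraph.</para>",
       "year.txt": "2003",
   }

   def dict_loader(href, parse, encoding=None):
       """Look up *href* in RESOURCES instead of reading it from disk."""
       data = RESOURCES[href]
       if parse == "xml":
           return ET.fromstring(data)  # an Element for XML includes
       return data                     # a str for text includes

   xml_doc = ET.fromstring(
       '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
       '<xi:include href="source.xml" parse="xml"/>'
       '</document>')
   ElementInclude.include(xml_doc, loader=dict_loader)
   print(ET.tostring(xml_doc, encoding="unicode"))
   # <document><para>This is a paragraph.</para></document>

   text_doc = ET.fromstring(
       '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
       'Copyright (c) <xi:include href="year.txt" parse="text"/>.'
       '</document>')
   ElementInclude.include(text_doc, loader=dict_loader)
   print(text_doc.text)  # Copyright (c) 2003.
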

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader. This default loader reads an included resource from disk.
   *href* is a URL, which this loader treats as a file name. *parse* is the
   parse mode, either ``"xml"`` or ``"text"``. *encoding* is an optional text
   encoding used in ``"text"`` mode; if not given, the encoding is ``utf-8``.
   Returns the expanded resource. If the parse mode is ``"xml"``, this is the
   root :class:`Element` of the loaded document. If the parse mode is
   ``"text"``, this is a Unicode string. If the loader fails, it can return
   ``None`` or raise an exception.
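
   A short sketch of calling the default loader directly on temporary files
   (the file names and contents are illustrative) shows the two return types:

   .. code-block:: python

      import os
      import tempfile
      from xml.etree import ElementInclude

      with tempfile.TemporaryDirectory() as tmp:
          xml_path = os.path.join(tmp, "source.xml")
          txt_path = os.path.join(tmp, "year.txt")
          with open(xml_path, "w", encoding="utf-8") as f:
              f.write("<para>This is a paragraph.</para>")
          with open(txt_path, "w", encoding="utf-8") as f:
              f.write("2003")

          node = ElementInclude.default_loader(xml_path, "xml")   # an Element
          text = ElementInclude.default_loader(txt_path, "text")  # a str

      print(node.tag, repr(text))  # para '2003'
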


.. function:: xml.etree.ElementInclude.include(elem, loader=None)

   This function expands XInclude directives in place. *elem* is the root
   element of the tree to process. *loader* is an optional resource loader.
   If omitted, it defaults to :func:`default_loader`. If given, it should be
   a callable that implements the same interface as :func:`default_loader`.
   Each ``include`` element is replaced by the resource the loader returns
   for it: a copy of the loaded element in ``"xml"`` mode, or the loaded
   string merged into the surrounding text in ``"text"`` mode. If the loader
   returns ``None``, an exception is raised.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element. Their values are usually strings but may be any
      application-specific object. If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``. For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.
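
      The values listed above can be checked directly. This sketch parses the
      same fragment and prints each element's *text* and *tail*:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
         b = a.find("b")
         c = b.find("c")
         d = c.find("d")

         for elem in (a, b, c, d):
             print(elem.tag, repr(elem.text), repr(elem.tail))

         print("".join(a.itertext()))  # 1234
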


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.
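
   The attribute methods above can be combined as in this sketch, where the
   element and attribute names are illustrative. ``sorted`` is used so the
   output does not depend on attribute order:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      elem = ET.Element("img", src="logo.png")
      elem.set("alt", "Logo")

      print(elem.get("src"))            # logo.png
      print(elem.get("width", "auto"))  # auto (no such attribute)
      print(sorted(elem.keys()))        # ['alt', 'src']
      print(sorted(elem.items()))       # [('alt', 'Logo'), ('src', 'logo.png')]
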

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name. Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.
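
   A sketch of the three lookup methods on a made-up namespaced document. The
   ``''`` prefix (Python 3.8 or later) makes unprefixed names in the paths
   refer to the document's default namespace:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.fromstring(
          '<library xmlns="http://example.com/lib">'
          '<book><title>Dune</title></book>'
          '<book><title>Emma</title></book>'
          '<magazine/>'
          '</library>')
      ns = {"": "http://example.com/lib"}

      print(root.find("book/title", ns).text)                  # Dune
      print([t.text for t in root.findall("book/title", ns)])  # ['Dune', 'Emma']
      print(repr(root.findtext("magazine", "n/a", ns)))        # '' (no text)
      print(root.findtext("journal", "n/a", ns))               # n/a
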


   .. method:: getchildren()

      .. deprecated-removed:: 3.2 3.9
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2
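
      For example, on a made-up document, the unfiltered iterator visits the
      element itself first and then its descendants depth first:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<a><b><c/></b><c/></a>")
         print([elem.tag for elem in root.iter()])     # ['a', 'b', 'c', 'c']
         print([elem.tag for elem in root.iter("c")])  # ['c', 'c']
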


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order. *namespaces* is an optional mapping
      from namespace prefix to full name. Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.
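
   The following sketch exercises the child-manipulation methods and the
   sequence protocol on a made-up list of items:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      parent = ET.Element("list")
      first = ET.Element("item", n="1")
      parent.append(first)
      parent.extend([ET.Element("item", n="3"), ET.Element("item", n="4")])
      parent.insert(1, ET.Element("item", n="2"))
      print([child.get("n") for child in parent])  # ['1', '2', '3', '4']

      parent.remove(first)  # matched by identity, not by tag or contents
      del parent[-1]        # __delitem__
      print(len(parent), parent[0].get("n"))  # 2 2

      try:
          parent.append("not an element")
      except TypeError:
          print("append() rejects non-Element objects")
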

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

     import xml.etree.ElementTree as ET

     root = ET.fromstring("<root><foo/></root>")
     element = root.find('foo')  # <foo> exists but has no subelements

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name. Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information. Code should be prepared to deal with
   any ordering on input. In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.
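
   As a short sketch (Python 3.8 or later), :func:`canonicalize` sorts
   attributes and writes empty elements as start/end tag pairs, so two made-up
   inputs that differ only in attribute order and empty-tag style serialize
   identically:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      first = ET.canonicalize('<item b="2" a="1"/>')
      second = ET.canonicalize('<item a="1" b="2"></item>')
      print(first)            # <item a="1" b="2"></item>
      print(first == second)  # True
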

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code. In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

     def reorder_attributes(root):
         for el in root.iter():
             attrib = el.attrib
             if len(attrib) > 1:
                 # adjust attribute order, e.g. by sorting
                 attribs = sorted(attrib.items())
                 attrib.clear()
                 attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated-removed:: 3.2 3.9
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration should be added to
      the file. Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.
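
      A minimal sketch of matching the stream type to *encoding*, using
      in-memory streams from :mod:`io` in place of real files:

      .. code-block:: python

         import io
         import xml.etree.ElementTree as ET

         tree = ET.ElementTree(ET.fromstring("<root><leaf/></root>"))

         # encoding="unicode" produces str, so use a text stream.
         text_out = io.StringIO()
         tree.write(text_out, encoding="unicode")
         print(text_out.getvalue())  # <root><leaf /></root>

         # Any other encoding produces bytes, so use a binary stream.
         bin_out = io.BytesIO()
         tree.write(bin_out, encoding="utf-8", xml_declaration=True)
         print(bin_out.getvalue())
         # b"<?xml version='1.0' encoding='utf-8'?>\n<root><leaf /></root>"
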
1156
R David Murray575fb312013-12-25 23:21:03 -05001157 .. versionadded:: 3.4
1158 The *short_empty_elements* parameter.
Eli Benderskya9a2ef52013-01-13 06:04:43 -08001159
Raymond Hettingere3685fd2018-10-28 11:18:22 -07001160 .. versionchanged:: 3.8
1161 The :meth:`write` method now preserves the attribute order specified
1162 by the user.
1163
Georg Brandl116aa622007-08-15 14:28:22 +00001164
This is the XML file (``index.xhtml``) that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    ...
    >>> tree.write("output.xhtml")

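The same edit can be sketched without a file on disk. This minimal variant
assumes the document is held in a string; :func:`fromstring` and
:func:`tostring` take the place of the file-based :meth:`ElementTree.parse`
and :meth:`ElementTree.write` used above:

.. code-block:: python

   import xml.etree.ElementTree as ET

   page = """<html>
       <head><title>Example page</title></head>
       <body>
           <p>Moved to <a href="http://example.org/">example.org</a>
           or <a href="http://example.com/">example.com</a>.</p>
       </body>
   </html>"""

   root = ET.fromstring(page)
   p = root.find("body/p")        # first <p> directly under <body>
   for link in p.iter("a"):       # every <a> inside that paragraph
       link.set("target", "blank")

   print(ET.tostring(root, encoding="unicode"))
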
.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.
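
   A minimal sketch, assuming only the standard library; the namespace URI
   ``http://example.com/ns`` and the ``ex`` prefix are illustrative, and
   :func:`register_namespace` changes a module-wide prefix mapping. Because
   the ``xsi:type`` value is wrapped in a :class:`QName`, the serializer
   declares its namespace and writes the value with the matching prefix:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      XSI = "http://www.w3.org/2001/XMLSchema-instance"
      EX = "http://example.com/ns"          # illustrative namespace
      ET.register_namespace("ex", EX)       # global prefix choice for output

      # Both constructor forms name the same qualified name.
      assert ET.QName("{%s}item" % EX) == ET.QName(EX, "item")

      root = ET.Element("root")
      root.set("{%s}type" % XSI, ET.QName(EX, "item"))
      print(ET.tostring(root, encoding="unicode"))
      # The value is written as xsi:type="ex:item" and xmlns:ex is declared.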
Georg Brandl116aa622007-08-15 14:28:22 +00001207
1208
Antoine Pitrou5b235d02013-04-18 19:37:06 +02001209
Georg Brandl116aa622007-08-15 14:28:22 +00001210.. _elementtree-treebuilder-objects:
1211
1212TreeBuilder Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001213^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +00001214
1215
Stefan Behnel43851a22019-05-01 21:20:38 +02001216.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
1217 pi_factory=None, insert_comments=False, insert_pis=False)
Georg Brandl116aa622007-08-15 14:28:22 +00001218
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001219 Generic element structure builder. This builder converts a sequence of
Stefan Behnel43851a22019-05-01 21:20:38 +02001220 start, data, end, comment and pi method calls to a well-formed element
1221 structure. You can use this class to build an element structure using
1222 a custom XML parser, or a parser for some other XML-like format.
1223
1224 *element_factory*, when given, must be a callable accepting two positional
1225 arguments: a tag and a dict of attributes. It is expected to return a new
1226 element instance.
1227
1228 The *comment_factory* and *pi_factory* functions, when given, should behave
1229 like the :func:`Comment` and :func:`ProcessingInstruction` functions to
1230 create comments and processing instructions. When not given, the default
1231 factories will be used. When *insert_comments* and/or *insert_pis* is true,
1232 comments/pis will be inserted into the tree if they appear within the root
1233 element (but not outside of it).
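
   A minimal sketch of driving a :class:`TreeBuilder` by hand, as a parser for
   some non-XML format might. The ``config``/``option`` names and the sequence
   of calls are illustrative assumptions, not part of the API:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder()
      # Each call mirrors one event that a hypothetical custom parser emits.
      builder.start("config", {"version": "1"})
      builder.start("option", {"name": "debug"})
      builder.data("true")
      builder.end("option")
      builder.end("config")
      root = builder.close()      # the toplevel Element

      print(ET.tostring(root, encoding="unicode"))
      # <config version="1"><option name="debug">true</option></config>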

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and
      *text*. If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8


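   A minimal sketch of *insert_comments* and *insert_pis*, assuming only the
   standard library; the ``app`` processing instruction target is
   illustrative. The comment that precedes the root element is not inserted,
   while those inside it are:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
      parser = ET.XMLParser(target=builder)
      parser.feed("<!-- outside --><root><!-- inside --><?app run?></root>")
      root = parser.close()

      print(ET.tostring(root, encoding="unicode"))
      # <root><!-- inside --><?app run?></root>
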
   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8


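   A minimal sketch of a custom target implementing these optional methods.
   The ``NamespaceLogger`` class, the ``root.dtd`` system identifier and the
   ``urn:`` namespaces are hypothetical; the target only records the order in
   which :class:`XMLParser` invokes its callbacks:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      class NamespaceLogger:
          """Hypothetical target that records doctype and namespace events."""

          def __init__(self):
              self.log = []

          def doctype(self, name, pubid, system):
              self.log.append(("doctype", name, pubid, system))

          def start_ns(self, prefix, uri):
              self.log.append(("start-ns", prefix, uri))

          def end_ns(self, prefix):
              self.log.append(("end-ns", prefix))

          def start(self, tag, attrs):
              self.log.append(("start", tag))

          def end(self, tag):
              self.log.append(("end", tag))

          def close(self):
              return self.log

      parser = ET.XMLParser(target=NamespaceLogger())
      parser.feed('<!DOCTYPE root SYSTEM "root.dtd">'
                  '<root xmlns="urn:default"><x:a xmlns:x="urn:x"/></root>')
      log = parser.close()        # whatever the target's close() returns
      for entry in log:
          print(entry)
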
.. class:: C14NWriterTarget(write, *, \
             with_comments=False, strip_text=False, rewrite_prefixes=False, \
             qname_aware_tags=None, qname_aware_attrs=None, \
             exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8


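   A minimal sketch of wiring the writer to an :class:`XMLParser`, assuming
   only the standard library; collecting the output pieces in a list is just
   one possible choice of *write* callable. C14N 2.0 sorts the attributes,
   drops comments by default and writes empty elements as start/end pairs:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      pieces = []
      parser = ET.XMLParser(target=ET.C14NWriterTarget(pieces.append))
      parser.feed('<root b="2" a="1"><!-- dropped --><child/></root>')
      parser.close()

      print("".join(pieces))  # <root a="1" b="2"><child></child></root>
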
.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.
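
      A minimal sketch of incremental feeding with the default target; the
      three-chunk split of the document is arbitrary and chosen only for
      illustration. Chunk boundaries may fall anywhere, even inside a tag:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         parser = ET.XMLParser()   # no target given, so a TreeBuilder is used
         for chunk in ["<root><it", "em n='1'/><item n='2'", "/></root>"]:
             parser.feed(chunk)
         root = parser.close()     # the toplevel element built by TreeBuilder

         print(root.tag, [item.get("n") for item in root])  # root ['1', '2']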
Georg Brandl116aa622007-08-15 14:28:22 +00001348
Eli Benderskyb5869342013-08-30 05:51:20 -07001349 :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
1350 for each opening tag, its ``end(tag)`` method for each closing tag, and data
Stefan Behneldde3eeb2019-05-01 21:49:58 +02001351 is processed by method ``data(data)``. For further supported callback
1352 methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
Eli Benderskyb5869342013-08-30 05:51:20 -07001353 *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
1354 building a tree structure. This is an example of counting the maximum depth
1355 of an XML file::
Christian Heimesd8654cf2007-12-02 15:22:16 +00001356
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001357 >>> from xml.etree.ElementTree import XMLParser
Christian Heimesd8654cf2007-12-02 15:22:16 +00001358 >>> class MaxDepth: # The target object of the parser
1359 ... maxDepth = 0
1360 ... depth = 0
1361 ... def start(self, tag, attrib): # Called for each opening tag.
Georg Brandl48310cd2009-01-03 21:18:54 +00001362 ... self.depth += 1
Christian Heimesd8654cf2007-12-02 15:22:16 +00001363 ... if self.depth > self.maxDepth:
1364 ... self.maxDepth = self.depth
1365 ... def end(self, tag): # Called for each closing tag.
1366 ... self.depth -= 1
Georg Brandl48310cd2009-01-03 21:18:54 +00001367 ... def data(self, data):
Christian Heimesd8654cf2007-12-02 15:22:16 +00001368 ... pass # We do not need to do anything with data.
1369 ... def close(self): # Called when all data has been parsed.
1370 ... return self.maxDepth
Georg Brandl48310cd2009-01-03 21:18:54 +00001371 ...
Christian Heimesd8654cf2007-12-02 15:22:16 +00001372 >>> target = MaxDepth()
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001373 >>> parser = XMLParser(target=target)
Christian Heimesd8654cf2007-12-02 15:22:16 +00001374 >>> exampleXml = """
1375 ... <a>
1376 ... <b>
1377 ... </b>
1378 ... <b>
1379 ... <c>
1380 ... <d>
1381 ... </d>
1382 ... </c>
1383 ... </b>
1384 ... </a>"""
1385 >>> parser.feed(exampleXml)
1386 >>> parser.close()
1387 4
Christian Heimesb186d002008-03-18 15:15:01 +00001388
Eli Benderskyb5869342013-08-30 05:51:20 -07001389
1390.. _elementtree-xmlpullparser-objects:
1391
1392XMLPullParser Objects
1393^^^^^^^^^^^^^^^^^^^^^
1394
1395.. class:: XMLPullParser(events=None)
1396
Eli Bendersky2c68e302013-08-31 07:37:23 -07001397 A pull parser suitable for non-blocking applications. Its input-side API is
1398 similar to that of :class:`XMLParser`, but instead of pushing calls to a
1399 callback target, :class:`XMLPullParser` collects an internal list of parsing
1400 events and lets the user read from it. *events* is a sequence of events to
1401 report back. The supported events are the strings ``"start"``, ``"end"``,
Stefan Behnel43851a22019-05-01 21:20:38 +02001402 ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
1403 are used to get detailed namespace information). If *events* is omitted,
1404 only ``"end"`` events are reported.
Eli Benderskyb5869342013-08-30 05:51:20 -07001405
1406 .. method:: feed(data)
1407
1408 Feed the given bytes data to the parser.
1409
1410 .. method:: close()
1411
Nick Coghlan4cc2afa2013-09-28 23:50:35 +10001412 Signal the parser that the data stream is terminated. Unlike
1413 :meth:`XMLParser.close`, this method always returns :const:`None`.
1414 Any events not yet retrieved when the parser is closed can still be
1415 read with :meth:`read_events`.
Eli Benderskyb5869342013-08-30 05:51:20 -07001416
1417 .. method:: read_events()
1418
R David Murray410d3202014-01-04 23:52:50 -05001419 Return an iterator over the events which have been encountered in the
1420 data fed to the
1421 parser. The iterator yields ``(event, elem)`` pairs, where *event* is a
Eli Benderskyb5869342013-08-30 05:51:20 -07001422 string representing the type of event (e.g. ``"end"``) and *elem* is the
Stefan Behnel43851a22019-05-01 21:20:38 +02001423 encountered :class:`Element` object, or other context value as follows.
1424
1425 * ``start``, ``end``: the current Element.
1426 * ``comment``, ``pi``: the current comment / processing instruction
1427 * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
1428 mapping.
1429 * ``end-ns``: :const:`None` (this may change in a future version)
Nick Coghlan4cc2afa2013-09-28 23:50:35 +10001430
1431 Events provided in a previous call to :meth:`read_events` will not be
R David Murray410d3202014-01-04 23:52:50 -05001432 yielded again. Events are consumed from the internal queue only when
1433 they are retrieved from the iterator, so multiple readers iterating in
1434 parallel over iterators obtained from :meth:`read_events` will have
1435 unpredictable results.
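
      A minimal sketch of a non-blocking reader, assuming only the standard
      library; the two-chunk split and the ``urn:example`` namespace are
      illustrative. Events are drained after every :meth:`feed` and once more
      after :meth:`close`, since it is not guaranteed which events are already
      available after a given chunk:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         parser = ET.XMLPullParser(events=("start", "end", "start-ns"))
         collected = []

         def drain():
             # Take whatever events are ready; this never blocks.
             for event, value in parser.read_events():
                 if event == "start-ns":
                     collected.append((event, value))      # (prefix, uri)
                 else:
                     collected.append((event, value.tag))  # an Element

         for chunk in [b'<root xmlns:x="urn:example">',
                       b'<x:leaf>hi</x:leaf></root>']:
             parser.feed(chunk)   # e.g. after each network read
             drain()
         parser.close()
         drain()

         for item in collected:
             print(item)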

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.
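
   A minimal sketch of inspecting these attributes, assuming only the
   standard library; the malformed input is illustrative. The numeric
   :attr:`code` can be mapped back to expat's message through
   :data:`xml.parsers.expat.errors.messages`:

   .. code-block:: python

      import xml.etree.ElementTree as ET
      from xml.parsers import expat

      try:
          ET.fromstring("<root><child></root>")  # <child> is never closed
      except ET.ParseError as err:
          line, column = err.position
          message = expat.errors.messages[err.code]
          print(message, "at line", line, "column", column)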

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.