:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.

.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.

.. deprecated:: 3.3
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree` instead (:func:`parse`, for example, returns
one), so check each function's documentation for its return type.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'
113 '2008'
114
R David Murray410d3202014-01-04 23:52:50 -0500115
Eli Bendersky0bd22d42014-04-03 06:14:38 -0700116.. note::
117
118 Not all elements of the XML input will end up as elements of the
119 parsed tree. Currently, this module skips over any XML comments,
120 processing instructions, and document type declarations in the
121 input. Nevertheless, trees built using this module's API rather
122 than parsing from XML text can have comments and processing
123 instructions in them; they will be included when generating XML
124 output. A document type declaration may be accessed by passing a
125 custom :class:`TreeBuilder` instance to the :class:`XMLParser`
126 constructor.
127
128
R David Murray410d3202014-01-04 23:52:50 -0500129.. _elementtree-pull-parsing:
130
Eli Bendersky2c68e302013-08-31 07:37:23 -0700131Pull API for non-blocking parsing
Eli Benderskyb5869342013-08-30 05:51:20 -0700132^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Eli Bendersky3bdead12013-04-20 09:06:27 -0700133
R David Murray410d3202014-01-04 23:52:50 -0500134Most parsing functions provided by this module require the whole document
135to be read at once before returning any result. It is possible to use an
136:class:`XMLParser` and feed data into it incrementally, but it is a push API that
Eli Benderskyb5869342013-08-30 05:51:20 -0700137calls methods on a callback target, which is too low-level and inconvenient for
138most needs. Sometimes what the user really wants is to be able to parse XML
139incrementally, without blocking operations, while enjoying the convenience of
140fully constructed :class:`Element` objects.
Eli Bendersky3bdead12013-04-20 09:06:27 -0700141
Eli Benderskyb5869342013-08-30 05:51:20 -0700142The most powerful tool for doing this is :class:`XMLPullParser`. It does not
143require a blocking read to obtain the XML data, and is instead fed with data
144incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
R David Murray410d3202014-01-04 23:52:50 -0500145elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.
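
Here is a minimal sketch of that socket-style use. The hypothetical ``chunks``
list stands in for data arriving over a network connection, split at arbitrary
points. Each chunk is passed to :meth:`XMLPullParser.feed`, and whatever events
are complete so far are drained with :meth:`XMLPullParser.read_events`. The
final :meth:`XMLPullParser.close` call tells the parser that the input has
ended, and one last drain collects anything still pending.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical chunks, split mid-tag as a network read might split them.
   chunks = [b"<data><coun", b'try name="A"/><coun', b'try name="B"/>', b"</data>"]

   parser = ET.XMLPullParser(["end"])
   names = []

   def collect(events):
       # Each event is an (event, element) pair; keep only finished countries.
       for event, elem in events:
           if elem.tag == "country":
               names.append(elem.get("name"))

   for chunk in chunks:
       parser.feed(chunk)
       # read_events() returns immediately with whatever is ready; it never blocks.
       collect(parser.read_events())

   parser.close()
   # Drain anything the parser only finished once it knew the input had ended.
   collect(parser.read_events())
   print(names)  # ['A', 'B']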

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.
166and don't want to hold it wholly in memory.
Eli Bendersky3bdead12013-04-20 09:06:27 -0700167
Eli Benderskyc1d98692012-03-30 11:44:15 +0300168Finding interesting elements
169^^^^^^^^^^^^^^^^^^^^^^^^^^^^
170
171:class:`Element` has some useful methods that help iterate recursively over all
172the sub-tree below it (its children, their children, and so on). For example,
173:meth:`Element.iter`::
174
175 >>> for neighbor in root.iter('neighbor'):
Serhiy Storchakadba90392016-05-10 12:01:23 +0300176 ... print(neighbor.attrib)
Eli Benderskyc1d98692012-03-30 11:44:15 +0300177 ...
178 {'name': 'Austria', 'direction': 'E'}
179 {'name': 'Switzerland', 'direction': 'W'}
180 {'name': 'Malaysia', 'direction': 'N'}
181 {'name': 'Costa Rica', 'direction': 'W'}
182 {'name': 'Colombia', 'direction': 'E'}
183
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300184:meth:`Element.findall` finds only elements with a tag which are direct
185children of the current element. :meth:`Element.find` finds the *first* child
Georg Brandlbdaee3a2013-10-06 09:23:03 +0200186with a particular tag, and :attr:`Element.text` accesses the element's text
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300187content. :meth:`Element.get` accesses the element's attributes::
188
189 >>> for country in root.findall('country'):
Serhiy Storchakadba90392016-05-10 12:01:23 +0300190 ... rank = country.find('rank').text
191 ... name = country.get('name')
192 ... print(name, rank)
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300193 ...
Eli Bendersky3115f0d2012-08-15 14:26:30 +0300194 Liechtenstein 1
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300195 Singapore 4
196 Panama 68
197
Eli Benderskyc1d98692012-03-30 11:44:15 +0300198More sophisticated specification of which elements to look for is possible by
199using :ref:`XPath <elementtree-xpath>`.
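
One practical detail: :meth:`Element.find` returns ``None`` when nothing
matches. As a result, ``country.find('year').text`` raises
:exc:`AttributeError` for a country without a ``year`` child. The following
minimal sketch, using a small hypothetical document, shows two ways to guard
against that. The first is an explicit ``is not None`` test. The second is
:meth:`Element.findtext`, which returns a default instead.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical document in which Panama has no <year> child.
   root = ET.fromstring(
       "<data>"
       '<country name="Liechtenstein"><rank>1</rank><year>2008</year></country>'
       '<country name="Panama"><rank>68</rank></country>'
       "</data>"
   )

   panama = root.find("country[@name='Panama']")
   year = panama.find("year")
   # Compare with None explicitly: an element with no children is falsy.
   print(year is None)  # True

   # findtext() folds the None check in and returns the default instead.
   years = {c.get("name"): c.findtext("year", default="n/a")
            for c in root.iterfind("country")}
   print(years)  # {'Liechtenstein': '2008', 'Panama': 'n/a'}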

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     # using root.findall() to avoid removal during traversal
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems,
just like when iterating and modifying Python lists or dicts.
Therefore, the example first collects all matching elements with
``root.findall()``, and only then iterates over the list of matches.
264``root.findall()``, and only then iterates over the list of matches.
265
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300266Our XML now looks like this:
267
268.. code-block:: xml
269
270 <?xml version="1.0"?>
271 <data>
Eli Bendersky3115f0d2012-08-15 14:26:30 +0300272 <country name="Liechtenstein">
Eli Bendersky0f4e9342012-08-14 07:19:33 +0300273 <rank updated="yes">2</rank>
274 <year>2008</year>
275 <gdppc>141100</gdppc>
276 <neighbor name="Austria" direction="E"/>
277 <neighbor name="Switzerland" direction="W"/>
278 </country>
279 <country name="Singapore">
280 <rank updated="yes">5</rank>
281 <year>2011</year>
282 <gdppc>59900</gdppc>
283 <neighbor name="Malaysia" direction="N"/>
284 </country>
285 </data>
286
287Building XML documents
288^^^^^^^^^^^^^^^^^^^^^^
289
Eli Benderskyc1d98692012-03-30 11:44:15 +0300290The :func:`SubElement` function also provides a convenient way to create new
291sub-elements for a given element::
292
293 >>> a = ET.Element('a')
294 >>> b = ET.SubElement(a, 'b')
295 >>> c = ET.SubElement(a, 'c')
296 >>> d = ET.SubElement(c, 'd')
297 >>> ET.dump(a)
298 <a><b /><c><d /></c></a>
299
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700300Parsing XML with Namespaces
301^^^^^^^^^^^^^^^^^^^^^^^^^^^
302
303If the XML input has `namespaces
304<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
305with prefixes in the form ``prefix:sometag`` get expanded to
Raymond Hettingerc43a6662015-03-30 20:29:28 -0700306``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
307Also, if there is a `default namespace
sblondon8d1f2f42018-02-10 23:39:43 +0100308<https://www.w3.org/TR/xml-names/#defaulting>`__,
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700309that full URI gets prepended to all of the non-prefixed tags.
310
311Here is an XML example that incorporates two namespaces, one with the
312prefix "fictional" and the other serving as the default namespace:
313
314.. code-block:: xml
315
316 <?xml version="1.0"?>
317 <actors xmlns:fictional="http://characters.example.com"
318 xmlns="http://people.example.com">
319 <actor>
320 <name>John Cleese</name>
321 <fictional:character>Lancelot</fictional:character>
322 <fictional:character>Archie Leach</fictional:character>
323 </actor>
324 <actor>
325 <name>Eric Idle</name>
326 <fictional:character>Sir Robin</fictional:character>
327 <fictional:character>Gunther</fictional:character>
328 <fictional:character>Commander Clement</fictional:character>
329 </actor>
330 </actors>
331
332One way to search and explore this XML example is to manually add the
Raymond Hettingerc43a6662015-03-30 20:29:28 -0700333URI to every tag or attribute in the xpath of a
334:meth:`~Element.find` or :meth:`~Element.findall`::
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700335
Raymond Hettingerc43a6662015-03-30 20:29:28 -0700336 root = fromstring(xml_text)
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700337 for actor in root.findall('{http://people.example.com}actor'):
338 name = actor.find('{http://people.example.com}name')
339 print(name.text)
340 for char in actor.findall('{http://characters.example.com}character'):
341 print(' |-->', char.text)
342
Raymond Hettingerc43a6662015-03-30 20:29:28 -0700343A better way to search the namespaced XML example is to create a
344dictionary with your own prefixes and use those in the search functions::
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700345
346 ns = {'real_person': 'http://people.example.com',
347 'role': 'http://characters.example.com'}
348
349 for actor in root.findall('real_person:actor', ns):
350 name = actor.find('real_person:name', ns)
351 print(name.text)
352 for char in actor.findall('role:character', ns):
353 print(' |-->', char.text)
354
355These two approaches both output::
356
357 John Cleese
358 |--> Lancelot
359 |--> Archie Leach
360 Eric Idle
361 |--> Sir Robin
362 |--> Gunther
363 |--> Commander Clement
364
365
Eli Benderskyc1d98692012-03-30 11:44:15 +0300366Additional resources
367^^^^^^^^^^^^^^^^^^^^
368
369See http://effbot.org/zone/element-index.htm for tutorials and links to other
370docs.
371
372
373.. _elementtree-xpath:
374
375XPath support
376-------------
377
378This module provides limited support for
Serhiy Storchaka6dff0202016-05-07 10:49:07 +0300379`XPath expressions <https://www.w3.org/TR/xpath>`_ for locating elements in a
Eli Benderskyc1d98692012-03-30 11:44:15 +0300380tree. The goal is to support a small subset of the abbreviated syntax; a full
381XPath engine is outside the scope of the module.
382
383Example
384^^^^^^^
385
386Here's an example that demonstrates some of the XPath capabilities of the
387module. We'll be using the ``countrydata`` XML document from the
388:ref:`Parsing XML <elementtree-parsing-xml>` section::
389
390 import xml.etree.ElementTree as ET
391
392 root = ET.fromstring(countrydata)
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200393
394 # Top-level elements
Eli Benderskyc1d98692012-03-30 11:44:15 +0300395 root.findall(".")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200396
397 # All 'neighbor' grand-children of 'country' children of the top-level
398 # elements
Eli Benderskyc1d98692012-03-30 11:44:15 +0300399 root.findall("./country/neighbor")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200400
401 # Nodes with name='Singapore' that have a 'year' child
Eli Benderskyc1d98692012-03-30 11:44:15 +0300402 root.findall(".//year/..[@name='Singapore']")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200403
404 # 'year' nodes that are children of nodes with name='Singapore'
Eli Benderskyc1d98692012-03-30 11:44:15 +0300405 root.findall(".//*[@name='Singapore']/year")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200406
407 # All 'neighbor' nodes that are the second child of their parent
Eli Benderskyc1d98692012-03-30 11:44:15 +0300408 root.findall(".//neighbor[2]")
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200409
Stefan Behnel47541682019-05-03 20:58:16 +0200410For XML with namespaces, use the usual qualified ``{namespace}tag`` notation::
411
412 # All dublin-core "title" tags in the document
413 root.findall(".//{http://purl.org/dc/elements/1.1/}title")
414
415
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200416Supported XPath syntax
417^^^^^^^^^^^^^^^^^^^^^^
418
Georg Brandl44ea77b2013-03-28 13:28:44 +0100419.. tabularcolumns:: |l|L|
420
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200421+-----------------------+------------------------------------------------------+
422| Syntax | Meaning |
423+=======================+======================================================+
424| ``tag`` | Selects all child elements with the given tag. |
425| | For example, ``spam`` selects all child elements |
Raymond Hettinger1e1e6012014-03-29 11:50:08 -0700426| | named ``spam``, and ``spam/egg`` selects all |
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200427| | grandchildren named ``egg`` in all children named |
Stefan Behnel47541682019-05-03 20:58:16 +0200428| | ``spam``. ``{namespace}*`` selects all tags in the |
429| | given namespace, ``{*}spam`` selects tags named |
430| | ``spam`` in any (or no) namespace, and ``{}*`` |
431| | only selects tags that are not in a namespace. |
432| | |
433| | .. versionchanged:: 3.8 |
434| | Support for star-wildcards was added. |
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200435+-----------------------+------------------------------------------------------+
Stefan Behnel47541682019-05-03 20:58:16 +0200436| ``*`` | Selects all child elements, including comments and |
437| | processing instructions. For example, ``*/egg`` |
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200438| | selects all grandchildren named ``egg``. |
439+-----------------------+------------------------------------------------------+
440| ``.`` | Selects the current node. This is mostly useful |
441| | at the beginning of the path, to indicate that it's |
442| | a relative path. |
443+-----------------------+------------------------------------------------------+
444| ``//`` | Selects all subelements, on all levels beneath the |
Eli Benderskyede001a2012-03-27 04:57:23 +0200445| | current element. For example, ``.//egg`` selects |
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200446| | all ``egg`` elements in the entire tree. |
447+-----------------------+------------------------------------------------------+
Eli Bendersky323a43a2012-10-09 06:46:33 -0700448| ``..`` | Selects the parent element. Returns ``None`` if the |
449| | path attempts to reach the ancestors of the start |
450| | element (the element ``find`` was called on). |
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200451+-----------------------+------------------------------------------------------+
452| ``[@attrib]`` | Selects all elements that have the given attribute. |
453+-----------------------+------------------------------------------------------+
454| ``[@attrib='value']`` | Selects all elements for which the given attribute |
455| | has the given value. The value cannot contain |
456| | quotes. |
457+-----------------------+------------------------------------------------------+
Ammar Askar97e8b1e2020-11-09 02:02:39 -0500458| ``[@attrib!='value']``| Selects all elements for which the given attribute |
459| | does not have the given value. The value cannot |
460| | contain quotes. |
461| | |
462| | .. versionadded:: 3.10 |
463+-----------------------+------------------------------------------------------+
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200464| ``[tag]`` | Selects all elements that have a child named |
465| | ``tag``. Only immediate children are supported. |
466+-----------------------+------------------------------------------------------+
scoder101a5e82017-09-30 15:35:21 +0200467| ``[.='text']`` | Selects all elements whose complete text content, |
468| | including descendants, equals the given ``text``. |
469| | |
470| | .. versionadded:: 3.7 |
471+-----------------------+------------------------------------------------------+
Ammar Askar97e8b1e2020-11-09 02:02:39 -0500472| ``[.!='text']`` | Selects all elements whose complete text content, |
473| | including descendants, does not equal the given |
474| | ``text``. |
475| | |
476| | .. versionadded:: 3.10 |
477+-----------------------+------------------------------------------------------+
Raymond Hettingerc43a6662015-03-30 20:29:28 -0700478| ``[tag='text']`` | Selects all elements that have a child named |
479| | ``tag`` whose complete text content, including |
480| | descendants, equals the given ``text``. |
Raymond Hettingerf6e31b72015-03-22 15:29:09 -0700481+-----------------------+------------------------------------------------------+
Ammar Askar97e8b1e2020-11-09 02:02:39 -0500482| ``[tag!='text']`` | Selects all elements that have a child named |
483| | ``tag`` whose complete text content, including |
484| | descendants, does not equal the given ``text``. |
485| | |
486| | .. versionadded:: 3.10 |
487+-----------------------+------------------------------------------------------+
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200488| ``[position]`` | Selects all elements that are located at the given |
489| | position. The position can be either an integer |
490| | (1 is the first position), the expression ``last()`` |
491| | (for the last position), or a position relative to |
492| | the last position (e.g. ``last()-1``). |
493+-----------------------+------------------------------------------------------+
494
495Predicates (expressions within square brackets) must be preceded by a tag
496name, an asterisk, or another predicate. ``position`` predicates must be
497preceded by a tag name.
498
499Reference
500---------
501
Georg Brandl116aa622007-08-15 14:28:22 +0000502.. _elementtree-functions:
503
504Functions
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200505^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000506
Stefan Behnele1d5dd62019-05-01 22:34:13 +0200507.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)
508
509 `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.
510
511 Canonicalization is a way to normalise XML output in a way that allows
512 byte-by-byte comparisons and digital signatures. It reduced the freedom
513 that XML serializers have and instead generates a more constrained XML
514 representation. The main restrictions regard the placement of namespace
515 declarations, the ordering of attributes, and ignorable whitespace.
516
517 This function takes an XML data string (*xml_data*) or a file path or
518 file-like object (*from_file*) as input, converts it to the canonical
519 form, and writes it out using the *out* file(-like) object, if provided,
520 or returns it as a text string if not. The output file receives text,
521 not bytes. It should therefore be opened in text mode with ``utf-8``
522 encoding.
523
524 Typical uses::
525
526 xml_data = "<root>...</root>"
527 print(canonicalize(xml_data))
528
529 with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
530 canonicalize(xml_data, out=out_file)
531
532 with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
533 canonicalize(from_file="inputfile.xml", out=out_file)
534
535 The configuration *options* are as follows:
536
537 - *with_comments*: set to true to include comments (default: false)
538 - *strip_text*: set to true to strip whitespace before and after text content
539 (default: false)
540 - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
541 (default: false)
542 - *qname_aware_tags*: a set of qname aware tag names in which prefixes
543 should be replaced in text content (default: empty)
544 - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
545 should be replaced in text content (default: empty)
546 - *exclude_attrs*: a set of attribute names that should not be serialised
547 - *exclude_tags*: a set of tag names that should not be serialised
548
549 In the option list above, "a set" refers to any collection or iterable of
550 strings, no ordering is expected.
551
552 .. versionadded:: 3.8
553
Georg Brandl116aa622007-08-15 14:28:22 +0000554
Georg Brandl7f01a132009-09-16 15:58:14 +0000555.. function:: Comment(text=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000556
Georg Brandlf6945182008-02-01 11:56:49 +0000557 Comment element factory. This factory function creates a special element
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000558 that will be serialized as an XML comment by the standard serializer. The
559 comment string can be either a bytestring or a Unicode string. *text* is a
560 string containing the comment string. Returns an element instance
Georg Brandlf6945182008-02-01 11:56:49 +0000561 representing a comment.
Georg Brandl116aa622007-08-15 14:28:22 +0000562
Eli Bendersky0bd22d42014-04-03 06:14:38 -0700563 Note that :class:`XMLParser` skips over comments in the input
564 instead of creating comment objects for them. An :class:`ElementTree` will
565 only contain comment nodes if they have been inserted into to
566 the tree using one of the :class:`Element` methods.
Georg Brandl116aa622007-08-15 14:28:22 +0000567
568.. function:: dump(elem)
569
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000570 Writes an element tree or element structure to sys.stdout. This function
571 should be used for debugging only.
Georg Brandl116aa622007-08-15 14:28:22 +0000572
573 The exact output format is implementation dependent. In this version, it's
574 written as an ordinary XML file.
575
576 *elem* is an element tree or an individual element.
577
Raymond Hettingere3685fd2018-10-28 11:18:22 -0700578 .. versionchanged:: 3.8
579 The :func:`dump` function now preserves the attribute order specified
580 by the user.
581
Georg Brandl116aa622007-08-15 14:28:22 +0000582
Manjusakae5458bd2019-02-22 08:33:57 +0800583.. function:: fromstring(text, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000584
Florent Xiclunadddd5e92010-03-14 01:28:07 +0000585 Parses an XML section from a string constant. Same as :func:`XML`. *text*
Manjusakae5458bd2019-02-22 08:33:57 +0800586 is a string containing XML data. *parser* is an optional parser instance.
587 If not given, the standard :class:`XMLParser` parser is used.
588 Returns an :class:`Element` instance.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000589
590
591.. function:: fromstringlist(sequence, parser=None)
592
593 Parses an XML document from a sequence of string fragments. *sequence* is a
594 list or other sequence containing XML data fragments. *parser* is an
595 optional parser instance. If not given, the standard :class:`XMLParser`
596 parser is used. Returns an :class:`Element` instance.
597
Ezio Melottif8754a62010-03-21 07:16:43 +0000598 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000599
600
Stefan Behnelb5d3cee2019-08-23 16:44:25 +0200601.. function:: indent(tree, space=" ", level=0)
602
603 Appends whitespace to the subtree to indent the tree visually.
604 This can be used to generate pretty-printed XML output.
605 *tree* can be an Element or ElementTree. *space* is the whitespace
606 string that will be inserted for each indentation level, two space
607 characters by default. For indenting partial subtrees inside of an
608 already indented tree, pass the initial indentation level as *level*.
609
610 .. versionadded:: 3.9
611
612
Georg Brandl116aa622007-08-15 14:28:22 +0000613.. function:: iselement(element)
614
Serhiy Storchaka138ccbb2019-11-12 16:57:03 +0200615 Check if an object appears to be a valid element object. *element* is an
616 element instance. Return ``True`` if this is an element object.
Georg Brandl116aa622007-08-15 14:28:22 +0000617
618
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000619.. function:: iterparse(source, events=None, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000620
621 Parses an XML section into an element tree incrementally, and reports what's
Eli Bendersky604c4ff2012-03-16 08:41:30 +0200622 going on to the user. *source* is a filename or :term:`file object`
Eli Benderskyfb625442013-05-19 09:09:24 -0700623 containing XML data. *events* is a sequence of events to report back. The
Stefan Behnel43851a22019-05-01 21:20:38 +0200624 supported events are the strings ``"start"``, ``"end"``, ``"comment"``,
625 ``"pi"``, ``"start-ns"`` and ``"end-ns"``
626 (the "ns" events are used to get detailed namespace
Eli Bendersky604c4ff2012-03-16 08:41:30 +0200627 information). If *events* is omitted, only ``"end"`` events are reported.
628 *parser* is an optional parser instance. If not given, the standard
Eli Benderskyb5869342013-08-30 05:51:20 -0700629 :class:`XMLParser` parser is used. *parser* must be a subclass of
630 :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
631 target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.
Georg Brandl116aa622007-08-15 14:28:22 +0000632
Eli Benderskyab2a76c2013-04-20 05:53:50 -0700633 Note that while :func:`iterparse` builds the tree incrementally, it issues
634 blocking reads on *source* (or the file it names). As such, it's unsuitable
Eli Bendersky2c68e302013-08-31 07:37:23 -0700635 for applications where blocking reads can't be made. For fully non-blocking
636 parsing, see :class:`XMLPullParser`.
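
   A minimal sketch of a common streaming pattern: react to ``"end"`` events
   (when each element is complete) and clear elements once they have been
   processed to reduce memory use.  The in-memory :class:`io.BytesIO` source
   and the ``item``/``id`` names are illustrative assumptions standing in for
   a large file:

   .. code-block:: python

      import io
      import xml.etree.ElementTree as ET

      data = b"<root><item id='1'/><item id='2'/><item id='3'/></root>"
      ids = []
      for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
          # Only "end" guarantees a fully populated element.
          if event == "end" and elem.tag == "item":
              ids.append(elem.get("id"))
              elem.clear()  # drop the processed element's content
      print(ids)  # ['1', '2', '3']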
Eli Benderskyab2a76c2013-04-20 05:53:50 -0700637
Benjamin Peterson75edad02009-01-01 15:05:06 +0000638 .. note::
639
Eli Benderskyb5869342013-08-30 05:51:20 -0700640 :func:`iterparse` only guarantees that it has seen the ">" character of a
641 starting tag when it emits a "start" event, so the attributes are defined,
642 but the contents of the text and tail attributes are undefined at that
643 point. The same applies to the element children; they may or may not be
644 present.
Benjamin Peterson75edad02009-01-01 15:05:06 +0000645
646 If you need a fully populated element, look for "end" events instead.
647
Eli Benderskyb5869342013-08-30 05:51:20 -0700648 .. deprecated:: 3.4
649 The *parser* argument.
650
Stefan Behnel43851a22019-05-01 21:20:38 +0200651 .. versionchanged:: 3.8
652 The ``comment`` and ``pi`` events were added.
653
654
Georg Brandl7f01a132009-09-16 15:58:14 +0000655.. function:: parse(source, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000656
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000657 Parses an XML section into an element tree. *source* is a filename or file
658 object containing XML data. *parser* is an optional parser instance. If
659 not given, the standard :class:`XMLParser` parser is used. Returns an
660 :class:`ElementTree` instance.
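
   For illustration, *source* can be a file object; here an in-memory
   :class:`io.BytesIO` buffer stands in for a file on disk (the document
   content is made up):

   .. code-block:: python

      import io
      import xml.etree.ElementTree as ET

      tree = ET.parse(io.BytesIO(b"<config><name>demo</name></config>"))
      root = tree.getroot()          # the document element
      print(root.tag)                # config
      print(root.findtext("name"))   # demo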
Georg Brandl116aa622007-08-15 14:28:22 +0000661
662
Georg Brandl7f01a132009-09-16 15:58:14 +0000663.. function:: ProcessingInstruction(target, text=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000664
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000665 PI element factory. This factory function creates a special element that
666 will be serialized as an XML processing instruction. *target* is a string
667 containing the PI target. *text* is a string containing the PI contents, if
668 given. Returns an element instance, representing a processing instruction.
669
Eli Bendersky0bd22d42014-04-03 06:14:38 -0700670 Note that :class:`XMLParser` skips over processing instructions
671 in the input instead of creating comment objects for them. An
672 :class:`ElementTree` will only contain processing instruction nodes if
673 they have been inserted into to the tree using one of the
674 :class:`Element` methods.
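
   A short sketch showing a processing instruction added by hand and then
   serialized; the ``xml-stylesheet`` target and its contents are only
   examples:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element("doc")
      root.append(ET.ProcessingInstruction("xml-stylesheet", 'href="style.css"'))
      print(ET.tostring(root, encoding="unicode"))
      # <doc><?xml-stylesheet href="style.css"?></doc>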

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix.  The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix.  *uri* is a namespace URI.  Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.
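
   A minimal sketch, using a made-up namespace URI.  Without the
   registration, the serializer would pick a generated prefix such as
   ``ns0``.  Keep in mind that the registry is global, so this affects all
   later serialization in the process:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      ET.register_namespace("ex", "http://example.com/ns")  # hypothetical URI
      elem = ET.Element("{http://example.com/ns}item")
      elem.text = "Example"
      print(ET.tostring(elem, encoding="unicode"))
      # <ex:item xmlns:ex="http://example.com/ns">Example</ex:item>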

   .. versionadded:: 3.2


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory.  This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *parent* is the parent element.  *tag* is
   the subelement name.  *attrib* is an optional dictionary, containing element
   attributes.  *extra* contains additional attributes, given as keyword
   arguments.  Returns an element instance.
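
   For example, the following builds a small tree; the positional *attrib*
   dictionary and the keyword *extra* attributes are combined (the names are
   illustrative):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element("catalog")
      book = ET.SubElement(root, "book", {"id": "b1"}, lang="en")
      print(root[0] is book)  # True: the new element was appended to root
      print(ET.tostring(root, encoding="unicode"))
      # <catalog><book id="b1" lang="en" /></catalog>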


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       xml_declaration=None, default_namespace=None, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the
   same meaning as in :meth:`ElementTree.write`.  Returns an (optionally)
   encoded string containing the XML data.
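
   A short sketch of how *encoding* and *short_empty_elements* affect the
   result:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element("root")
      ET.SubElement(root, "empty")

      print(ET.tostring(root))                      # b'<root><empty /></root>'
      print(ET.tostring(root, encoding="unicode"))  # <root><empty /></root>
      print(ET.tostring(root, encoding="unicode",
                        short_empty_elements=False))
      # <root><empty></empty></root>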

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostring` function now preserves the attribute order
      specified by the user.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           xml_declaration=None, default_namespace=None, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements.  *element* is an :class:`Element` instance.  *encoding* [1]_ is
   the output encoding (default is US-ASCII).  Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated).  *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *xml_declaration*, *default_namespace* and *short_empty_elements* have the
   same meaning as in :meth:`ElementTree.write`.  Returns a list of
   (optionally) encoded strings containing the XML data.  How the output is
   split into list items is not guaranteed, except that
   ``b"".join(tostringlist(element)) == tostring(element)`` (with
   ``encoding="unicode"``, join the items with ``""`` instead).
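
   For instance, joining the pieces reproduces the :func:`tostring` output;
   with ``encoding="unicode"`` the pieces are :class:`str` rather than
   :class:`bytes`:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<a><b>text</b><c/></a>")

      chunks = ET.tostringlist(root)
      assert b"".join(chunks) == ET.tostring(root)

      text_chunks = ET.tostringlist(root, encoding="unicode")
      assert "".join(text_chunks) == ET.tostring(root, encoding="unicode")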

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

   .. versionadded:: 3.8
      The *xml_declaration* and *default_namespace* parameters.

   .. versionchanged:: 3.8
      The :func:`tostringlist` function now preserves the attribute order
      specified by the user.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant.  This function can be used to
   embed "XML literals" in Python code.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns an :class:`Element` instance.
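
   For example, a small literal can be parsed directly into an element (the
   markup is arbitrary):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      point = ET.XML('<point x="1" y="2"/>')
      print(point.tag, point.get("x"), point.get("y"))  # point 1 2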


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps element IDs (the values of ``id`` attributes) to elements.
   *text* is a string containing XML data.  *parser* is an optional parser
   instance.  If not given, the standard :class:`XMLParser` parser is used.
   Returns a tuple containing an :class:`Element` instance and a dictionary.
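
   A short sketch; elements without an ``id`` attribute (``<p/>`` here) do not
   appear in the dictionary, and the dictionary values are the same element
   objects that are in the returned tree:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root, ids = ET.XMLID('<doc><sec id="intro"/><sec id="end"/><p/></doc>')
      print(sorted(ids))              # ['end', 'intro']
      print(ids["intro"].tag)         # sec
      print(ids["intro"] is root[0])  # True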


.. _elementtree-xinclude:

XInclude support
----------------

This module provides limited support for
`XInclude directives <https://www.w3.org/TR/xinclude/>`_, via the :mod:`xml.etree.ElementInclude` helper module.  The helper module can be used to insert subtrees and text strings into element trees, based on information in the tree.

Example
^^^^^^^

Here's an example that demonstrates use of the XInclude module.  To include an XML document in the current document, use the ``{http://www.w3.org/2001/XInclude}include`` element, set its **parse** attribute to ``"xml"``, and use the **href** attribute to specify the document to include.

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="source.xml" parse="xml" />
    </document>

By default, the **href** attribute is treated as a file name.  You can use custom loaders to override this behaviour.  Also note that the standard helper does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to :func:`~xml.etree.ElementInclude.include`:

.. code-block:: python

    from xml.etree import ElementTree, ElementInclude

    tree = ElementTree.parse("document.xml")
    root = tree.getroot()

    # Expand the XInclude directives in the tree rooted at root.
    ElementInclude.include(root)

The ElementInclude module replaces the ``{http://www.w3.org/2001/XInclude}include`` element with the root element from the **source.xml** document.  The result might look something like this:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      <para>This is a paragraph.</para>
    </document>

If the **parse** attribute is omitted, it defaults to ``"xml"``.  The **href** attribute is required.

To include a text document, use the ``{http://www.w3.org/2001/XInclude}include`` element, and set the **parse** attribute to ``"text"``:

.. code-block:: xml

    <?xml version="1.0"?>
    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) <xi:include href="year.txt" parse="text" />.
    </document>

The result might look something like:

.. code-block:: xml

    <document xmlns:xi="http://www.w3.org/2001/XInclude">
      Copyright (c) 2003.
    </document>

Reference
---------

.. _elementinclude-functions:

Functions
^^^^^^^^^

.. function:: xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

   Default loader.  This default loader reads an included resource from disk.
   *href* is a URL.  *parse* is the parse mode, either ``"xml"`` or ``"text"``.
   *encoding* is an optional text encoding.  If not given, encoding is
   ``utf-8``.  Returns the expanded resource.  If the parse mode is ``"xml"``,
   this is the root :class:`Element` of the loaded document.  If the parse
   mode is ``"text"``, this is a Unicode string.  If the loader fails, it can
   return ``None`` or raise an exception.


.. function:: xml.etree.ElementInclude.include(elem, loader=None, base_url=None, \
                                               max_depth=6)

   This function expands XInclude directives.  *elem* is the root element.
   *loader* is an optional resource loader.  If omitted, it defaults to
   :func:`~xml.etree.ElementInclude.default_loader`.  If given, it should be a
   callable that implements the same interface as
   :func:`~xml.etree.ElementInclude.default_loader`.  *base_url* is the base
   URL of the original file, to resolve relative include file references.
   *max_depth* is the maximum number of recursive inclusions; the limit reduces
   the risk of malicious content explosion.  Pass a negative value to disable
   the limitation.

   The tree is expanded in place: each include element is replaced by the
   loaded subtree (parse mode ``"xml"``), or the loaded text is merged into
   the surrounding text (parse mode ``"text"``).  If the loader fails, it can
   return ``None`` or raise an exception.
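
   A minimal sketch of a custom loader that serves resources from an
   in-memory dictionary instead of the file system.  ``RESOURCES`` and
   ``dict_loader`` are hypothetical names; the loader follows the
   :func:`~xml.etree.ElementInclude.default_loader` call signature and, for
   ``"xml"`` includes, returns an element to be placed in the tree:

   .. code-block:: python

      import xml.etree.ElementTree as ET
      from xml.etree import ElementInclude

      # Hypothetical in-memory stand-ins for included files.
      RESOURCES = {
          "source.xml": "<para>This is a paragraph.</para>",
          "year.txt": "2003",
      }

      def dict_loader(href, parse, encoding=None):
          data = RESOURCES[href]  # a KeyError propagates for unknown hrefs
          if parse == "xml":
              return ET.fromstring(data)
          return data  # "text" mode: return the text as a str

      root = ET.fromstring(
          '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
          '<xi:include href="source.xml" parse="xml"/>'
          '<p>Copyright (c) <xi:include href="year.txt" parse="text"/>.</p>'
          '</document>'
      )
      ElementInclude.include(root, loader=dict_loader)  # expands in place
      print(ET.tostring(root, encoding="unicode"))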

   .. versionadded:: 3.9
      The *base_url* and *max_depth* parameters.


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class.  This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings.  *tag* is the element name.  *attrib* is
   an optional dictionary, containing element attributes.  *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text
                  tail

      These attributes can be used to hold additional data associated with
      the element.  Their values are usually strings but may be any
      application-specific object.  If the element is created from
      an XML file, the *text* attribute holds either the text between
      the element's start tag and its first child or end tag, or ``None``, and
      the *tail* attribute holds either the text between the element's
      end tag and the next tag, or ``None``.  For the XML data

      .. code-block:: xml

         <a><b>1<c>2<d/>3</c></b>4</a>

      the *a* element has ``None`` for both *text* and *tail* attributes,
      the *b* element has *text* ``"1"`` and *tail* ``"4"``,
      the *c* element has *text* ``"2"`` and *tail* ``None``,
      and the *d* element has *text* ``None`` and *tail* ``"3"``.

      To collect the inner text of an element, see :meth:`itertext`, for
      example ``"".join(element.itertext())``.

      Applications may store arbitrary objects in these attributes.
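
      The description above can be checked with a short sketch that parses
      the same document:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
         b = a.find("b")
         c = b.find("c")
         d = c.find("d")
         print(a.text, a.tail)  # None None
         print(b.text, b.tail)  # 1 4
         print(c.text, c.tail)  # 2 None
         print(d.text, d.tail)  # None 3
         print("".join(a.itertext()))  # 1234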

   .. attribute:: attrib

      A dictionary containing the element's attributes.  Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it.  To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element.  This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs.  The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.
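
      A brief sketch of the dictionary-like attribute methods (the ``img``
      element and its attributes are illustrative).  Because the order of
      :meth:`keys` and :meth:`items` is documented as arbitrary, the sketch
      sorts their results:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         img = ET.Element("img", src="logo.png")
         img.set("alt", "Logo")
         print(img.get("alt"))           # Logo
         print(img.get("width", "100"))  # 100 (missing attribute: default)
         names = sorted(img.keys())
         pairs = sorted(img.items())
         print(names)                    # ['alt', 'src']
         print(pairs)                    # [('alt', 'Logo'), ('src', 'logo.png')]
         img.clear()                     # also removes all attributes
         print(img.attrib)               # {}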

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.  Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
      or ``None``.  *namespaces* is an optional mapping from namespace prefix
      to full name.  Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
      elements in document order.  *namespaces* is an optional mapping from
      namespace prefix to full name.  Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
      to full name.  Pass ``''`` as prefix to move all unprefixed tag names
      in the expression into the given namespace.
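
      A short sketch of namespace-aware lookups; the Atom namespace URI is
      just a convenient example, and the ``''`` prefix form assumes
      Python 3.8 or later:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         root = ET.fromstring(
             '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>First</title></entry>'
             '<entry><title>Second</title></entry>'
             '</feed>'
         )
         ns = {"atom": "http://www.w3.org/2005/Atom"}

         first = root.find("atom:entry", ns)  # first matching child
         titles = [e.findtext("atom:title", namespaces=ns)
                   for e in root.findall("atom:entry", ns)]
         print(titles)                                    # ['First', 'Second']
         print(root.find("atom:missing", ns))             # None
         print(root.findtext("atom:missing", "n/a", ns))  # n/a

         # With '' as prefix, unprefixed names use the given namespace.
         print(root.findtext("entry/title", namespaces={"": ns["atom"]}))  # First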


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element.  Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.
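
      For example, iteration includes the element itself and runs depth
      first (the tag names are arbitrary):

      .. code-block:: python

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<a><b/><c><b/></c></a>")
         print([e.tag for e in root.iter()])  # ['a', 'b', 'c', 'b']
         print(len(list(root.iter("b"))))     # 2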

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
      matching elements in document order.  *namespaces* is an optional mapping
      from namespace prefix to full name.  Pass ``''`` as prefix to move all
      unprefixed tag names in the expression into the given namespace.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method, use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.
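
      A short sketch combining the child-manipulation methods described
      above (the ``item`` elements are illustrative):

      .. code-block:: python

         import xml.etree.ElementTree as ET

         root = ET.Element("list")
         first = ET.Element("item", n="1")
         root.append(first)
         root.extend([ET.Element("item", n="3")])
         root.insert(1, ET.Element("item", n="2"))
         print([child.get("n") for child in root])  # ['1', '2', '3']

         root.remove(first)  # matched by identity, not by tag or contents
         print(len(root))    # 2

         try:
             root.append("not an element")
         except TypeError:
             print("TypeError")  # non-Element children are rejected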

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")

   Prior to Python 3.8, the serialisation order of the XML attributes of
   elements was artificially made predictable by sorting the attributes by
   their name.  Based on the now guaranteed ordering of dicts, this arbitrary
   reordering was removed in Python 3.8 to preserve the order in which
   attributes were originally parsed or created by user code.

   In general, user code should try not to depend on a specific ordering of
   attributes, given that the `XML Information Set
   <https://www.w3.org/TR/xml-infoset/>`_ explicitly excludes the attribute
   order from conveying information.  Code should be prepared to deal with
   any ordering on input.  In cases where deterministic XML output is required,
   e.g. for cryptographic signing or test data sets, canonical serialisation
   is available with the :func:`canonicalize` function.

   In cases where canonical output is not applicable but a specific attribute
   order is still desirable on output, code should aim for creating the
   attributes directly in the desired order, to avoid perceptual mismatches
   for readers of the code.  In cases where this is difficult to achieve, a
   recipe like the following can be applied prior to serialisation to enforce
   an order independently from the Element creation::

      def reorder_attributes(root):
          for el in root.iter():
              attrib = el.attrib
              if len(attrib) > 1:
                  # adjust attribute order, e.g. by sorting
                  attribs = sorted(attrib.items())
                  attrib.clear()
                  attrib.update(attribs)


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree.  *source* is a file
      name or :term:`file object`.  *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used.  Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      :term:`file object` opened for writing.  *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration should be added to
      the file.  Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content.  If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument.  If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.
1185
R David Murray575fb312013-12-25 23:21:03 -05001186 .. versionadded:: 3.4
1187 The *short_empty_elements* parameter.
Eli Benderskya9a2ef52013-01-13 06:04:43 -08001188
Raymond Hettingere3685fd2018-10-28 11:18:22 -07001189 .. versionchanged:: 3.8
1190 The :meth:`write` method now preserves the attribute order specified
1191 by the user.
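
      The following minimal sketch shows how *encoding* decides between
      :class:`str` and :class:`bytes` output. Each call writes to an in-memory
      stream of the matching type. The element names are illustrative::

         import io
         import xml.etree.ElementTree as ET

         root = ET.Element("doc")
         ET.SubElement(root, "empty")
         tree = ET.ElementTree(root)

         # encoding="unicode" produces str, so the target must be a text stream.
         text_out = io.StringIO()
         tree.write(text_out, encoding="unicode")
         print(text_out.getvalue())   # <doc><empty /></doc>

         # Any other encoding produces bytes, so the target must be binary.
         bin_out = io.BytesIO()
         tree.write(bin_out, encoding="utf-8", xml_declaration=True,
                    short_empty_elements=False)
         print(bin_out.getvalue())

      In the second call, ``short_empty_elements=False`` spells the empty
      element as ``<empty></empty>``, and the output starts with an XML
      declaration.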


This is the XML file, ``index.xhtml``, that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    ...
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.
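
   The following minimal sketch covers the attribute-value use case. On output,
   the value is written with whatever prefix the serializer assigns to its
   namespace. The namespace URI and the ``type`` attribute are made up for
   illustration::

      import xml.etree.ElementTree as ET

      uri = "http://example.com/ns"
      qname = ET.QName(uri, "item")        # same as ET.QName("{%s}item" % uri)
      print(qname.text)                    # {http://example.com/ns}item

      root = ET.Element("{%s}root" % uri)
      root.set("type", qname)              # a namespaced attribute *value*
      out = ET.tostring(root, encoding="unicode")
      print(out)
      # <ns0:root xmlns:ns0="http://example.com/ns" type="ns0:item" />

   A plain string ``"{http://example.com/ns}item"`` would be written verbatim
   as the attribute value. Wrapping it in :class:`QName` is what makes the
   serializer translate it to the generated prefix.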



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \
                       pi_factory=None, insert_comments=False, insert_pis=False)

   Generic element structure builder. This builder converts a sequence of
   start, data, end, comment and pi method calls to a well-formed element
   structure. You can use this class to build an element structure using
   a custom XML parser, or a parser for some other XML-like format.

   *element_factory*, when given, must be a callable accepting two positional
   arguments: a tag and a dict of attributes. It is expected to return a new
   element instance.

   The *comment_factory* and *pi_factory* functions, when given, should behave
   like the :func:`Comment` and :func:`ProcessingInstruction` functions to
   create comments and processing instructions. When not given, the default
   factories will be used. When *insert_comments* and/or *insert_pis* is true,
   comments and processing instructions will be inserted into the tree if they
   appear within the root element (but not outside of it).
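
   The following minimal sketch shows an *element_factory* in use. The
   ``numbering_factory`` helper and the ``seq`` attribute are hypothetical,
   chosen only to show that the factory is called once per opening tag::

      import itertools
      import xml.etree.ElementTree as ET

      counter = itertools.count()

      def numbering_factory(tag, attrib):
          # Hypothetical factory: receives the tag and the attribute dict,
          # and tags each new element with its creation order.
          elem = ET.Element(tag, attrib)
          elem.set("seq", str(next(counter)))
          return elem

      builder = ET.TreeBuilder(element_factory=numbering_factory)
      parser = ET.XMLParser(target=builder)
      parser.feed("<doc><a/><b/></doc>")
      root = parser.close()
      print([e.get("seq") for e in root.iter()])   # ['0', '1', '2']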

   .. method:: close()

      Flushes the builder buffers and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   .. method:: comment(text)

      Creates a comment with the given *text*. If ``insert_comments`` is true,
      this will also add it to the tree.

      .. versionadded:: 3.8


   .. method:: pi(target, text)

      Creates a processing instruction with the given *target* name and
      *text*. If ``insert_pis`` is true, this will also add it to the tree.

      .. versionadded:: 3.8
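
   The builder can also be driven by hand, without any parser. The following
   minimal sketch uses made-up tag names and comment text, and relies on
   *insert_comments* to keep a comment that appears inside the root element::

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder(insert_comments=True)
      builder.start("root", {"version": "1"})
      builder.start("item", {})
      builder.data("hello")
      builder.end("item")
      builder.comment(" generated by hand ")   # inside root, so it is kept
      builder.end("root")
      root = builder.close()
      print(ET.tostring(root, encoding="unicode"))
      # <root version="1"><item>hello</item><!-- generated by hand --></root>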


   In addition, a custom :class:`TreeBuilder` object can provide the
   following methods:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

   .. method:: start_ns(prefix, uri)

      Is called whenever the parser encounters a new namespace declaration,
      before the ``start()`` callback for the opening element that defines it.
      *prefix* is ``''`` for the default namespace and the declared
      namespace prefix name otherwise. *uri* is the namespace URI.

      .. versionadded:: 3.8

   .. method:: end_ns(prefix)

      Is called after the ``end()`` callback of an element that declared
      a namespace prefix mapping, with the name of the *prefix* that went
      out of scope.

      .. versionadded:: 3.8
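
   The following minimal sketch shows a custom target that records the
   namespace callbacks alongside ``start()`` and ``end()``, which makes their
   ordering visible. ``NamespaceRecorder`` and the ``urn:example`` namespace are
   hypothetical::

      import xml.etree.ElementTree as ET

      class NamespaceRecorder:
          # Hypothetical target: it has no data() method, so text is ignored.
          def __init__(self):
              self.events = []
          def start_ns(self, prefix, uri):
              self.events.append(("start-ns", prefix, uri))
          def start(self, tag, attrib):
              self.events.append(("start", tag))
          def end(self, tag):
              self.events.append(("end", tag))
          def end_ns(self, prefix):
              self.events.append(("end-ns", prefix))
          def close(self):
              return self.events

      parser = ET.XMLParser(target=NamespaceRecorder())
      parser.feed('<doc xmlns:x="urn:example"><x:a/></doc>')
      events = parser.close()
      for event in events:
          print(event)

   The ``start-ns`` callback should come before ``start`` for ``doc``, and
   ``end-ns`` after its ``end``.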


.. class:: C14NWriterTarget(write, *, \
             with_comments=False, strip_text=False, rewrite_prefixes=False, \
             qname_aware_tags=None, qname_aware_attrs=None, \
             exclude_attrs=None, exclude_tags=None)

   A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
   same as for the :func:`canonicalize` function. This class does not build a
   tree but translates the callback events directly into a serialised form
   using the *write* function.

   .. versionadded:: 3.8
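
   The following minimal sketch uses it as a parser target that writes into an
   in-memory buffer. The input document is made up. Canonicalization sorts
   the attributes, drops the extra whitespace between them and expands the
   empty element::

      import io
      import xml.etree.ElementTree as ET

      out = io.StringIO()
      parser = ET.XMLParser(target=ET.C14NWriterTarget(out.write))
      parser.feed('<root b="2"   a="1"><empty/></root>')
      parser.close()
      print(out.getvalue())   # <root a="1" b="2"><empty></empty></root>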


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(*, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.

   .. versionchanged:: 3.8
      Parameters are now :ref:`keyword-only <keyword-only_parameter>`.
      The *html* argument is no longer supported.
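
   The following minimal sketch feeds a document incrementally with the
   default :class:`TreeBuilder` target. The chunk boundaries are arbitrary and
   the document is made up for illustration::

      import xml.etree.ElementTree as ET

      chunks = [b"<log><entry lev", b'el="info">started</en', b"try></log>"]

      parser = ET.XMLParser()
      for chunk in chunks:
          parser.feed(chunk)      # chunks may split tags, names and attributes
      root = parser.close()       # the default target returns the root element
      entry = root[0]
      print(entry.get("level"), entry.text)   # info started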


   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. For further supported callback
   methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
   building a tree structure. This is an example of counting the maximum depth
   of an XML file::

     >>> from xml.etree.ElementTree import XMLParser
     >>> class MaxDepth:                     # The target object of the parser
     ...     maxDepth = 0
     ...     depth = 0
     ...     def start(self, tag, attrib):   # Called for each opening tag.
     ...         self.depth += 1
     ...         if self.depth > self.maxDepth:
     ...             self.maxDepth = self.depth
     ...     def end(self, tag):             # Called for each closing tag.
     ...         self.depth -= 1
     ...     def data(self, data):
     ...         pass            # We do not need to do anything with data.
     ...     def close(self):    # Called when all data has been parsed.
     ...         return self.maxDepth
     ...
     >>> target = MaxDepth()
     >>> parser = XMLParser(target=target)
     >>> exampleXml = """
     ... <a>
     ...   <b>
     ...   </b>
     ...   <b>
     ...     <c>
     ...       <d>
     ...       </d>
     ...     </c>
     ...   </b>
     ... </a>"""
     >>> parser.feed(exampleXml)
     >>> parser.close()
     4


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events
   are used to get detailed namespace information). If *events* is omitted,
   only ``"end"`` events are reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object, or
      other context value as follows.

      * ``start``, ``end``: the current Element.
      * ``comment``, ``pi``: the current comment / processing instruction.
      * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace
        mapping.
      * ``end-ns``: :const:`None` (this may change in a future version).

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again. Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.
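
      The following minimal sketch requests the namespace events as well, so
      the different context values can be compared. The ``urn:example``
      namespace is made up::

         import xml.etree.ElementTree as ET

         parser = ET.XMLPullParser(events=("start-ns", "start", "end", "end-ns"))
         parser.feed(b'<root xmlns:x="urn:example"><x:item/></root>')
         parser.close()

         # Elements are reported by tag; the ns events carry plain values.
         events = [(event, value.tag if event in ("start", "end") else value)
                   for event, value in parser.read_events()]
         for item in events:
             print(*item)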

   .. note::

      :class:`XMLPullParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

   .. versionchanged:: 3.8
      The ``comment`` and ``pi`` events were added.
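
   The following minimal sketch shows non-blocking use. Data arrives in
   arbitrary chunks (simulated here with a hard-coded, hypothetical list), and
   complete elements are handled as soon as their ``"end"`` event is
   available::

      import xml.etree.ElementTree as ET

      chunks = [b"<feed><entry id='1'>first</en",
                b"try><entry id='2'>sec",
                b"ond</entry></feed>"]

      parser = ET.XMLPullParser(events=("end",))
      entries = []

      def drain(pull_parser):
          # "end" events carry fully populated elements.
          for event, elem in pull_parser.read_events():
              if elem.tag == "entry":
                  entries.append((elem.get("id"), elem.text))

      for chunk in chunks:
          parser.feed(chunk)
          drain(parser)          # handle whatever is complete so far
      parser.close()
      drain(parser)              # handle events not yet read when close() ran
      print(entries)             # [('1', 'first'), ('2', 'second')]

   The final call after :meth:`close` picks up any events that were not yet
   available while feeding. :meth:`close` itself always returns :const:`None`,
   so it cannot hand those events back.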


Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.
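
   The following minimal sketch inspects these attributes on a deliberately
   malformed, made-up document. :data:`xml.parsers.expat.errors.messages` maps
   the numeric *code* back to expat's message::

      import xml.etree.ElementTree as ET
      from xml.parsers.expat import errors

      try:
          ET.fromstring("<root><child></root>")   # <child> is never closed
      except ET.ParseError as err:
          code = err.code
          line, column = err.position
          print(errors.messages[code])   # mismatched tag
          print(line, column)
          print(err)                     # message including line and column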

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.