:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has children nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input. Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.

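
The behaviour described in the note can be checked with a small sketch. The
input string and the comment text below are made up for illustration; the
sketch assumes the default parser, which drops comments and processing
instructions, while a comment added through the API is kept on output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input containing a comment and a processing instruction.
xml_text = '<root><!-- ignored --><?target data?><item/></root>'

root = ET.fromstring(xml_text)
# Only the real element survives parsing with the default parser.
parsed_tags = [child.tag for child in root]

# A comment inserted through the API is kept and serialized.
root.append(ET.Comment(' added by hand '))
serialized = ET.tostring(root, encoding='unicode')

print(parsed_tags)
print(serialized)
```

Here ``parsed_tags`` should contain only ``'item'``, and the serialized text
should include the hand-added comment but not the one from the input.
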

.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

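
As a rough sketch of the :func:`iterparse` use case mentioned above, the
following loop handles each ``country`` element as soon as its closing tag has
been read and then clears it. The in-memory ``io.BytesIO`` source is only a
stand-in for a large file, and the ``ranks`` dictionary is a hypothetical way
of keeping the data of interest.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file; any binary file object would do.
source = io.BytesIO(
    b'<data>'
    b'<country name="Liechtenstein"><rank>1</rank></country>'
    b'<country name="Singapore"><rank>4</rank></country>'
    b'</data>'
)

ranks = {}
for event, elem in ET.iterparse(source, events=('end',)):
    if elem.tag == 'country':
        # The element is complete at its "end" event, so it is safe to read.
        ranks[elem.get('name')] = int(elem.find('rank').text)
        # Drop the children, text and attributes that are no longer needed.
        elem.clear()

print(ranks)
```

Clearing processed elements is one common way to keep memory use down; whether
it is enough depends on the shape of the document.
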

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

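
The removal loop above iterates over the list returned by
:meth:`Element.findall` rather than over ``root`` itself. An element behaves
much like a list of its children, so removing children while iterating over it
directly can skip entries. The sketch below uses a made-up document with two
adjacent matches to show the safe pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first two countries both exceed the threshold.
root = ET.fromstring(
    '<data>'
    '<country name="A"><rank>60</rank></country>'
    '<country name="B"><rank>70</rank></country>'
    '<country name="C"><rank>5</rank></country>'
    '</data>'
)

# findall() returns a separate list, so removing from root while looping
# over that list does not disturb the iteration.
for country in root.findall('country'):
    if int(country.find('rank').text) > 50:
        root.remove(country)

remaining = [country.get('name') for country in root]
print(remaining)
```
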

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

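
A slightly fuller sketch of the same idea adds attributes and text while
building the tree, then serializes it with :func:`tostring`. The element names
and values are borrowed from the sample data purely for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('data')
# Attributes can be given as a dictionary ...
country = ET.SubElement(root, 'country', {'name': 'Panama'})
# ... or as keyword arguments.
rank = ET.SubElement(country, 'rank', updated='yes')
rank.text = '69'

# encoding='unicode' yields a str instead of bytes.
xml_str = ET.tostring(root, encoding='unicode')
print(xml_str)
```
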

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement

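
To make the tag expansion concrete, the sketch below parses a shortened copy
of the actors document and prints the tags as the parser stores them. The
``ns`` mapping and its ``role`` prefix are arbitrary local names, as in the
example above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the actors example, kept on few lines for brevity.
xml_text = (
    '<actors xmlns:fictional="http://characters.example.com" '
    'xmlns="http://people.example.com">'
    '<actor><name>John Cleese</name>'
    '<fictional:character>Lancelot</fictional:character></actor>'
    '</actors>'
)

root = ET.fromstring(xml_text)
actor = root[0]

# Tags are stored in the expanded {uri}name form, not with their prefixes.
tags = [root.tag] + [child.tag for child in actor]
print(tags)

# The prefix used for searching need not match the one in the document.
ns = {'role': 'http://characters.example.com'}
character = actor.find('role:character', ns).text
print(character)
```
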

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

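
The queries above can be tried against an abbreviated copy of the sample data.
This is only a sketch: the ``gdppc`` elements are left out to keep it short,
and the variable names collecting the results are made up.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the country sample data (gdppc elements omitted).
countrydata = """\
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)

# Every neighbor of every country, in document order.
all_neighbors = [n.get('name') for n in root.findall('./country/neighbor')]

# The year child of whichever element has name='Singapore'.
singapore_year = [y.text for y in root.findall(".//*[@name='Singapore']/year")]

# The second neighbor element under each parent that has at least two.
second_neighbors = [n.get('name') for n in root.findall('.//neighbor[2]')]

print(all_neighbors, singapore_year, second_neighbors)
```

Note that Singapore contributes nothing to ``second_neighbors`` because it has
only one ``neighbor`` child.
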

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

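
A few of the predicates from the table can be exercised as follows. This is a
sketch on a cut-down version of the country data; the ``names`` helper is
hypothetical and exists only to make the results easy to read.

```python
import xml.etree.ElementTree as ET

# Cut-down country data: three countries, each with a rank child.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '</data>'
)


def names(path):
    """Return the name attribute of every element matched by path."""
    return [elem.get('name') for elem in root.findall(path)]


with_rank_4 = names("country[rank='4']")     # [tag='text']
last_country = names('country[last()]')      # [position] using last()
next_to_last = names('country[last()-1]')    # position relative to last()
has_name_attr = names('*[@name]')            # [@attrib] after an asterisk

print(with_rank_4, last_country, next_to_last, has_name_attr)
```

Each predicate follows a tag name or an asterisk, as the rule above requires.
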

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

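
As an illustration of the *events* argument of :func:`iterparse`, the sketch
below requests ``"start-ns"`` alongside ``"start"`` and ``"end"``. The document
and its namespace URI are invented for the example; for namespace events the
second item of each pair is a ``(prefix, uri)`` tuple rather than an element.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document with one prefixed namespace.
xml_bytes = b'<root xmlns:x="http://example.com/x"><x:item/></root>'

seen = []
for event, payload in ET.iterparse(io.BytesIO(xml_bytes),
                                   events=('start-ns', 'start', 'end')):
    if event == 'start-ns':
        # payload is a (prefix, uri) tuple for namespace events.
        seen.append((event, payload))
    else:
        # payload is an Element for "start" and "end" events.
        seen.append((event, payload.tag))

for entry in seen:
    print(entry)
```
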

.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2

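
A minimal sketch of :func:`register_namespace` in use follows. The prefix
``ex`` and the URI are placeholders chosen for this example; because the
registry is global, the registration affects all later serialization in the
process.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen for this example only.
ET.register_namespace('ex', 'http://example.com/ns')

# Tags in a namespace are written in the expanded {uri}name form.
item = ET.Element('{http://example.com/ns}item')

serialized = ET.tostring(item, encoding='unicode')
print(serialized)
```

The serialized element should carry the ``ex`` prefix together with the
matching ``xmlns:ex`` declaration.
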

.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

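
The difference between the default and ``"unicode"`` encodings, and the
relationship between :func:`tostring` and :func:`tostringlist`, can be seen in
a short sketch. The ``greeting`` element is invented for the example.

```python
import xml.etree.ElementTree as ET

elem = ET.Element('greeting', lang='en')
elem.text = 'hello'

# With the default encoding the result is a bytes object.
as_bytes = ET.tostring(elem)

# encoding='unicode' returns a str instead.
as_text = ET.tostring(elem, encoding='unicode')

# The fragments from tostringlist() join back into the tostring() result.
chunks = ET.tostringlist(elem)
joined = b''.join(chunks)

print(as_bytes)
print(as_text)
```
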
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000613
614.. function:: XML(text, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000615
616 Parses an XML section from a string constant. This function can be used to
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000617 embed "XML literals" in Python code. *text* is a string containing XML
618 data. *parser* is an optional parser instance. If not given, the standard
619 :class:`XMLParser` parser is used. Returns an :class:`Element` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000620
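As a rough illustration of the "XML literal" idea, the sketch below embeds a
small document in Python source and reads values from it.  The ``CONFIG``
name and the document layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# An "XML literal" embedded in the source; the layout is made up for the example.
CONFIG = ET.XML("""
<config>
  <option name="debug">true</option>
  <option name="level">3</option>
</config>
""")

# XML() returns the root Element, so Element methods are available right away.
options = {opt.get("name"): opt.text for opt in CONFIG.findall("option")}
```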
621
.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element ids to elements.  *text* is a string containing XML
   data.  *parser* is an optional parser instance.  If not given, the standard
   :class:`XMLParser` parser is used.  Returns a tuple containing an
   :class:`Element` instance and a dictionary.
Georg Brandl116aa622007-08-15 14:28:22 +0000629
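A short sketch of :func:`XMLID` in use follows.  It assumes, as in the
reference implementation, that the dictionary is keyed on each element's
``id`` attribute; the document itself is invented for the example.

```python
import xml.etree.ElementTree as ET

# Two elements carry an "id" attribute, one does not.
root, ids = ET.XMLID(
    '<doc><sec id="intro">Hello</sec><sec id="end">Bye</sec><sec>No id</sec></doc>'
)

# Elements without an id attribute are left out of the mapping.
known_ids = sorted(ids)
intro_text = ids["intro"].text
```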
630
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000631.. _elementtree-element-objects:
Georg Brandl116aa622007-08-15 14:28:22 +0000632
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000633Element Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200634^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000635
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000636.. class:: Element(tag, attrib={}, **extra)
Georg Brandl116aa622007-08-15 14:28:22 +0000637
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000638 Element class. This class defines the Element interface, and provides a
639 reference implementation of this interface.
Georg Brandl116aa622007-08-15 14:28:22 +0000640
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000641 The element name, attribute names, and attribute values can be either
642 bytestrings or Unicode strings. *tag* is the element name. *attrib* is
643 an optional dictionary, containing element attributes. *extra* contains
644 additional attributes, given as keyword arguments.
Georg Brandl116aa622007-08-15 14:28:22 +0000645
646
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000647 .. attribute:: tag
Georg Brandl116aa622007-08-15 14:28:22 +0000648
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000649 A string identifying what kind of data this element represents (the
650 element type, in other words).
Georg Brandl116aa622007-08-15 14:28:22 +0000651
652
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000653 .. attribute:: text
Ned Deilyeca04452015-08-17 22:11:17 -0400654 tail
Georg Brandl116aa622007-08-15 14:28:22 +0000655
Ned Deilyeca04452015-08-17 22:11:17 -0400656 These attributes can be used to hold additional data associated with
657 the element. Their values are usually strings but may be any
658 application-specific object. If the element is created from
659 an XML file, the *text* attribute holds either the text between
660 the element's start tag and its first child or end tag, or ``None``, and
661 the *tail* attribute holds either the text between the element's
662 end tag and the next tag, or ``None``. For the XML data
Georg Brandl116aa622007-08-15 14:28:22 +0000663
Ned Deilyeca04452015-08-17 22:11:17 -0400664 .. code-block:: xml
Georg Brandl116aa622007-08-15 14:28:22 +0000665
Ned Deilyeca04452015-08-17 22:11:17 -0400666 <a><b>1<c>2<d/>3</c></b>4</a>
Georg Brandl116aa622007-08-15 14:28:22 +0000667
Ned Deilyeca04452015-08-17 22:11:17 -0400668 the *a* element has ``None`` for both *text* and *tail* attributes,
669 the *b* element has *text* ``"1"`` and *tail* ``"4"``,
670 the *c* element has *text* ``"2"`` and *tail* ``None``,
671 and the *d* element has *text* ``None`` and *tail* ``"3"``.
672
673 To collect the inner text of an element, see :meth:`itertext`, for
674 example ``"".join(element.itertext())``.
675
676 Applications may store arbitrary objects in these attributes.
Georg Brandl116aa622007-08-15 14:28:22 +0000677
Georg Brandl116aa622007-08-15 14:28:22 +0000678
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000679 .. attribute:: attrib
Georg Brandl116aa622007-08-15 14:28:22 +0000680
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000681 A dictionary containing the element's attributes. Note that while the
682 *attrib* value is always a real mutable Python dictionary, an ElementTree
683 implementation may choose to use another internal representation, and
684 create the dictionary only if someone asks for it. To take advantage of
685 such implementations, use the dictionary methods below whenever possible.
Georg Brandl116aa622007-08-15 14:28:22 +0000686
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000687 The following dictionary-like methods work on the element attributes.
Georg Brandl116aa622007-08-15 14:28:22 +0000688
689
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000690 .. method:: clear()
Georg Brandl116aa622007-08-15 14:28:22 +0000691
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000692 Resets an element. This function removes all subelements, clears all
Eli Bendersky323a43a2012-10-09 06:46:33 -0700693 attributes, and sets the text and tail attributes to ``None``.
Georg Brandl116aa622007-08-15 14:28:22 +0000694
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000695
696 .. method:: get(key, default=None)
697
698 Gets the element attribute named *key*.
699
700 Returns the attribute value, or *default* if the attribute was not found.
701
702
703 .. method:: items()
704
705 Returns the element attributes as a sequence of (name, value) pairs. The
706 attributes are returned in an arbitrary order.
707
708
   .. method:: keys()

      Returns the element's attribute names as a list.  The names are returned
      in an arbitrary order.
713
714
715 .. method:: set(key, value)
716
717 Set the attribute *key* on the element to *value*.
718
719 The following methods work on the element's children (subelements).
720
721
722 .. method:: append(subelement)
723
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200724 Adds the element *subelement* to the end of this element's internal list
725 of subelements. Raises :exc:`TypeError` if *subelement* is not an
726 :class:`Element`.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000727
728
729 .. method:: extend(subelements)
Georg Brandl116aa622007-08-15 14:28:22 +0000730
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000731 Appends *subelements* from a sequence object with zero or more elements.
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200732 Raises :exc:`TypeError` if a subelement is not an :class:`Element`.
Georg Brandl116aa622007-08-15 14:28:22 +0000733
Ezio Melottif8754a62010-03-21 07:16:43 +0000734 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000735
Georg Brandl116aa622007-08-15 14:28:22 +0000736
Eli Bendersky737b1732012-05-29 06:02:56 +0300737 .. method:: find(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000738
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000739 Finds the first subelement matching *match*. *match* may be a tag name
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200740 or a :ref:`path <elementtree-xpath>`. Returns an element instance
Eli Bendersky737b1732012-05-29 06:02:56 +0300741 or ``None``. *namespaces* is an optional mapping from namespace prefix
742 to full name.
Georg Brandl116aa622007-08-15 14:28:22 +0000743
Georg Brandl116aa622007-08-15 14:28:22 +0000744
Eli Bendersky737b1732012-05-29 06:02:56 +0300745 .. method:: findall(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000746
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200747 Finds all matching subelements, by tag name or
748 :ref:`path <elementtree-xpath>`. Returns a list containing all matching
Eli Bendersky737b1732012-05-29 06:02:56 +0300749 elements in document order. *namespaces* is an optional mapping from
750 namespace prefix to full name.
Georg Brandl116aa622007-08-15 14:28:22 +0000751
Georg Brandl116aa622007-08-15 14:28:22 +0000752
   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
      to full name.
Georg Brandl116aa622007-08-15 14:28:22 +0000761
Georg Brandl116aa622007-08-15 14:28:22 +0000762
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000763 .. method:: getchildren()
Georg Brandl116aa622007-08-15 14:28:22 +0000764
Georg Brandl67b21b72010-08-17 15:07:14 +0000765 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000766 Use ``list(elem)`` or iteration.
Georg Brandl116aa622007-08-15 14:28:22 +0000767
Georg Brandl116aa622007-08-15 14:28:22 +0000768
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000769 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000770
Georg Brandl67b21b72010-08-17 15:07:14 +0000771 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000772 Use method :meth:`Element.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000773
Georg Brandl116aa622007-08-15 14:28:22 +0000774
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200775 .. method:: insert(index, subelement)
Georg Brandl116aa622007-08-15 14:28:22 +0000776
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200777 Inserts *subelement* at the given position in this element. Raises
778 :exc:`TypeError` if *subelement* is not an :class:`Element`.
Georg Brandl116aa622007-08-15 14:28:22 +0000779
Georg Brandl116aa622007-08-15 14:28:22 +0000780
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000781 .. method:: iter(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000782
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000783 Creates a tree :term:`iterator` with the current element as the root.
784 The iterator iterates over this element and all elements below it, in
785 document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
786 elements whose tag equals *tag* are returned from the iterator. If the
787 tree structure is modified during iteration, the result is undefined.
Georg Brandl116aa622007-08-15 14:28:22 +0000788
Ezio Melotti138fc892011-10-10 00:02:03 +0300789 .. versionadded:: 3.2
790
Georg Brandl116aa622007-08-15 14:28:22 +0000791
Eli Bendersky737b1732012-05-29 06:02:56 +0300792 .. method:: iterfind(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000793
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200794 Finds all matching subelements, by tag name or
795 :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
Eli Bendersky737b1732012-05-29 06:02:56 +0300796 matching elements in document order. *namespaces* is an optional mapping
797 from namespace prefix to full name.
798
Georg Brandl116aa622007-08-15 14:28:22 +0000799
Ezio Melottif8754a62010-03-21 07:16:43 +0000800 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000801
Georg Brandl116aa622007-08-15 14:28:22 +0000802
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000803 .. method:: itertext()
Georg Brandl116aa622007-08-15 14:28:22 +0000804
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000805 Creates a text iterator. The iterator loops over this element and all
806 subelements, in document order, and returns all inner text.
Georg Brandl116aa622007-08-15 14:28:22 +0000807
Ezio Melottif8754a62010-03-21 07:16:43 +0000808 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000809
810
   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.
Georg Brandl116aa622007-08-15 14:28:22 +0000822
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000823 :class:`Element` objects also support the following sequence type methods
Serhiy Storchaka15e65902013-08-29 10:28:44 +0300824 for working with subelements: :meth:`~object.__delitem__`,
825 :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
826 :meth:`~object.__len__`.
Georg Brandl116aa622007-08-15 14:28:22 +0000827
   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print("element not found, or element has no subelements")

     if element is None:
         print("element not found")
Georg Brandl116aa622007-08-15 14:28:22 +0000839
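The *text*/*tail* rules and the find\* methods described for
:class:`Element` can be tried out with a small sketch such as the one below.
The second document, its ``bk`` prefix and the namespace URI are hypothetical
and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# The document used in the text/tail description above.
root = ET.XML("<a><b>1<c>2<d/>3</c></b>4</a>")

# Map each tag to its (text, tail) pair, visiting elements in document order.
text_and_tail = {elem.tag: (elem.text, elem.tail) for elem in root.iter()}

# itertext() walks text and tail values in document order.
inner_text = "".join(root.itertext())

# A namespaced document; prefix and URI are made up for the example.
catalog = ET.XML("""
<catalog xmlns:bk="http://example.com/books">
  <bk:book><bk:title>First</bk:title></bk:book>
  <bk:book><bk:title/></bk:book>
</catalog>
""")
ns = {"bk": "http://example.com/books"}

# findall() returns matches in document order; findtext() returns "" for an
# element that exists but has no text content.
titles = [book.findtext("bk:title", namespaces=ns)
          for book in catalog.findall("bk:book", ns)]

# find() returns None when nothing matches, and findtext() returns *default*.
missing = catalog.find("bk:magazine", ns)
fallback = catalog.findtext("bk:magazine", default="n/a", namespaces=ns)

# Test against None explicitly rather than relying on truth value.
first_book = catalog.find("bk:book", ns)
found = first_book is not None
```

The explicit ``is not None`` test at the end follows the caution above: an
element that was found but has no children would still test as false.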
840
841.. _elementtree-elementtree-objects:
842
843ElementTree Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200844^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000845
846
Georg Brandl7f01a132009-09-16 15:58:14 +0000847.. class:: ElementTree(element=None, file=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000848
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000849 ElementTree wrapper class. This class represents an entire element
850 hierarchy, and adds some extra support for serialization to and from
851 standard XML.
Georg Brandl116aa622007-08-15 14:28:22 +0000852
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000853 *element* is the root element. The tree is initialized with the contents
854 of the XML *file* if given.
Georg Brandl116aa622007-08-15 14:28:22 +0000855
856
Benjamin Petersone41251e2008-04-25 01:59:09 +0000857 .. method:: _setroot(element)
Georg Brandl116aa622007-08-15 14:28:22 +0000858
Benjamin Petersone41251e2008-04-25 01:59:09 +0000859 Replaces the root element for this tree. This discards the current
860 contents of the tree, and replaces it with the given element. Use with
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000861 care. *element* is an element instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000862
863
Eli Bendersky737b1732012-05-29 06:02:56 +0300864 .. method:: find(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000865
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200866 Same as :meth:`Element.find`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000867
868
Eli Bendersky737b1732012-05-29 06:02:56 +0300869 .. method:: findall(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000870
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200871 Same as :meth:`Element.findall`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000872
873
Eli Bendersky737b1732012-05-29 06:02:56 +0300874 .. method:: findtext(match, default=None, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000875
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200876 Same as :meth:`Element.findtext`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000877
878
Georg Brandl7f01a132009-09-16 15:58:14 +0000879 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000880
Georg Brandl67b21b72010-08-17 15:07:14 +0000881 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000882 Use method :meth:`ElementTree.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000883
884
Benjamin Petersone41251e2008-04-25 01:59:09 +0000885 .. method:: getroot()
Florent Xiclunac17f1722010-08-08 19:48:29 +0000886
Benjamin Petersone41251e2008-04-25 01:59:09 +0000887 Returns the root element for this tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000888
889
   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000895
896
Eli Bendersky737b1732012-05-29 06:02:56 +0300897 .. method:: iterfind(match, namespaces=None)
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000898
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200899 Same as :meth:`Element.iterfind`, starting at the root of the tree.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000900
Ezio Melottif8754a62010-03-21 07:16:43 +0000901 .. versionadded:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000902
903
Georg Brandl7f01a132009-09-16 15:58:14 +0000904 .. method:: parse(source, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000905
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000906 Loads an external XML section into this element tree. *source* is a file
Antoine Pitrou11cb9612010-09-15 11:11:28 +0000907 name or :term:`file object`. *parser* is an optional parser instance.
Eli Bendersky52467b12012-06-01 07:13:08 +0300908 If not given, the standard :class:`XMLParser` parser is used. Returns the
909 section root element.
Georg Brandl116aa622007-08-15 14:28:22 +0000910
911
   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      :term:`file object` opened for writing.  *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration is added to the
      file.  Use ``False`` for never, ``True`` for always, and ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content.  If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument.  If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.
936
R David Murray575fb312013-12-25 23:21:03 -0500937 .. versionadded:: 3.4
938 The *short_empty_elements* parameter.
Eli Benderskya9a2ef52013-01-13 06:04:43 -0800939
Georg Brandl116aa622007-08-15 14:28:22 +0000940
This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")
Georg Brandl116aa622007-08-15 14:28:22 +0000968
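How the *encoding* argument of :meth:`ElementTree.write` interacts with the
type of the target stream can be sketched with in-memory streams.  The
document and the buffer names below are illustrative assumptions, and
:mod:`io` streams stand in for real files.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.XML("<root><empty/></root>"))

# encoding="unicode" produces str output, so the target must be a text stream.
text_buf = io.StringIO()
tree.write(text_buf, encoding="unicode")
text_out = text_buf.getvalue()

# Any other encoding produces bytes, so the target must be a binary stream.
# xml_declaration=True forces the XML declaration to be written.
bin_buf = io.BytesIO()
tree.write(bin_buf, encoding="utf-8", xml_declaration=True)
bin_out = bin_buf.getvalue()

# short_empty_elements=False writes empty elements as a start/end tag pair.
long_buf = io.StringIO()
tree.write(long_buf, encoding="unicode", short_empty_elements=False)
long_out = long_buf.getvalue()
```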
969.. _elementtree-qname-objects:
970
971QName Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200972^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000973
974
.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name.  :class:`QName` instances are opaque.
Georg Brandl116aa622007-08-15 14:28:22 +0000983
984
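The two ways of constructing a :class:`QName`, and its effect on serialized
attribute values, might look like the sketch below.  The namespace URI and
the element and attribute names are hypothetical; the generated prefix
(``ns0`` here) is chosen by the serializer and should not be relied on.

```python
import xml.etree.ElementTree as ET

TYPES_NS = "http://example.com/types"  # hypothetical namespace URI

# Equivalent spellings: (uri, local name) or a single "{uri}local" string.
q1 = ET.QName(TYPES_NS, "string")
q2 = ET.QName("{http://example.com/types}string")

# Wrapping the attribute value lets the serializer emit a prefixed name
# together with the matching xmlns declaration.
elem = ET.Element("value")
elem.set("type", q1)
xml_text = ET.tostring(elem, encoding="unicode")
```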
Antoine Pitrou5b235d02013-04-18 19:37:06 +0200985
Georg Brandl116aa622007-08-15 14:28:22 +0000986.. _elementtree-treebuilder-objects:
987
988TreeBuilder Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200989^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000990
991
Georg Brandl7f01a132009-09-16 15:58:14 +0000992.. class:: TreeBuilder(element_factory=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000993
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000994 Generic element structure builder. This builder converts a sequence of
995 start, data, and end method calls to a well-formed element structure. You
996 can use this class to build an element structure using a custom XML parser,
Eli Bendersky48d358b2012-05-30 17:57:50 +0300997 or a parser for some other XML-like format. *element_factory*, when given,
998 must be a callable accepting two positional arguments: a tag and
999 a dict of attributes. It is expected to return a new element instance.
Georg Brandl116aa622007-08-15 14:28:22 +00001000
Benjamin Petersone41251e2008-04-25 01:59:09 +00001001 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +00001002
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001003 Flushes the builder buffers, and returns the toplevel document
1004 element. Returns an :class:`Element` instance.
Georg Brandl116aa622007-08-15 14:28:22 +00001005
1006
   .. method:: data(data)

      Adds text to the current element.  *data* is either a bytestring or a
      Unicode string.
Georg Brandl116aa622007-08-15 14:28:22 +00001011
1012
Benjamin Petersone41251e2008-04-25 01:59:09 +00001013 .. method:: end(tag)
Georg Brandl116aa622007-08-15 14:28:22 +00001014
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001015 Closes the current element. *tag* is the element name. Returns the
1016 closed element.
Georg Brandl116aa622007-08-15 14:28:22 +00001017
1018
Benjamin Petersone41251e2008-04-25 01:59:09 +00001019 .. method:: start(tag, attrs)
Georg Brandl116aa622007-08-15 14:28:22 +00001020
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001021 Opens a new element. *tag* is the element name. *attrs* is a dictionary
1022 containing element attributes. Returns the opened element.
Georg Brandl116aa622007-08-15 14:28:22 +00001023
1024
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001025 In addition, a custom :class:`TreeBuilder` object can provide the
1026 following method:
Georg Brandl116aa622007-08-15 14:28:22 +00001027
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001028 .. method:: doctype(name, pubid, system)
1029
1030 Handles a doctype declaration. *name* is the doctype name. *pubid* is
1031 the public identifier. *system* is the system identifier. This method
1032 does not exist on the default :class:`TreeBuilder` class.
1033
Ezio Melottif8754a62010-03-21 07:16:43 +00001034 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +00001035
1036
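The start/data/end/close protocol of :class:`TreeBuilder` can also be driven
by hand, which is roughly what a custom parser would do.  The following is a
minimal sketch with made-up tag names.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

# Each start() must be balanced by an end() with the same tag.
builder.start("root", {})
builder.start("item", {"id": "1"})
builder.data("hello")          # text for the currently open element
builder.end("item")
builder.end("root")

# close() returns the toplevel Element that was built.
root = builder.close()
serialized = ET.tostring(root, encoding="unicode")
```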
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001037.. _elementtree-xmlparser-objects:
Georg Brandl116aa622007-08-15 14:28:22 +00001038
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001039XMLParser Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001040^^^^^^^^^^^^^^^^^
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001041
1042
.. class:: XMLParser(html=0, target=None, encoding=None)

   This class is the low-level building block of the module.  It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML.  It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object.  If *target* is omitted, the standard :class:`TreeBuilder` is used.
   The *html* argument was historically used for backwards compatibility and is
   now deprecated.  If *encoding* [1]_ is given, the value overrides the
   encoding specified in the XML file.
Georg Brandl116aa622007-08-15 14:28:22 +00001053
Eli Benderskyb5869342013-08-30 05:51:20 -07001054 .. deprecated:: 3.4
Larry Hastings3732ed22014-03-15 21:13:56 -07001055 The *html* argument. The remaining arguments should be passed via
Georg Brandladeffcc2016-02-26 19:13:47 +01001056 keyword to prepare for the removal of the *html* argument.
Georg Brandl116aa622007-08-15 14:28:22 +00001057
Benjamin Petersone41251e2008-04-25 01:59:09 +00001058 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +00001059
Eli Benderskybfd78372013-08-24 15:11:44 -07001060 Finishes feeding data to the parser. Returns the result of calling the
Eli Benderskybf8ab772013-08-25 15:27:36 -07001061 ``close()`` method of the *target* passed during construction; by default,
1062 this is the toplevel document element.
Georg Brandl116aa622007-08-15 14:28:22 +00001063
1064
Benjamin Petersone41251e2008-04-25 01:59:09 +00001065 .. method:: doctype(name, pubid, system)
Georg Brandl116aa622007-08-15 14:28:22 +00001066
Georg Brandl67b21b72010-08-17 15:07:14 +00001067 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001068 Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
1069 target.
Georg Brandl116aa622007-08-15 14:28:22 +00001070
1071
Benjamin Petersone41251e2008-04-25 01:59:09 +00001072 .. method:: feed(data)
Georg Brandl116aa622007-08-15 14:28:22 +00001073
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001074 Feeds data to the parser. *data* is encoded data.
Georg Brandl116aa622007-08-15 14:28:22 +00001075
   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and
   passes character data to its ``data(data)`` method.  :meth:`XMLParser.close`
   calls *target*\'s ``close()`` method.  :class:`XMLParser` is not limited to
   building a tree structure.  This is an example of computing the maximum
   depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4
Christian Heimesb186d002008-03-18 15:15:01 +00001114
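When no *target* is given, the same feed/close cycle builds an ordinary
element tree.  The sketch below feeds a document in arbitrary chunks, as
might happen when reading from a socket; the chunk boundaries and the
document are invented for the example.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()  # default target: a TreeBuilder

# Chunk boundaries may fall anywhere, even in the middle of text content.
for chunk in ("<log><entry>fir", "st</entry><entry>sec", "ond</entry></log>"):
    parser.feed(chunk)

# close() returns whatever the target's close() returns; with the default
# TreeBuilder that is the toplevel Element.
root = parser.close()
entries = [entry.text for entry in root]
```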
Eli Benderskyb5869342013-08-30 05:51:20 -07001115
1116.. _elementtree-xmlpullparser-objects:
1117
1118XMLPullParser Objects
1119^^^^^^^^^^^^^^^^^^^^^
1120
1121.. class:: XMLPullParser(events=None)
1122
Eli Bendersky2c68e302013-08-31 07:37:23 -07001123 A pull parser suitable for non-blocking applications. Its input-side API is
1124 similar to that of :class:`XMLParser`, but instead of pushing calls to a
1125 callback target, :class:`XMLPullParser` collects an internal list of parsing
1126 events and lets the user read from it. *events* is a sequence of events to
1127 report back. The supported events are the strings ``"start"``, ``"end"``,
1128 ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
1129 namespace information). If *events* is omitted, only ``"end"`` events are
1130 reported.
Eli Benderskyb5869342013-08-30 05:51:20 -07001131
1132 .. method:: feed(data)
1133
1134 Feed the given bytes data to the parser.
1135
1136 .. method:: close()
1137
Nick Coghlan4cc2afa2013-09-28 23:50:35 +10001138 Signal the parser that the data stream is terminated. Unlike
1139 :meth:`XMLParser.close`, this method always returns :const:`None`.
1140 Any events not yet retrieved when the parser is closed can still be
1141 read with :meth:`read_events`.
Eli Benderskyb5869342013-08-30 05:51:20 -07001142
   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser.  The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.
1150
1151 Events provided in a previous call to :meth:`read_events` will not be
R David Murray410d3202014-01-04 23:52:50 -05001152 yielded again. Events are consumed from the internal queue only when
1153 they are retrieved from the iterator, so multiple readers iterating in
1154 parallel over iterators obtained from :meth:`read_events` will have
1155 unpredictable results.
Eli Benderskyb5869342013-08-30 05:51:20 -07001156
1157 .. note::
1158
1159 :class:`XMLPullParser` only guarantees that it has seen the ">"
1160 character of a starting tag when it emits a "start" event, so the
1161 attributes are defined, but the contents of the text and tail attributes
1162 are undefined at that point. The same applies to the element children;
1163 they may or may not be present.
1164
1165 If you need a fully populated element, look for "end" events instead.
1166
1167 .. versionadded:: 3.4
1168
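A typical feed/read cycle with :class:`XMLPullParser` might be sketched as
follows.  The chunks and tag names are illustrative.  Events are accumulated
across all reads because the point at which the underlying parser reports an
event for partially fed data is not something this sketch relies on.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))
seen = []

for chunk in ("<root><a>te", "xt</a><b/>", "</root>"):
    parser.feed(chunk)
    # Drain whatever events are available so far; each event is yielded once.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))

# close() returns None; any remaining events can still be read afterwards.
parser.close()
for event, elem in parser.read_events():
    seen.append((event, elem.tag))
```

Only the complete sequence in ``seen`` is meaningful here; how the events are
distributed over the individual reads may vary.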
Eli Bendersky5b77d812012-03-16 08:20:05 +02001169Exceptions
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001170^^^^^^^^^^
Eli Bendersky5b77d812012-03-16 08:20:05 +02001171
1172.. class:: ParseError
1173
1174 XML parse error, raised by the various parsing methods in this module when
1175 parsing fails. The string representation of an instance of this exception
1176 will contain a user-friendly error message. In addition, it will have
1177 the following attributes available:
1178
1179 .. attribute:: code
1180
1181 A numeric error code from the expat parser. See the documentation of
1182 :mod:`xml.parsers.expat` for the list of error codes and their meanings.
1183
1184 .. attribute:: position
1185
1186 A tuple of *line*, *column* numbers, specifying where the error occurred.
Christian Heimesb186d002008-03-18 15:15:01 +00001187
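The attributes of :class:`ParseError` can be inspected as in the sketch
below, which parses a deliberately malformed document.  Looking up the
description with :func:`xml.parsers.expat.ErrorString` is one possible way to
interpret *code*; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

message = code = line = column = description = None

try:
    # The <b> element is never closed, so parsing fails.
    ET.fromstring("<a><b></a>")
except ET.ParseError as exc:
    message = str(exc)                        # user-friendly message
    code = exc.code                           # numeric expat error code
    line, column = exc.position               # where the error was detected
    description = expat.ErrorString(code)     # expat's text for that code
```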
1188.. rubric:: Footnotes
1189
.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.