:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions, such as
:func:`parse`, create an :class:`ElementTree` instead. Check the documentation
of each function to be sure.

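The difference in return types can be checked directly. The following is a
minimal sketch, assuming a small stand-in document (``xml_text`` is a
hypothetical name, not the sample data above) and an in-memory
:class:`io.StringIO` object in place of a real file:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in document, much smaller than the sample data.
xml_text = "<data><country name='Panama'><rank>68</rank></country></data>"

# fromstring() hands back the root Element directly.
root = ET.fromstring(xml_text)

# parse() accepts a file name or file object and wraps the result in an
# ElementTree; getroot() retrieves the root Element from it.
tree = ET.parse(io.StringIO(xml_text))
tree_root = tree.getroot()

print(type(root).__name__, type(tree).__name__, tree_root.tag)
```

Keeping the :class:`ElementTree` around is mainly useful when the document
will be written back out later with :meth:`ElementTree.write`.
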
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'


.. note::

   Not all elements of the XML input will end up as elements of the
   parsed tree. Currently, this module skips over any XML comments,
   processing instructions, and document type declarations in the
   input. Nevertheless, trees built using this module's API rather
   than parsing from XML text can have comments and processing
   instructions in them; they will be included when generating XML
   output. A document type declaration may be accessed by passing a
   custom :class:`TreeBuilder` instance to the :class:`XMLParser`
   constructor.


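The asymmetry described in the note can be observed with a short experiment.
This is only a sketch using made-up element names; it relies on the default
parser behaviour described above:

```python
import xml.etree.ElementTree as ET

# Parsing: the comment in the input does not become a node in the tree.
parsed = ET.fromstring("<root><!-- dropped --><item/></root>")
parsed_tags = [child.tag for child in parsed]

# Building: a comment created through the API is kept and serialized.
built = ET.Element("root")
built.append(ET.Comment("kept"))
ET.SubElement(built, "item")
serialized = ET.tostring(built, encoding="unicode")

print(parsed_tags)
print(serialized)
```
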
.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

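As a sketch of how this might look in a receive loop, the example below feeds
a parser from a list of string chunks standing in for network reads. The
chunk contents, the ``entry`` tag and the ``drain`` helper are all
hypothetical. Events are drained once more after :meth:`XMLPullParser.close`,
because the underlying parser is allowed to buffer input between calls:

```python
import xml.etree.ElementTree as ET

# Hypothetical chunks, split at arbitrary points as a socket read might.
chunks = [
    "<log><entry id='1'>fir",
    "st</entry><entry id='2'>sec",
    "ond</entry></log>",
]

parser = ET.XMLPullParser(events=("end",))
completed = []


def drain(parser, completed):
    """Collect every 'entry' element whose end tag has been seen so far."""
    for event, elem in parser.read_events():
        if elem.tag == "entry":
            completed.append((elem.get("id"), elem.text))


for chunk in chunks:
    parser.feed(chunk)        # never blocks; just hands over what arrived
    drain(parser, completed)

parser.close()                # signal end of input
drain(parser, completed)      # pick up any events still queued

print(completed)
```
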
Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use-cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

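One common way to keep memory use down with :func:`iterparse` is to process
each element when its ``"end"`` event arrives and then discard its contents.
The sketch below assumes a tiny in-memory document in place of a large file;
calling :meth:`Element.clear` after use is a widely used idiom rather than
something this tutorial prescribes:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file on disk.
source = io.BytesIO(b"<data><country name='A'/><country name='B'/></data>")

names = []
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "country":
        names.append(elem.get("name"))
        # Drop the element's attributes, text and children once it has
        # been handled, so they can be garbage collected.
        elem.clear()

print(names)
```
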
Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

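The difference between searching direct children and searching the whole
sub-tree is easy to trip over. The sketch below uses a cut-down, hypothetical
version of the sample data to contrast the two, and also shows that
:meth:`Element.find` returns ``None`` when nothing matches:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Singapore'>"
    "<rank>4</rank>"
    "<neighbor name='Malaysia' direction='N'/>"
    "</country>"
    "</data>"
)

# 'neighbor' is a grandchild of root, so a direct-child search finds nothing.
direct = root.findall("neighbor")

# iter() walks the whole sub-tree and does find it.
nested = [n.get("name") for n in root.iter("neighbor")]

# find() returns None for a tag that is absent; check before using .text.
missing = root.find("capital")

# findtext() combines find() and .text, and accepts a path.
rank = root.findtext("country/rank")

print(direct, nested, missing, rank)
```
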
Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

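Both modifications can be tried without touching the file system. The sketch
below applies them to a shortened, hypothetical copy of the data and
serializes the result with :func:`tostring` instead of
:meth:`ElementTree.write`. Removing inside the loop is safe here because
:meth:`Element.findall` returns a list, so the loop does not iterate over the
children that are being changed:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

# Add one to every rank and mark the element as updated.
for rank in root.iter("rank"):
    rank.text = str(int(rank.text) + 1)
    rank.set("updated", "yes")

# findall() returns a list, so removing from root while looping is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

result = ET.tostring(root, encoding="unicode")
print(result)
```
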
Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

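A slightly fuller sketch builds a small tree with attributes and text, wraps
it in an :class:`ElementTree`, and writes it out. An :class:`io.BytesIO`
buffer stands in for a file here, and the element names are made up for
illustration:

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("data")
# Keyword arguments to SubElement() become attributes of the new element.
country = ET.SubElement(root, "country", name="Singapore")
ET.SubElement(country, "rank").text = "4"

# write() accepts a file name or a file object opened for writing.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
document = buffer.getvalue().decode("utf-8")

print(document)
```

Passing ``xml_declaration=True`` asks for the ``<?xml ...?>`` line to be
emitted; without it, whether a declaration appears depends on the encoding.
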
Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*. Also,
if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the XPath of a
:meth:`~Element.find` or :meth:`~Element.findall` call::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print(name.text)
       for char in actor.findall('{http://characters.example.com}character'):
           print(' |-->', char.text)

Another way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print(name.text)
       for char in actor.findall('role:character', ns):
           print(' |-->', char.text)

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


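To see the expansion itself, it helps to print a tag after parsing. The
following self-contained sketch uses a shortened copy of the actors document;
the prefixes ``p`` and ``c`` in the dictionary are arbitrary choices and need
not match the prefixes used in the XML:

```python
import xml.etree.ElementTree as ET

xml_text = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(xml_text)

# The default namespace URI is prepended to the unprefixed root tag.
root_tag = root.tag

# Prefixes in the mapping are local to the search; only the URIs matter.
ns = {"p": "http://people.example.com",
      "c": "http://characters.example.com"}
names = [actor.find("p:name", ns).text for actor in root.findall("p:actor", ns)]
characters = [char.text for char in root.iterfind(".//c:character", ns)]

print(root_tag, names, characters)
```
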
Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section, assumed here to be held
in the string ``countrydata``::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

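The queries above return lists of :class:`Element` objects, which makes the
results hard to read at a glance. The sketch below runs three of them against
a shortened, hypothetical copy of the country data and extracts names and
text so the matches are visible:

```python
import xml.etree.ElementTree as ET

countrydata = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(countrydata)

# Every neighbor of every country, in document order.
all_neighbors = [n.get("name") for n in root.findall("./country/neighbor")]

# The year recorded for the country named Singapore.
singapore_years = [y.text for y in root.findall(".//*[@name='Singapore']/year")]

# The second 'neighbor' child of each parent; Singapore has only one.
second_neighbors = [n.get("name") for n in root.findall(".//neighbor[2]")]

print(all_neighbors, singapore_years, second_neighbors)
```
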
Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

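The predicate forms in the table can be exercised on a toy document. This is
a sketch with invented tag names (``menu``, ``dish``, ``side``); each query
is compared against the children of the root by position:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<menu>"
    "<dish><name>spam</name></dish>"
    "<dish><name>egg</name><side/></dish>"
    "<dish/>"
    "</menu>"
)
dishes = list(root)  # the three <dish> elements, in document order

with_side = root.findall("dish[side]")          # has a <side> child
named_egg = root.findall("dish[name='egg']")    # <name> text equals 'egg'
first = root.findall("dish[1]")                 # positions start at 1
last = root.findall("dish[last()]")
second_to_last = root.findall("dish[last()-1]")

print(len(with_side), len(named_egg), len(first), len(last))
```
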
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

   Note that :class:`XMLParser` skips over comments in the input
   instead of creating comment objects for them. An :class:`ElementTree` will
   only contain comment nodes if they have been inserted into
   the tree using one of the :class:`Element` methods.

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2


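As a brief illustration of :func:`fromstringlist`, the fragments do not need
to be split at element boundaries. The fragment contents below are made up:

```python
import xml.etree.ElementTree as ET

# Fragments may break in the middle of a tag or of text.
fragments = ["<root><it", "em>some te", "xt</item>", "</root>"]

root = ET.fromstringlist(fragments)
item_text = root.find("item").text

print(root.tag, item_text)
```
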
.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

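The order in which :func:`iterparse` reports events can be inspected with a
small in-memory document. This is a sketch using invented tag names and a
hypothetical namespace URI; for the ``"start-ns"`` event the second item of
each pair is a ``(prefix, uri)`` tuple rather than an element:

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<a><b>text</b></a>")
# Only tags are recorded: text is not guaranteed to be set at "start".
order = [(event, elem.tag)
         for event, elem in ET.iterparse(source, events=("start", "end"))]

ns_source = io.BytesIO(b'<a xmlns:x="http://example.com/x"><x:b/></a>')
namespaces = [payload
              for event, payload in ET.iterparse(ns_source, events=("start-ns",))]

print(order)
print(namespaces)
```
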
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

   Note that :class:`XMLParser` skips over processing instructions
   in the input instead of creating PI objects for them. An
   :class:`ElementTree` will only contain processing instruction nodes if
   they have been inserted into the tree using one of the
   :class:`Element` methods.

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2


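The effect of :func:`register_namespace` shows up at serialization time. The
sketch below registers a hypothetical prefix ``ex`` for a hypothetical URI;
because the registry is global, the mapping stays in place for the rest of
the process:

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and URI, chosen only for illustration.
ET.register_namespace("ex", "http://example.com/ns")

elem = ET.Element("{http://example.com/ns}title")
elem.text = "Example"

# The registered prefix is used instead of an auto-generated one.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)
```
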
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


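The practical difference between the default encoding and
``encoding="unicode"`` is the return type and the handling of non-ASCII
characters. A minimal sketch, with an invented element:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("drink")
elem.text = "caf\u00e9"  # contains one non-ASCII character

# Default US-ASCII output is bytes; non-ASCII text becomes a character
# reference.
as_bytes = ET.tostring(elem)

# encoding="unicode" returns a str and leaves the character as-is.
as_text = ET.tostring(elem, encoding="unicode")

# The pieces from tostringlist() join up to the tostring() result.
pieces = ET.tostringlist(elem)

print(as_bytes, as_text)
```
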
612.. function:: XML(text, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000613
614 Parses an XML section from a string constant. This function can be used to
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000615 embed "XML literals" in Python code. *text* is a string containing XML
616 data. *parser* is an optional parser instance. If not given, the standard
617 :class:`XMLParser` parser is used. Returns an :class:`Element` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000618
619
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000620.. function:: XMLID(text, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000621
   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element ids to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.
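
   For illustration, assuming a small document whose ``id`` values are made
   up, the returned dictionary can be used for direct lookup::

      import xml.etree.ElementTree as ET

      # Hypothetical markup; only elements carrying an "id" attribute
      # appear in the dictionary.
      root, ids = ET.XMLID(
          '<doc><p id="intro">Hi</p><p id="body">Text</p><p>no id</p></doc>'
      )

      intro = ids["intro"]
      known_ids = sorted(ids)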
Georg Brandl116aa622007-08-15 14:28:22 +0000627
628
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000629.. _elementtree-element-objects:
Georg Brandl116aa622007-08-15 14:28:22 +0000630
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000631Element Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200632^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000633
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000634.. class:: Element(tag, attrib={}, **extra)
Georg Brandl116aa622007-08-15 14:28:22 +0000635
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000636 Element class. This class defines the Element interface, and provides a
637 reference implementation of this interface.
Georg Brandl116aa622007-08-15 14:28:22 +0000638
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000639 The element name, attribute names, and attribute values can be either
640 bytestrings or Unicode strings. *tag* is the element name. *attrib* is
641 an optional dictionary, containing element attributes. *extra* contains
642 additional attributes, given as keyword arguments.
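
   A minimal sketch of the constructor, with invented tag and attribute
   values; both the *attrib* dictionary and the keyword arguments end up in
   the element's attributes::

      import xml.etree.ElementTree as ET

      # "a", "href" and "target" are example names only.
      elem = ET.Element("a", {"href": "http://example.org/"}, target="blank")

      attributes = dict(elem.attrib)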
Georg Brandl116aa622007-08-15 14:28:22 +0000643
644
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000645 .. attribute:: tag
Georg Brandl116aa622007-08-15 14:28:22 +0000646
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000647 A string identifying what kind of data this element represents (the
648 element type, in other words).
Georg Brandl116aa622007-08-15 14:28:22 +0000649
650
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000651 .. attribute:: text
Georg Brandl116aa622007-08-15 14:28:22 +0000652
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000653 The *text* attribute can be used to hold additional data associated with
654 the element. As the name implies this attribute is usually a string but
655 may be any application-specific object. If the element is created from
656 an XML file the attribute will contain any text found between the element
657 tags.
Georg Brandl116aa622007-08-15 14:28:22 +0000658
659
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000660 .. attribute:: tail
Georg Brandl116aa622007-08-15 14:28:22 +0000661
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000662 The *tail* attribute can be used to hold additional data associated with
663 the element. This attribute is usually a string but may be any
664 application-specific object. If the element is created from an XML file
665 the attribute will contain any text found after the element's end tag and
666 before the next tag.
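
      The difference between *text* and *tail* can be sketched with a small
      mixed-content fragment (the markup is illustrative)::

         import xml.etree.ElementTree as ET

         p = ET.XML("<p>Moved to <a>example.org</a> or <a>example.com</a>.</p>")

         lead = p.text            # text before the first child
         link_text = p[0].text    # text inside the first <a>
         between = p[0].tail      # text after </a>, before the next tag
         closing = p[1].tail      # text after the second </a>
         outer_tail = p.tail      # nothing follows </p>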
Georg Brandl116aa622007-08-15 14:28:22 +0000667
Georg Brandl116aa622007-08-15 14:28:22 +0000668
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000669 .. attribute:: attrib
Georg Brandl116aa622007-08-15 14:28:22 +0000670
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000671 A dictionary containing the element's attributes. Note that while the
672 *attrib* value is always a real mutable Python dictionary, an ElementTree
673 implementation may choose to use another internal representation, and
674 create the dictionary only if someone asks for it. To take advantage of
675 such implementations, use the dictionary methods below whenever possible.
Georg Brandl116aa622007-08-15 14:28:22 +0000676
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000677 The following dictionary-like methods work on the element attributes.
Georg Brandl116aa622007-08-15 14:28:22 +0000678
679
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000680 .. method:: clear()
Georg Brandl116aa622007-08-15 14:28:22 +0000681
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000682 Resets an element. This function removes all subelements, clears all
Eli Bendersky323a43a2012-10-09 06:46:33 -0700683 attributes, and sets the text and tail attributes to ``None``.
Georg Brandl116aa622007-08-15 14:28:22 +0000684
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000685
686 .. method:: get(key, default=None)
687
688 Gets the element attribute named *key*.
689
690 Returns the attribute value, or *default* if the attribute was not found.
691
692
693 .. method:: items()
694
695 Returns the element attributes as a sequence of (name, value) pairs. The
696 attributes are returned in an arbitrary order.
697
698
699 .. method:: keys()
700
      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.
703
704
705 .. method:: set(key, value)
706
707 Set the attribute *key* on the element to *value*.
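
      The dictionary-like methods above can be sketched together as follows;
      the attribute names are invented for the example::

         import xml.etree.ElementTree as ET

         elem = ET.Element("link", href="http://example.org/")
         elem.set("target", "blank")

         href = elem.get("href")
         missing = elem.get("rel", "none")   # falls back to the default
         names = sorted(elem.keys())         # sorted, as order is not relied on
         pairs = sorted(elem.items())

         elem.clear()                        # drops attributes, text and tail
         remaining = dict(elem.attrib)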
708
709 The following methods work on the element's children (subelements).
710
711
712 .. method:: append(subelement)
713
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200714 Adds the element *subelement* to the end of this element's internal list
715 of subelements. Raises :exc:`TypeError` if *subelement* is not an
716 :class:`Element`.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000717
718
719 .. method:: extend(subelements)
Georg Brandl116aa622007-08-15 14:28:22 +0000720
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000721 Appends *subelements* from a sequence object with zero or more elements.
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200722 Raises :exc:`TypeError` if a subelement is not an :class:`Element`.
Georg Brandl116aa622007-08-15 14:28:22 +0000723
Ezio Melottif8754a62010-03-21 07:16:43 +0000724 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000725
Georg Brandl116aa622007-08-15 14:28:22 +0000726
Eli Bendersky737b1732012-05-29 06:02:56 +0300727 .. method:: find(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000728
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000729 Finds the first subelement matching *match*. *match* may be a tag name
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200730 or a :ref:`path <elementtree-xpath>`. Returns an element instance
Eli Bendersky737b1732012-05-29 06:02:56 +0300731 or ``None``. *namespaces* is an optional mapping from namespace prefix
732 to full name.
Georg Brandl116aa622007-08-15 14:28:22 +0000733
Georg Brandl116aa622007-08-15 14:28:22 +0000734
Eli Bendersky737b1732012-05-29 06:02:56 +0300735 .. method:: findall(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000736
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200737 Finds all matching subelements, by tag name or
738 :ref:`path <elementtree-xpath>`. Returns a list containing all matching
Eli Bendersky737b1732012-05-29 06:02:56 +0300739 elements in document order. *namespaces* is an optional mapping from
740 namespace prefix to full name.
Georg Brandl116aa622007-08-15 14:28:22 +0000741
Georg Brandl116aa622007-08-15 14:28:22 +0000742
Eli Bendersky737b1732012-05-29 06:02:56 +0300743 .. method:: findtext(match, default=None, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000744
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000745 Finds text for the first subelement matching *match*. *match* may be
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200746 a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
747 of the first matching element, or *default* if no element was found.
748 Note that if the matching element has no text content an empty string
Eli Bendersky737b1732012-05-29 06:02:56 +0300749 is returned. *namespaces* is an optional mapping from namespace prefix
750 to full name.
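
      A sketch of the three lookup methods, assuming a made-up namespace URI
      and prefix::

         import xml.etree.ElementTree as ET

         # The namespace URI and the "h" prefix are hypothetical.
         root = ET.XML(
             '<root xmlns:h="http://example.org/ns">'
             '<h:item>one</h:item><h:item>two</h:item><empty/>'
             '</root>'
         )
         ns = {"h": "http://example.org/ns"}

         first = root.find("h:item", ns)          # first match or None
         items = root.findall("h:item", ns)       # list, document order
         no_text = root.findtext("empty")         # element found, no text
         absent = root.findtext("missing", "n/a") # no element: default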
Georg Brandl116aa622007-08-15 14:28:22 +0000751
Georg Brandl116aa622007-08-15 14:28:22 +0000752
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000753 .. method:: getchildren()
Georg Brandl116aa622007-08-15 14:28:22 +0000754
Georg Brandl67b21b72010-08-17 15:07:14 +0000755 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000756 Use ``list(elem)`` or iteration.
Georg Brandl116aa622007-08-15 14:28:22 +0000757
Georg Brandl116aa622007-08-15 14:28:22 +0000758
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000759 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000760
Georg Brandl67b21b72010-08-17 15:07:14 +0000761 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000762 Use method :meth:`Element.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000763
Georg Brandl116aa622007-08-15 14:28:22 +0000764
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200765 .. method:: insert(index, subelement)
Georg Brandl116aa622007-08-15 14:28:22 +0000766
Eli Bendersky396e8fc2012-03-23 14:24:20 +0200767 Inserts *subelement* at the given position in this element. Raises
768 :exc:`TypeError` if *subelement* is not an :class:`Element`.
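
      The child-modifying methods :meth:`append`, :meth:`extend` and
      :meth:`insert` can be sketched together; tag names are placeholders::

         import xml.etree.ElementTree as ET

         root = ET.Element("root")
         root.append(ET.Element("b"))
         root.extend([ET.Element("c"), ET.Element("d")])
         root.insert(0, ET.Element("a"))       # becomes the first child

         tags = [child.tag for child in root]

         # Anything that is not an Element is rejected.
         try:
             root.append("not an element")
         except TypeError:
             rejected = True
         else:
             rejected = False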
Georg Brandl116aa622007-08-15 14:28:22 +0000769
Georg Brandl116aa622007-08-15 14:28:22 +0000770
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000771 .. method:: iter(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000772
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000773 Creates a tree :term:`iterator` with the current element as the root.
774 The iterator iterates over this element and all elements below it, in
775 document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
776 elements whose tag equals *tag* are returned from the iterator. If the
777 tree structure is modified during iteration, the result is undefined.
Georg Brandl116aa622007-08-15 14:28:22 +0000778
Ezio Melotti138fc892011-10-10 00:02:03 +0300779 .. versionadded:: 3.2
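
      As an illustration of the traversal order, using a made-up tree::

         import xml.etree.ElementTree as ET

         root = ET.XML("<a><b><c/></b><d><c/></d></a>")

         # The root itself comes first, then descendants depth first.
         all_tags = [elem.tag for elem in root.iter()]

         # Restrict the iteration to a single tag.
         c_count = sum(1 for _ in root.iter("c"))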
780
Georg Brandl116aa622007-08-15 14:28:22 +0000781
Eli Bendersky737b1732012-05-29 06:02:56 +0300782 .. method:: iterfind(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000783
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200784 Finds all matching subelements, by tag name or
785 :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
Eli Bendersky737b1732012-05-29 06:02:56 +0300786 matching elements in document order. *namespaces* is an optional mapping
787 from namespace prefix to full name.
788
Georg Brandl116aa622007-08-15 14:28:22 +0000789
Ezio Melottif8754a62010-03-21 07:16:43 +0000790 .. versionadded:: 3.2
Georg Brandl116aa622007-08-15 14:28:22 +0000791
Georg Brandl116aa622007-08-15 14:28:22 +0000792
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000793 .. method:: itertext()
Georg Brandl116aa622007-08-15 14:28:22 +0000794
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000795 Creates a text iterator. The iterator loops over this element and all
796 subelements, in document order, and returns all inner text.
Georg Brandl116aa622007-08-15 14:28:22 +0000797
Ezio Melottif8754a62010-03-21 07:16:43 +0000798 .. versionadded:: 3.2
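
      A brief sketch with illustrative markup; both *text* and *tail* values
      of nested elements are produced, in document order::

         import xml.etree.ElementTree as ET

         p = ET.XML("<p>Moved to <a>example.org</a> or <a>example.com</a>.</p>")

         parts = list(p.itertext())
         flat = "".join(parts)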
Georg Brandl116aa622007-08-15 14:28:22 +0000799
800
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000801 .. method:: makeelement(tag, attrib)
Georg Brandl116aa622007-08-15 14:28:22 +0000802
      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000805
806
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000807 .. method:: remove(subelement)
Georg Brandl116aa622007-08-15 14:28:22 +0000808
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000809 Removes *subelement* from the element. Unlike the find\* methods this
810 method compares elements based on the instance identity, not on tag value
811 or contents.
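
      The identity-based comparison can be sketched as follows. The
      :exc:`ValueError` shown is what CPython is observed to raise for an
      element that is not a child; the text above does not specify it, so
      treat that detail as an assumption::

         import xml.etree.ElementTree as ET

         root = ET.XML("<root><item/><item/></root>")
         look_alike = ET.Element("item")   # same tag, but a different object

         try:
             root.remove(look_alike)
         except ValueError:
             removed_look_alike = False
         else:
             removed_look_alike = True

         root.remove(root[0])              # an actual child is removed
         remaining = len(root)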
Georg Brandl116aa622007-08-15 14:28:22 +0000812
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000813 :class:`Element` objects also support the following sequence type methods
Serhiy Storchaka15e65902013-08-29 10:28:44 +0300814 for working with subelements: :meth:`~object.__delitem__`,
815 :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
816 :meth:`~object.__len__`.
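
   A minimal sketch of the sequence protocol on a made-up element::

      import xml.etree.ElementTree as ET

      root = ET.XML("<root><a/><b/><c/></root>")

      count = len(root)            # number of direct subelements
      first = root[0].tag          # indexing
      root[1] = ET.Element("x")    # item assignment replaces <b/>
      del root[0]                  # item deletion removes <a/>

      tags = [child.tag for child in root]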
Georg Brandl116aa622007-08-15 14:28:22 +0000817
   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::
Georg Brandl116aa622007-08-15 14:28:22 +0000821
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000822 element = root.find('foo')
Georg Brandl116aa622007-08-15 14:28:22 +0000823
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000824 if not element: # careful!
825 print("element not found, or element has no subelements")
Georg Brandl116aa622007-08-15 14:28:22 +0000826
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000827 if element is None:
828 print("element not found")
Georg Brandl116aa622007-08-15 14:28:22 +0000829
830
831.. _elementtree-elementtree-objects:
832
833ElementTree Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200834^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000835
836
Georg Brandl7f01a132009-09-16 15:58:14 +0000837.. class:: ElementTree(element=None, file=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000838
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000839 ElementTree wrapper class. This class represents an entire element
840 hierarchy, and adds some extra support for serialization to and from
841 standard XML.
Georg Brandl116aa622007-08-15 14:28:22 +0000842
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000843 *element* is the root element. The tree is initialized with the contents
844 of the XML *file* if given.
Georg Brandl116aa622007-08-15 14:28:22 +0000845
846
Benjamin Petersone41251e2008-04-25 01:59:09 +0000847 .. method:: _setroot(element)
Georg Brandl116aa622007-08-15 14:28:22 +0000848
Benjamin Petersone41251e2008-04-25 01:59:09 +0000849 Replaces the root element for this tree. This discards the current
850 contents of the tree, and replaces it with the given element. Use with
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000851 care. *element* is an element instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000852
853
Eli Bendersky737b1732012-05-29 06:02:56 +0300854 .. method:: find(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000855
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200856 Same as :meth:`Element.find`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000857
858
Eli Bendersky737b1732012-05-29 06:02:56 +0300859 .. method:: findall(match, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000860
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200861 Same as :meth:`Element.findall`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000862
863
Eli Bendersky737b1732012-05-29 06:02:56 +0300864 .. method:: findtext(match, default=None, namespaces=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000865
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200866 Same as :meth:`Element.findtext`, starting at the root of the tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000867
868
Georg Brandl7f01a132009-09-16 15:58:14 +0000869 .. method:: getiterator(tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000870
Georg Brandl67b21b72010-08-17 15:07:14 +0000871 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000872 Use method :meth:`ElementTree.iter` instead.
Georg Brandl116aa622007-08-15 14:28:22 +0000873
874
Benjamin Petersone41251e2008-04-25 01:59:09 +0000875 .. method:: getroot()
Florent Xiclunac17f1722010-08-08 19:48:29 +0000876
Benjamin Petersone41251e2008-04-25 01:59:09 +0000877 Returns the root element for this tree.
Georg Brandl116aa622007-08-15 14:28:22 +0000878
879
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000880 .. method:: iter(tag=None)
881
      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).
885
886
Eli Bendersky737b1732012-05-29 06:02:56 +0300887 .. method:: iterfind(match, namespaces=None)
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000888
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200889 Same as :meth:`Element.iterfind`, starting at the root of the tree.
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000890
Ezio Melottif8754a62010-03-21 07:16:43 +0000891 .. versionadded:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000892
893
Georg Brandl7f01a132009-09-16 15:58:14 +0000894 .. method:: parse(source, parser=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000895
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000896 Loads an external XML section into this element tree. *source* is a file
Antoine Pitrou11cb9612010-09-15 11:11:28 +0000897 name or :term:`file object`. *parser* is an optional parser instance.
Eli Bendersky52467b12012-06-01 07:13:08 +0300898 If not given, the standard :class:`XMLParser` parser is used. Returns the
899 section root element.
Georg Brandl116aa622007-08-15 14:28:22 +0000900
901
Eli Benderskyf96cf912012-07-15 06:19:44 +0300902 .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
Serhiy Storchaka9e189f02013-01-13 22:24:27 +0200903 default_namespace=None, method="xml", *, \
Eli Benderskye9af8272013-01-13 06:27:51 -0800904 short_empty_elements=True)
Georg Brandl116aa622007-08-15 14:28:22 +0000905
      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls if an XML declaration should be added to the
      file. Use ``False`` for never, ``True`` for always, or ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.
Eli Benderskyf96cf912012-07-15 06:19:44 +0300919
920 The output is either a string (:class:`str`) or binary (:class:`bytes`).
921 This is controlled by the *encoding* argument. If *encoding* is
922 ``"unicode"``, the output is a string; otherwise, it's binary. Note that
923 this may conflict with the type of *file* if it's an open
924 :term:`file object`; make sure you do not try to write a string to a
925 binary stream and vice versa.
926
R David Murray575fb312013-12-25 23:21:03 -0500927 .. versionadded:: 3.4
928 The *short_empty_elements* parameter.
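
      The interaction between *encoding* and the type of *file* can be
      sketched with in-memory streams, which stand in for real files in this
      example::

         import io
         import xml.etree.ElementTree as ET

         tree = ET.ElementTree(ET.XML("<root><empty/></root>"))

         # A byte encoding requires a binary stream.
         binary = io.BytesIO()
         tree.write(binary, encoding="utf-8", xml_declaration=True)
         data = binary.getvalue()

         # encoding="unicode" requires a text stream.
         text = io.StringIO()
         tree.write(text, encoding="unicode", short_empty_elements=False)
         markup = text.getvalue()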
Eli Benderskya9a2ef52013-01-13 06:04:43 -0800929
Georg Brandl116aa622007-08-15 14:28:22 +0000930
Christian Heimesd8654cf2007-12-02 15:22:16 +0000931This is the XML file that is going to be manipulated::
932
933 <html>
934 <head>
935 <title>Example page</title>
936 </head>
937 <body>
Georg Brandl48310cd2009-01-03 21:18:54 +0000938 <p>Moved to <a href="http://example.org/">example.org</a>
Christian Heimesd8654cf2007-12-02 15:22:16 +0000939 or <a href="http://example.com/">example.com</a>.</p>
940 </body>
941 </html>
942
Example of changing the attribute "target" of every link in the first paragraph::
944
945 >>> from xml.etree.ElementTree import ElementTree
946 >>> tree = ElementTree()
947 >>> tree.parse("index.xhtml")
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000948 <Element 'html' at 0xb77e6fac>
Christian Heimesd8654cf2007-12-02 15:22:16 +0000949 >>> p = tree.find("body/p") # Finds first occurrence of tag p in body
950 >>> p
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000951 <Element 'p' at 0xb77ec26c>
952 >>> links = list(p.iter("a")) # Returns list of all links
Christian Heimesd8654cf2007-12-02 15:22:16 +0000953 >>> links
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000954 [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
Christian Heimesd8654cf2007-12-02 15:22:16 +0000955 >>> for i in links: # Iterates through all found links
956 ... i.attrib["target"] = "blank"
957 >>> tree.write("output.xhtml")
Georg Brandl116aa622007-08-15 14:28:22 +0000958
959.. _elementtree-qname-objects:
960
961QName Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200962^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000963
964
Georg Brandl7f01a132009-09-16 15:58:14 +0000965.. class:: QName(text_or_uri, tag=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000966
   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.
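
   A small sketch of both calling forms, using a hypothetical namespace URI::

      import xml.etree.ElementTree as ET

      # The URI and local name are placeholders.
      split_form = ET.QName("http://example.org/ns", "item")
      joined_form = ET.QName("{http://example.org/ns}item")

      same_name = str(split_form) == str(joined_form)

      # A QName can be used as a tag; the serializer then emits a
      # namespace declaration for it.
      elem = ET.Element(split_form)
      markup = ET.tostring(elem, encoding="unicode")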
Georg Brandl116aa622007-08-15 14:28:22 +0000973
974
Antoine Pitrou5b235d02013-04-18 19:37:06 +0200975
Georg Brandl116aa622007-08-15 14:28:22 +0000976.. _elementtree-treebuilder-objects:
977
978TreeBuilder Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +0200979^^^^^^^^^^^^^^^^^^^
Georg Brandl116aa622007-08-15 14:28:22 +0000980
981
Georg Brandl7f01a132009-09-16 15:58:14 +0000982.. class:: TreeBuilder(element_factory=None)
Georg Brandl116aa622007-08-15 14:28:22 +0000983
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000984 Generic element structure builder. This builder converts a sequence of
985 start, data, and end method calls to a well-formed element structure. You
986 can use this class to build an element structure using a custom XML parser,
Eli Bendersky48d358b2012-05-30 17:57:50 +0300987 or a parser for some other XML-like format. *element_factory*, when given,
988 must be a callable accepting two positional arguments: a tag and
989 a dict of attributes. It is expected to return a new element instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000990
Benjamin Petersone41251e2008-04-25 01:59:09 +0000991 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +0000992
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000993 Flushes the builder buffers, and returns the toplevel document
994 element. Returns an :class:`Element` instance.
Georg Brandl116aa622007-08-15 14:28:22 +0000995
996
Benjamin Petersone41251e2008-04-25 01:59:09 +0000997 .. method:: data(data)
Georg Brandl116aa622007-08-15 14:28:22 +0000998
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000999 Adds text to the current element. *data* is a string. This should be
1000 either a bytestring, or a Unicode string.
Georg Brandl116aa622007-08-15 14:28:22 +00001001
1002
Benjamin Petersone41251e2008-04-25 01:59:09 +00001003 .. method:: end(tag)
Georg Brandl116aa622007-08-15 14:28:22 +00001004
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001005 Closes the current element. *tag* is the element name. Returns the
1006 closed element.
Georg Brandl116aa622007-08-15 14:28:22 +00001007
1008
Benjamin Petersone41251e2008-04-25 01:59:09 +00001009 .. method:: start(tag, attrs)
Georg Brandl116aa622007-08-15 14:28:22 +00001010
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001011 Opens a new element. *tag* is the element name. *attrs* is a dictionary
1012 containing element attributes. Returns the opened element.
Georg Brandl116aa622007-08-15 14:28:22 +00001013
1014
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001015 In addition, a custom :class:`TreeBuilder` object can provide the
1016 following method:
Georg Brandl116aa622007-08-15 14:28:22 +00001017
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001018 .. method:: doctype(name, pubid, system)
1019
1020 Handles a doctype declaration. *name* is the doctype name. *pubid* is
1021 the public identifier. *system* is the system identifier. This method
1022 does not exist on the default :class:`TreeBuilder` class.
1023
Ezio Melottif8754a62010-03-21 07:16:43 +00001024 .. versionadded:: 3.2
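
   As a rough sketch, the builder can also be driven by hand, which is what a
   custom parser would do; the tag names and data are invented::

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder()
      builder.start("root", {"version": "1"})   # open <root version="1">
      builder.start("item", {})                 # open <item>
      builder.data("hello")                     # text inside <item>
      builder.end("item")
      builder.end("root")

      root = builder.close()                    # the finished Element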
Georg Brandl116aa622007-08-15 14:28:22 +00001025
1026
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001027.. _elementtree-xmlparser-objects:
Georg Brandl116aa622007-08-15 14:28:22 +00001028
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001029XMLParser Objects
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001030^^^^^^^^^^^^^^^^^
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001031
1032
1033.. class:: XMLParser(html=0, target=None, encoding=None)
1034
   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing events
   are translated to a push API by invoking callbacks on the *target* object.
   If *target* is omitted, the standard :class:`TreeBuilder` is used. The
   *html* argument was historically used for backwards compatibility and is now
   deprecated. If *encoding* [1]_ is given, the value overrides the encoding
   specified in the XML file.
Georg Brandl116aa622007-08-15 14:28:22 +00001043
   .. deprecated:: 3.4
      The *html* argument. The remaining arguments should be passed via
      keyword to prepare for the removal of the *html* argument.
Georg Brandl116aa622007-08-15 14:28:22 +00001047
Benjamin Petersone41251e2008-04-25 01:59:09 +00001048 .. method:: close()
Georg Brandl116aa622007-08-15 14:28:22 +00001049
Eli Benderskybfd78372013-08-24 15:11:44 -07001050 Finishes feeding data to the parser. Returns the result of calling the
Eli Benderskybf8ab772013-08-25 15:27:36 -07001051 ``close()`` method of the *target* passed during construction; by default,
1052 this is the toplevel document element.
Georg Brandl116aa622007-08-15 14:28:22 +00001053
1054
Benjamin Petersone41251e2008-04-25 01:59:09 +00001055 .. method:: doctype(name, pubid, system)
Georg Brandl116aa622007-08-15 14:28:22 +00001056
Georg Brandl67b21b72010-08-17 15:07:14 +00001057 .. deprecated:: 3.2
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001058 Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
1059 target.
Georg Brandl116aa622007-08-15 14:28:22 +00001060
1061
Benjamin Petersone41251e2008-04-25 01:59:09 +00001062 .. method:: feed(data)
Georg Brandl116aa622007-08-15 14:28:22 +00001063
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001064 Feeds data to the parser. *data* is encoded data.
Georg Brandl116aa622007-08-15 14:28:22 +00001065
Eli Benderskyb5869342013-08-30 05:51:20 -07001066 :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
1067 for each opening tag, its ``end(tag)`` method for each closing tag, and data
1068 is processed by method ``data(data)``. :meth:`XMLParser.close` calls
1069 *target*\'s method ``close()``. :class:`XMLParser` can be used not only for
1070 building a tree structure. This is an example of counting the maximum depth
1071 of an XML file::
Christian Heimesd8654cf2007-12-02 15:22:16 +00001072
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001073 >>> from xml.etree.ElementTree import XMLParser
Christian Heimesd8654cf2007-12-02 15:22:16 +00001074 >>> class MaxDepth: # The target object of the parser
1075 ... maxDepth = 0
1076 ... depth = 0
1077 ... def start(self, tag, attrib): # Called for each opening tag.
Georg Brandl48310cd2009-01-03 21:18:54 +00001078 ... self.depth += 1
Christian Heimesd8654cf2007-12-02 15:22:16 +00001079 ... if self.depth > self.maxDepth:
1080 ... self.maxDepth = self.depth
1081 ... def end(self, tag): # Called for each closing tag.
1082 ... self.depth -= 1
Georg Brandl48310cd2009-01-03 21:18:54 +00001083 ... def data(self, data):
Christian Heimesd8654cf2007-12-02 15:22:16 +00001084 ... pass # We do not need to do anything with data.
1085 ... def close(self): # Called when all data has been parsed.
1086 ... return self.maxDepth
Georg Brandl48310cd2009-01-03 21:18:54 +00001087 ...
Christian Heimesd8654cf2007-12-02 15:22:16 +00001088 >>> target = MaxDepth()
Florent Xiclunaf15351d2010-03-13 23:24:31 +00001089 >>> parser = XMLParser(target=target)
Christian Heimesd8654cf2007-12-02 15:22:16 +00001090 >>> exampleXml = """
1091 ... <a>
1092 ... <b>
1093 ... </b>
1094 ... <b>
1095 ... <c>
1096 ... <d>
1097 ... </d>
1098 ... </c>
1099 ... </b>
1100 ... </a>"""
1101 >>> parser.feed(exampleXml)
1102 >>> parser.close()
1103 4
Christian Heimesb186d002008-03-18 15:15:01 +00001104
Eli Benderskyb5869342013-08-30 05:51:20 -07001105
1106.. _elementtree-xmlpullparser-objects:
1107
1108XMLPullParser Objects
1109^^^^^^^^^^^^^^^^^^^^^
1110
1111.. class:: XMLPullParser(events=None)
1112
Eli Bendersky2c68e302013-08-31 07:37:23 -07001113 A pull parser suitable for non-blocking applications. Its input-side API is
1114 similar to that of :class:`XMLParser`, but instead of pushing calls to a
1115 callback target, :class:`XMLPullParser` collects an internal list of parsing
1116 events and lets the user read from it. *events* is a sequence of events to
1117 report back. The supported events are the strings ``"start"``, ``"end"``,
1118 ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
1119 namespace information). If *events* is omitted, only ``"end"`` events are
1120 reported.
Eli Benderskyb5869342013-08-30 05:51:20 -07001121
1122 .. method:: feed(data)
1123
1124 Feed the given bytes data to the parser.
1125
1126 .. method:: close()
1127
Nick Coghlan4cc2afa2013-09-28 23:50:35 +10001128 Signal the parser that the data stream is terminated. Unlike
1129 :meth:`XMLParser.close`, this method always returns :const:`None`.
1130 Any events not yet retrieved when the parser is closed can still be
1131 read with :meth:`read_events`.
Eli Benderskyb5869342013-08-30 05:51:20 -07001132
1133 .. method:: read_events()
1134
      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.
1140
1141 Events provided in a previous call to :meth:`read_events` will not be
R David Murray410d3202014-01-04 23:52:50 -05001142 yielded again. Events are consumed from the internal queue only when
1143 they are retrieved from the iterator, so multiple readers iterating in
1144 parallel over iterators obtained from :meth:`read_events` will have
1145 unpredictable results.
Eli Benderskyb5869342013-08-30 05:51:20 -07001146
1147 .. note::
1148
1149 :class:`XMLPullParser` only guarantees that it has seen the ">"
1150 character of a starting tag when it emits a "start" event, so the
1151 attributes are defined, but the contents of the text and tail attributes
1152 are undefined at that point. The same applies to the element children;
1153 they may or may not be present.
1154
1155 If you need a fully populated element, look for "end" events instead.
1156
1157 .. versionadded:: 3.4
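
   A minimal sketch of the feed/read cycle, with the input split at an
   arbitrary point to imitate data arriving in pieces. Events are collected
   only after :meth:`close` here, because exactly when events become
   available between feeds depends on the underlying parser::

      import xml.etree.ElementTree as ET

      parser = ET.XMLPullParser(events=("start", "end"))

      # Hypothetical input, deliberately cut in the middle of a tag.
      parser.feed(b"<root><item>one</it")
      parser.feed(b"em></root>")
      parser.close()

      # Events not yet retrieved can still be read after close().
      seen = [(event, elem.tag) for event, elem in parser.read_events()]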
1158
Eli Bendersky5b77d812012-03-16 08:20:05 +02001159Exceptions
Eli Bendersky3a4875e2012-03-26 20:43:32 +02001160^^^^^^^^^^
Eli Bendersky5b77d812012-03-16 08:20:05 +02001161
1162.. class:: ParseError
1163
1164 XML parse error, raised by the various parsing methods in this module when
1165 parsing fails. The string representation of an instance of this exception
1166 will contain a user-friendly error message. In addition, it will have
1167 the following attributes available:
1168
1169 .. attribute:: code
1170
1171 A numeric error code from the expat parser. See the documentation of
1172 :mod:`xml.parsers.expat` for the list of error codes and their meanings.
1173
1174 .. attribute:: position
1175
1176 A tuple of *line*, *column* numbers, specifying where the error occurred.
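
   For illustration, a sketch of inspecting these attributes after parsing a
   deliberately malformed string with :func:`fromstring`::

      import xml.etree.ElementTree as ET

      try:
          ET.fromstring("<root><unclosed></root>")   # mismatched tags
      except ET.ParseError as exc:
          code = exc.code                 # numeric expat error code
          line, column = exc.position     # where the error occurred
          message = str(exc)              # user-friendly description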
Christian Heimesb186d002008-03-18 15:15:01 +00001177
1178.. rubric:: Footnotes
1179
.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.