:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading from and writing to files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

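The snippets in this section assume that a ``country_data.xml`` file exists on
disk. As a self-contained sketch, the same steps can be tried against an
inline string. The abbreviated ``sample`` document below is an assumption made
for illustration and is not the full sample data:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Abbreviated, hypothetical version of the sample document.
   sample = """<?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
       </country>
   </data>"""

   root = ET.fromstring(sample)      # returns the root Element directly
   names = [child.get('name') for child in root]
   first_year = root[0][1].text      # second child of the first country
   print(root.tag, names, first_year)
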
.. _elementtree-pull-parsing:

Pull API for non-blocking parsing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Most parsing functions provided by this module require the whole document
to be read at once before returning any result. It is possible to use an
:class:`XMLParser` and feed data into it incrementally, but it is a push API that
calls methods on a callback target, which is too low-level and inconvenient for
most needs. Sometimes what the user really wants is to be able to parse XML
incrementally, without blocking operations, while enjoying the convenience of
fully constructed :class:`Element` objects.

The most powerful tool for doing this is :class:`XMLPullParser`. It does not
require a blocking read to obtain the XML data, and is instead fed with data
incrementally with :meth:`XMLPullParser.feed` calls. To get the parsed XML
elements, call :meth:`XMLPullParser.read_events`. Here is an example::

   >>> parser = ET.XMLPullParser(['start', 'end'])
   >>> parser.feed('<mytag>sometext')
   >>> list(parser.read_events())
   [('start', <Element 'mytag' at 0x7fa66db2be58>)]
   >>> parser.feed(' more text</mytag>')
   >>> for event, elem in parser.read_events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion
where the XML data is being received from a socket or read incrementally from
some storage device. In such cases, blocking reads are unacceptable.

Because it's so flexible, :class:`XMLPullParser` can be inconvenient to use for
simpler use cases. If you don't mind your application blocking on reading XML
data but would still like to have incremental parsing capabilities, take a look
at :func:`iterparse`. It can be useful when you're reading a large XML document
and don't want to hold it wholly in memory.

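A minimal sketch of that approach follows. The in-memory ``xml_bytes``
document and the ``entry`` tag are assumptions made for illustration; in
practice *source* would usually be a large file. Calling
:meth:`Element.clear` after an element has been handled is one common way to
keep memory use low:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Hypothetical stand-in for a large document on disk.
   xml_bytes = b"<log><entry id='1'>a</entry><entry id='2'>b</entry></log>"

   ids = []
   # With no *events* argument, only "end" events are reported.
   for event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
       if elem.tag == 'entry':
           ids.append(elem.get('id'))
           elem.clear()              # discard text, attributes and children
   print(ids)
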
Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, ``spam/egg`` selects all             |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

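The predicate forms listed in the table can be sketched as follows. The
shortened ``doc`` document is an assumption made for illustration; it keeps
only two countries from the sample data:

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical, shortened variant of the country sample data.
   doc = ET.fromstring(
       "<data>"
       "<country name='Liechtenstein'><rank>1</rank>"
       "<neighbor name='Austria' direction='E'/>"
       "<neighbor name='Switzerland' direction='W'/></country>"
       "<country name='Singapore'><rank>4</rank>"
       "<neighbor name='Malaysia' direction='N'/></country>"
       "</data>"
   )

   # [@attrib='value']: neighbors lying to the west
   west = [n.get('name') for n in doc.findall(".//neighbor[@direction='W']")]

   # [tag]: countries that have a 'rank' child
   ranked = [c.get('name') for c in doc.findall("./country[rank]")]

   # [last()]: the last 'neighbor' of each country
   last = [n.get('name') for n in doc.findall("./country/neighbor[last()]")]

   # [position]: the first 'country' child (positions start at 1)
   first = [c.get('name') for c in doc.findall("./country[1]")]
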
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

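   As a small sketch, a comment element can be appended like any other
   element and then serialized. The ``config`` and ``option`` names are
   hypothetical:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element('config')
      root.append(ET.Comment(' generated file, do not edit '))
      ET.SubElement(root, 'option', name='debug')

      # The comment is written out between the usual comment delimiters.
      xml_text = ET.tostring(root, encoding='unicode')
      print(xml_text)

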
.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This
   function should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2

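   A short sketch, assuming the fragments arrive as a list of strings (the
   ``chunks`` content is made up for illustration). Fragments may split the
   document at arbitrary points, including in the middle of a text node:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical fragments; the split falls inside the text "Alice".
      chunks = ['<message><to>Ali', 'ce</to><body>Hi', '</body></message>']

      root = ET.fromstringlist(chunks)
      to_text = root.find('to').text
      print(root.tag, to_text)

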
.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and
   ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. *parser* must be a subclass of
   :class:`XMLParser` and can only use the default :class:`TreeBuilder` as a
   target. Returns an :term:`iterator` providing ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for applications where blocking reads can't be made. For fully non-blocking
   parsing, see :class:`XMLPullParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">" character of a
      starting tag when it emits a "start" event, so the attributes are defined,
      but the contents of the text and tail attributes are undefined at that
      point. The same applies to the element children; they may or may not be
      present.

      If you need a fully populated element, look for "end" events instead.

   .. deprecated:: 3.4
      The *parser* argument.

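   The order in which events are reported can be sketched with a tiny
   in-memory document. The use of :class:`io.BytesIO` as *source* is an
   assumption made to keep the example self-contained:

   .. code-block:: python

      import io
      import xml.etree.ElementTree as ET

      source = io.BytesIO(b"<a><b/></a>")

      # Request both "start" and "end" events and record (event, tag) pairs.
      seen = [(event, elem.tag)
              for event, elem in ET.iterparse(source, events=("start", "end"))]
      print(seen)
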
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

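   A brief sketch; the ``xml-stylesheet`` target and the ``style.xsl`` file
   name are hypothetical values chosen for illustration:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      pi = ET.ProcessingInstruction('xml-stylesheet',
                                    'type="text/xsl" href="style.xsl"')

      # Serializing the element yields the processing instruction markup.
      pi_text = ET.tostring(pi, encoding='unicode')
      print(pi_text)

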
.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2

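   A minimal sketch, using the made-up prefix ``ex`` and the placeholder URI
   ``http://example.com/ns``. Because the registry is global, the mapping
   stays in effect for the rest of the process:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical prefix and URI, chosen only for illustration.
      ET.register_namespace('ex', 'http://example.com/ns')

      item = ET.Element('{http://example.com/ns}item')
      out = ET.tostring(item, encoding='unicode')
      print(out)                     # the tag is written with the ex: prefix

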
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.

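   The sketch below shows attributes passed both through *attrib* and as
   keyword arguments; the ``list``, ``item``, ``id`` and ``lang`` names are
   hypothetical:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      parent = ET.Element('list')

      # Attributes from the dictionary and from keywords are merged.
      item = ET.SubElement(parent, 'item', {'id': '1'}, lang='en')

      print(item.attrib, len(parent))

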
.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

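   The effect of *encoding* and *short_empty_elements* can be sketched as
   follows, using a throwaway two-element tree:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      elem = ET.Element('a')
      ET.SubElement(elem, 'b')

      as_bytes = ET.tostring(elem)                      # default: bytestring
      as_text = ET.tostring(elem, encoding='unicode')   # Unicode string
      expanded = ET.tostring(elem, encoding='unicode',
                             short_empty_elements=False)

      print(as_bytes)    # b'<a><b /></a>'
      print(as_text)     # <a><b /></a>
      print(expanded)    # <a><b></b></a>

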
.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.

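   A small sketch with a made-up document. The dictionary is keyed by the
   value of each element's ``id`` attribute:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # Hypothetical document with two elements carrying an id attribute.
      root, ids = ET.XMLID(
          '<doc><sec id="intro">Intro</sec><sec id="end">End</sec></doc>')

      intro_text = ids['intro'].text
      print(root.tag, sorted(ids), intro_text)

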
.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file the attribute will contain any text found between the element
      tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file
      the attribute will contain any text found after the element's end tag and
      before the next tag.

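      The difference between *text* and *tail* can be sketched with a short
      piece of mixed content (the markup itself is made up for illustration):

      .. code-block:: python

         import xml.etree.ElementTree as ET

         p = ET.fromstring('<p>Hello <b>bold</b> world</p>')
         b = p.find('b')

         print(repr(p.text))   # 'Hello '  (text before the first child)
         print(repr(b.text))   # 'bold'    (text inside <b>)
         print(repr(b.tail))   # ' world'  (text after </b>, before </p>)

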
   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


Florent Xiclunaf15351d2010-03-13 23:24:31 +0000590 .. method:: clear()
Georg Brandl116aa622007-08-15 14:28:22 +0000591
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000592 Resets an element. This function removes all subelements, clears all
Eli Bendersky323a43a2012-10-09 06:46:33 -0700593 attributes, and sets the text and tail attributes to ``None``.
Georg Brandl116aa622007-08-15 14:28:22 +0000594
Florent Xiclunaf15351d2010-03-13 23:24:31 +0000595
596 .. method:: get(key, default=None)
597
598 Gets the element attribute named *key*.
599
600 Returns the attribute value, or *default* if the attribute was not found.
601
602
603 .. method:: items()
604
605 Returns the element attributes as a sequence of (name, value) pairs. The
606 attributes are returned in an arbitrary order.
607
608
609 .. method:: keys()
610
611 Returns the elements attribute names as a list. The names are returned
612 in an arbitrary order.
613
614
615 .. method:: set(key, value)
616
617 Set the attribute *key* on the element to *value*.
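
      The dictionary-like methods above can be combined as in the following
      sketch. The tag and attribute names are hypothetical, and the results are
      sorted only because the order of ``keys()`` and ``items()`` is
      arbitrary::

         import xml.etree.ElementTree as ET

         elem = ET.Element("img", {"src": "logo.png"})
         elem.set("alt", "Logo")              # add or overwrite an attribute

         src = elem.get("src")                # 'logo.png'
         width = elem.get("width", "auto")    # missing key, falls back to 'auto'
         names = sorted(elem.keys())          # ['alt', 'src']
         pairs = sorted(elem.items())         # [('alt', 'Logo'), ('src', 'logo.png')]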

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2


   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*. *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`. Returns an element instance
      or ``None``. *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns a list containing all matching
      elements in document order. *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`. Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content an empty string
      is returned. *namespaces* is an optional mapping from namespace prefix
      to full name.
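
      A short sketch of the three lookup methods with a *namespaces* mapping.
      The document, the ``bk`` prefix and the ``urn:example:books`` URI are
      assumptions made up for this example::

         import xml.etree.ElementTree as ET

         root = ET.fromstring(
             '<catalog xmlns:bk="urn:example:books">'
             '<bk:book><bk:title>Dune</bk:title><bk:note/></bk:book>'
             '<bk:book><bk:title>Emma</bk:title></bk:book>'
             '</catalog>')
         ns = {"bk": "urn:example:books"}

         first = root.find("bk:book", ns)            # first match, or None
         titles = [t.text for t in root.findall("bk:book/bk:title", ns)]

         # No <bk:price> exists, so *default* is returned.
         price = root.findtext("bk:book/bk:price", default="n/a", namespaces=ns)
         # <bk:note/> exists but has no text, so an empty string is returned.
         note = root.findtext("bk:book/bk:note", namespaces=ns)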


   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element. Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.
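
      The following sketch builds a small tree with :meth:`append`,
      :meth:`extend` and :meth:`insert`, and shows the :exc:`TypeError` raised
      for a non-element argument. The tag names are arbitrary::

         import xml.etree.ElementTree as ET

         root = ET.Element("list")
         root.append(ET.Element("b"))
         root.extend([ET.Element("c"), ET.Element("d")])
         root.insert(0, ET.Element("a"))        # becomes the first child

         order = [child.tag for child in root]  # ['a', 'b', 'c', 'd']

         try:
             root.append("not an element")      # a str is rejected
         except TypeError:
             rejected = True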


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2
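
      For illustration, a sketch of the traversal order on a made-up tree.
      Note that the starting element itself is included::

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<a><b><c/></b><d><c/></d></a>")

         all_tags = [elem.tag for elem in root.iter()]   # ['a', 'b', 'c', 'd', 'c']
         c_count = len(list(root.iter("c")))             # 2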


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`. Returns an iterable yielding all
      matching elements in document order. *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 3.2
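
      A common use, sketched here on made-up markup, is to flatten mixed
      content into a plain string. Both *text* and *tail* fragments are
      yielded::

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<p>Hello <b>big</b> world</p>")

         pieces = list(root.itertext())   # ['Hello ', 'big', ' world']
         flat = "".join(pieces)           # 'Hello big world'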


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods this
      method compares elements based on the instance identity, not on tag value
      or contents.
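
      The sketch below removes children while looping over a copy of the child
      list, so the tree is not modified during iteration, and shows that an
      element with a matching tag that is not an actual child is not accepted.
      The tag names are arbitrary::

         import xml.etree.ElementTree as ET

         root = ET.fromstring("<r><x/><y/><x/></r>")

         for child in list(root):          # iterate over a copy
             if child.tag == "x":
                 root.remove(child)

         remaining = [child.tag for child in root]   # ['y']

         stray = ET.Element("y")           # same tag, but a different object
         try:
             root.remove(stray)
         except ValueError:
             stray_rejected = True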

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.
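
   In practice this means an element can be treated like a list of its
   children, as in this sketch with made-up tag names::

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<r><a/><b/><c/></r>")

      count = len(root)                  # 3
      first, last = root[0], root[-1]    # the <a> and <c> elements
      root[1] = ET.Element("x")          # replace <b> with <x>
      del root[0]                        # drop <a>

      tags = [child.tag for child in root]   # ['x', 'c']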

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or :term:`file object`. *parser* is an optional parser instance.
      If not given, the standard :class:`XMLParser` parser is used. Returns the
      section root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML. *file* is a file name, or a
      :term:`file object` opened for writing. *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration is added to the
      file. Use ``False`` for never, ``True`` for always, and ``None`` to add
      one only if the encoding is not US-ASCII, UTF-8 or Unicode (default is
      ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content. If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument. If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary. Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.
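
      The relationship between *encoding* and the type of *file* can be
      sketched as follows. In-memory streams from :mod:`io` stand in for real
      files here, and the sample document is made up::

         import io
         import xml.etree.ElementTree as ET

         tree = ET.ElementTree(ET.fromstring("<root><empty/></root>"))

         # A real encoding produces bytes, so a binary stream is needed.
         binary = io.BytesIO()
         tree.write(binary, encoding="utf-8", xml_declaration=True)
         data = binary.getvalue()

         # encoding="unicode" produces str, so a text stream is needed.
         text_stream = io.StringIO()
         tree.write(text_stream, encoding="unicode", short_empty_elements=False)
         text = text_stream.getvalue()     # '<root><empty></empty></root>'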


This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form {uri}local, or, if the tag argument
   is given, the URI part of a QName. If *tag* is given, the first argument is
   interpreted as a URI, and this argument is interpreted as a local name.
   :class:`QName` instances are opaque.



.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes. It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2
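
   As a sketch, the builder can also be driven by hand, without any parser.
   The tag, attribute and text used here are made up::

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder()
      builder.start("greeting", {"lang": "en"})   # opens <greeting lang="en">
      builder.data("hello")                       # text of the open element
      builder.end("greeting")                     # closes it
      root = builder.close()                      # toplevel Element

      summary = (root.tag, root.get("lang"), root.text)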


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   This class is the low-level building block of the module. It uses
   :mod:`xml.parsers.expat` for efficient, event-based parsing of XML. It can
   be fed XML data incrementally with the :meth:`feed` method, and parsing
   events are translated to a push API by invoking callbacks on the *target*
   object. If *target* is omitted, the standard :class:`TreeBuilder` is used.
   The *html* argument was historically used for backwards compatibility and is
   now deprecated. If *encoding* [1]_ is given, the value overrides the encoding
   specified in the XML file.

   .. deprecated:: 3.4
      The *html* argument. The remaining arguments should be passed via
      keyword to prepare for the removal of the *html* argument.

   .. method:: close()

      Finishes feeding data to the parser. Returns the result of calling the
      ``close()`` method of the *target* passed during construction; by default,
      this is the toplevel document element.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

   :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method
   for each opening tag, its ``end(tag)`` method for each closing tag, and data
   is processed by method ``data(data)``. :meth:`XMLParser.close` calls
   *target*\'s method ``close()``. :class:`XMLParser` is not limited to
   building a tree structure. This is an example of computing the maximum
   depth of an XML file::

      >>> from xml.etree.ElementTree import XMLParser
      >>> class MaxDepth:                     # The target object of the parser
      ...     maxDepth = 0
      ...     depth = 0
      ...     def start(self, tag, attrib):   # Called for each opening tag.
      ...         self.depth += 1
      ...         if self.depth > self.maxDepth:
      ...             self.maxDepth = self.depth
      ...     def end(self, tag):             # Called for each closing tag.
      ...         self.depth -= 1
      ...     def data(self, data):
      ...         pass            # We do not need to do anything with data.
      ...     def close(self):    # Called when all data has been parsed.
      ...         return self.maxDepth
      ...
      >>> target = MaxDepth()
      >>> parser = XMLParser(target=target)
      >>> exampleXml = """
      ... <a>
      ...   <b>
      ...   </b>
      ...   <b>
      ...     <c>
      ...       <d>
      ...       </d>
      ...     </c>
      ...   </b>
      ... </a>"""
      >>> parser.feed(exampleXml)
      >>> parser.close()
      4
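
   When no *target* is given, the same :meth:`feed` and :meth:`close` calls
   build an ordinary element tree. A sketch, with the input split at arbitrary
   points to mimic data arriving in pieces::

      import xml.etree.ElementTree as ET

      parser = ET.XMLParser()    # default target is a TreeBuilder
      for chunk in ("<root><chi", "ld>text</child>", "</root>"):
          parser.feed(chunk)     # chunks need not end on tag boundaries
      root = parser.close()      # toplevel Element built by the target

      child_text = root[0].text  # 'text'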


.. _elementtree-xmlpullparser-objects:

XMLPullParser Objects
^^^^^^^^^^^^^^^^^^^^^

.. class:: XMLPullParser(events=None)

   A pull parser suitable for non-blocking applications. Its input-side API is
   similar to that of :class:`XMLParser`, but instead of pushing calls to a
   callback target, :class:`XMLPullParser` collects an internal list of parsing
   events and lets the user read from it. *events* is a sequence of events to
   report back. The supported events are the strings ``"start"``, ``"end"``,
   ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed
   namespace information). If *events* is omitted, only ``"end"`` events are
   reported.

   .. method:: feed(data)

      Feed the given bytes data to the parser.

   .. method:: close()

      Signal the parser that the data stream is terminated. Unlike
      :meth:`XMLParser.close`, this method always returns :const:`None`.
      Any events not yet retrieved when the parser is closed can still be
      read with :meth:`read_events`.

   .. method:: read_events()

      Return an iterator over the events which have been encountered in the
      data fed to the parser. The iterator yields ``(event, elem)`` pairs,
      where *event* is a string representing the type of event (e.g.
      ``"end"``) and *elem* is the encountered :class:`Element` object.

      Events provided in a previous call to :meth:`read_events` will not be
      yielded again. Events are consumed from the internal queue only when
      they are retrieved from the iterator, so multiple readers iterating in
      parallel over iterators obtained from :meth:`read_events` will have
      unpredictable results.

      .. note::

         :class:`XMLPullParser` only guarantees that it has seen the ">"
         character of a starting tag when it emits a "start" event, so the
         attributes are defined, but the contents of the text and tail attributes
         are undefined at that point. The same applies to the element children;
         they may or may not be present.

         If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4
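
   A sketch of the feed-then-drain pattern follows. The input chunks are made
   up, and which events are already available after a given :meth:`feed` call
   depends on the underlying parser, so the example only relies on the total
   sequence collected once the parser has been closed::

      import xml.etree.ElementTree as ET

      parser = ET.XMLPullParser(["start", "end"])
      seen = []

      for chunk in ("<doc><item>one</item>", "<item>two</item></doc>"):
          parser.feed(chunk)
          # Drain whatever events are available so far.
          seen.extend((event, elem.tag) for event, elem in parser.read_events())

      parser.close()
      # Events still queued at close time remain readable.
      seen.extend((event, elem.tag) for event, elem in parser.read_events())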

Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails. The string representation of an instance of this exception
   will contain a user-friendly error message. In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser. See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.
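
   A sketch of catching the exception and reading its attributes. The
   malformed input is made up, and the exact message text and error code come
   from expat, so they are not shown::

      import xml.etree.ElementTree as ET

      try:
          ET.fromstring("<a><b></a>")      # <b> is never closed
      except ET.ParseError as err:
          message = str(err)               # user-friendly description
          code = err.code                  # numeric expat error code
          line, column = err.position      # where the error was detected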

.. rubric:: Footnotes

.. [1] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.