:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>

The :mod:`xml.etree.ElementTree` module implements a simple and efficient API
for parsing and creating XML data.

.. versionchanged:: 3.3
   This module will use a fast implementation whenever available.
   The :mod:`xml.etree.cElementTree` module is deprecated.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can import this data by reading from a file::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Or directly from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Some other parsing functions,
such as :func:`parse`, return an :class:`ElementTree` instead; check each
function's documentation to be sure.

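The following sketch parses the same small, made-up document with both
functions to show the two return types. An :class:`io.StringIO` object stands
in for a real file here, since :func:`parse` expects a filename or file object:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # A tiny made-up document (not the sample data above).
   xml_text = "<data><country name='Panama'/></data>"

   # fromstring() returns the root Element directly.
   root = ET.fromstring(xml_text)

   # parse() takes a filename or file object and returns an ElementTree;
   # getroot() then gives access to its root Element.
   tree = ET.parse(io.StringIO(xml_text))
   tree_root = tree.getroot()

   print(type(root).__name__, root.tag)
   print(type(tree).__name__, tree_root.tag)
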
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print(child.tag, child.attrib)
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Incremental parsing
^^^^^^^^^^^^^^^^^^^

It's possible to parse XML incrementally (i.e., without processing the whole
document at once). The most powerful tool for doing this is
:class:`IncrementalParser`. It does not require a blocking read to obtain the
XML data; instead, it is fed data incrementally with
:meth:`IncrementalParser.data_received` calls. To get the parsed XML elements,
call :meth:`IncrementalParser.events`. Here's an example::

   >>> incparser = ET.IncrementalParser(['start', 'end'])
   >>> incparser.data_received('<mytag>sometext')
   >>> list(incparser.events())
   [('start', <Element 'mytag' at 0x7fba3f2a8688>)]
   >>> incparser.data_received(' more text</mytag>')
   >>> for event, elem in incparser.events():
   ...     print(event)
   ...     print(elem.tag, 'text=', elem.text)
   ...
   end
   mytag text= sometext more text

The obvious use cases are applications that operate asynchronously, where the
XML data is received from a socket or read incrementally from some storage
device. In such cases, blocking reads are unacceptable.

Because it's so flexible, :class:`IncrementalParser` can be inconvenient to use
for simpler use cases. If you don't mind your application blocking on reading
XML data but would still like to have incremental parsing capabilities, take a
look at :func:`iterparse`. It can be useful when you're reading a large XML
document and don't want to hold it wholly in memory.

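As a rough sketch of that memory-saving pattern, the loop below handles each
``country`` element as soon as its end tag has been parsed, then calls
:meth:`Element.clear` on it so that its already-processed children and text are
not kept around. An in-memory :class:`io.BytesIO` object stands in for a large
file; the trimmed-down document and the ``ranks`` dictionary are illustrative
only.

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   # Stand-in for a large file on disk; iterparse() accepts a file object.
   source = io.BytesIO(b"""<data>
     <country name="Liechtenstein"><rank>1</rank></country>
     <country name="Singapore"><rank>4</rank></country>
     <country name="Panama"><rank>68</rank></country>
   </data>""")

   ranks = {}
   # With only "end" events, each element is complete when it is reported.
   for event, elem in ET.iterparse(source, events=("end",)):
       if elem.tag == "country":
           ranks[elem.get("name")] = int(elem.findtext("rank"))
           # Discard the processed subtree; the now-empty element itself
           # stays attached to its parent.
           elem.clear()

   print(ranks)

Note that :meth:`Element.clear` also removes the element's attributes, which is
why ``name`` is read before the call.
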
Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over the
entire sub-tree below it (its children, their children, and so on). For
example, :meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print(neighbor.attrib)
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, :attr:`Element.text` accesses the element's text
content, and :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print(name, rank)
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

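The removal example above iterates over the list returned by
:meth:`Element.findall` rather than over ``root`` itself. Removing children
from an element while iterating over that same element can skip items, much as
when a Python list is modified during iteration. A minimal sketch, using a
made-up document in which every child should be removed:

.. code-block:: python

   import xml.etree.ElementTree as ET

   doc = "<data><item/><item/><item/><item/></data>"

   # Risky: removing children while iterating over the element directly
   # can skip some of them.
   risky = ET.fromstring(doc)
   for child in risky:
       risky.remove(child)

   # Safe: iterate over a snapshot (a list) of the children instead.
   safe = ET.fromstring(doc)
   for child in list(safe):
       safe.remove(child)

   print(len(risky), len(safe))
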
Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

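:func:`SubElement` also accepts attributes, either as a dictionary or as
keyword arguments, and :func:`tostring` returns the serialized markup instead
of printing it as :func:`dump` does. A short sketch (the tag and attribute
names are arbitrary):

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.Element('root')
   # Attributes may be passed as a dict, as keyword arguments, or both.
   item = ET.SubElement(root, 'item', {'id': '1'}, kind='demo')
   item.text = 'hello'

   # encoding='unicode' returns a str instead of a bytes object.
   print(ET.tostring(root, encoding='unicode'))
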
Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.


.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grandchildren of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, ``spam/egg`` selects all             |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element. Returns ``None`` if the  |
|                       | path attempts to reach the ancestors of the start    |
|                       | element (the element ``find`` was called on).        |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

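A small sketch of the ``[position]`` forms, using a made-up list of items.
Each predicate follows a tag name, as the rule above requires; the ``texts``
helper is hypothetical and only collects the matched text:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.fromstring(
       "<list><item>a</item><item>b</item><item>c</item></list>")

   def texts(path):
       """Return the text of every element matching *path* below root."""
       return [e.text for e in root.findall(path)]

   first = texts("item[1]")               # positions start at 1
   last = texts("item[last()]")           # the final 'item' child
   second_last = texts("item[last()-1]")  # one position before the final one

   print(first, last, second_last)
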
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

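   As a sketch, a comment created this way can be attached like any other
   element and shows up in the serialized output (the tag and comment text are
   arbitrary):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element('root')
      root.append(ET.Comment(' generated by a script '))
      print(ET.tostring(root, encoding='unicode'))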

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 3.2

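   For instance, fragments that split a document at arbitrary points are
   parsed as if they had been joined first. In this sketch the ``fragments``
   list is a hypothetical stand-in for chunks read from elsewhere:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      # The split points do not have to line up with element boundaries.
      fragments = ['<root><ite', 'm>1</item><item', '>2</item></root>']
      root = ET.fromstringlist(fragments)
      print([item.text for item in root.findall('item')])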

.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or :term:`file object`
   containing XML data. *events* is a sequence of events to report back. The
   supported events are the strings ``"start"``, ``"end"``, ``"start-ns"``
   and ``"end-ns"`` (the "ns" events are used to get detailed namespace
   information). If *events* is omitted, only ``"end"`` events are reported.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   Note that while :func:`iterparse` builds the tree incrementally, it issues
   blocking reads on *source* (or the file it names). As such, it's unsuitable
   for asynchronous applications where blocking reads can't be made. For fully
   asynchronous parsing, see :class:`IncrementalParser`.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

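   The following sketch illustrates the note: with both ``"start"`` and
   ``"end"`` events requested, events arrive in document order, and element
   text is only read once the ``"end"`` event has been reported. The in-memory
   document is a made-up example.

   .. code-block:: python

      import io
      import xml.etree.ElementTree as ET

      source = io.BytesIO(b"<a><b>x</b><c/></a>")

      events = []
      texts = {}
      for event, elem in ET.iterparse(source, events=("start", "end")):
          events.append((event, elem.tag))
          if event == "end":
              # Safe to read now: the element is fully populated.
              texts[elem.tag] = elem.text

      print(events)
      print(texts)
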
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

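   A minimal sketch; the target and contents are arbitrary examples, and the
   PI is attached to a parent element like any other child:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root = ET.Element('root')
      root.append(ET.ProcessingInstruction('xml-stylesheet', 'href="style.css"'))
      print(ET.tostring(root, encoding='unicode'))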

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 3.2

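   A sketch of the effect on serialization; the prefix ``ex`` and the URI are
   made-up examples. Tags in this namespace are written in the
   ``{uri}localname`` form:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      ET.register_namespace('ex', 'http://example.com/ns')
      item = ET.Element('{http://example.com/ns}item')
      # The registered prefix is used instead of a generated one such as ns0.
      print(ET.tostring(item, encoding='unicode'))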

.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml", *, \
                       short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns an (optionally) encoded string containing the XML data.

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.

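   A short sketch of how *encoding*, *short_empty_elements* and *method*
   change the result (the documents are arbitrary examples):

   .. code-block:: python

      import xml.etree.ElementTree as ET

      elem = ET.fromstring('<a><b/></a>')

      as_bytes = ET.tostring(elem)                     # default: bytes
      as_text = ET.tostring(elem, encoding='unicode')  # a str instead
      expanded = ET.tostring(elem, encoding='unicode',
                             short_empty_elements=False)

      para = ET.fromstring('<p>Hello <b>world</b>!</p>')
      # method="text" keeps only the character data.
      plain = ET.tostring(para, encoding='unicode', method='text')

      print(as_bytes, as_text, expanded, plain, sep='\n')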

.. function:: tostringlist(element, encoding="us-ascii", method="xml", *, \
                           short_empty_elements=True)

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to
   generate a Unicode string (otherwise, a bytestring is generated). *method*
   is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``).
   *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`.
   Returns a list of (optionally) encoded strings containing the XML data.
   It does not guarantee any specific sequence, except that
   ``b"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 3.2

   .. versionadded:: 3.4
      The *short_empty_elements* parameter.


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.

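   A sketch with a made-up document; only elements that carry an ``id``
   attribute end up in the dictionary:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      root, ids = ET.XMLID('<doc><p id="intro">Hi</p><p>no id</p></doc>')
      print(root.tag, sorted(ids), ids['intro'].text)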

.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.

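   For example (the tag and attribute names are arbitrary), *attrib* and
   *extra* are merged into the element's attribute dictionary:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      elem = ET.Element('item', {'id': '1'}, kind='demo')
      print(elem.tag, elem.attrib)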

   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies, this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file, the attribute will contain any text found between the
      element's start tag and its first child or end tag.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file,
      the attribute will contain any text found after the element's end tag and
      before the next tag.

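      The difference between *text* and *tail* is easiest to see on mixed
      content. In this small example document, the text that follows the
      ``<b>`` element belongs to that element's *tail*, not to its parent's
      *text*:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         p = ET.fromstring('<p>Hello <b>bold</b> world</p>')
         b = p.find('b')
         print(repr(p.text), repr(b.text), repr(b.tail))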

   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Sets the attribute *key* on the element to *value*.

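      A quick sketch of these dictionary-like methods on an arbitrary element;
      the results of :meth:`Element.keys` and :meth:`Element.items` are sorted
      here because this interface does not guarantee their order:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         elem = ET.Element('item', id='1')
         elem.set('kind', 'demo')            # add a new attribute

         print(elem.get('id'))               # value of an existing attribute
         print(elem.get('missing', 'n/a'))   # default when the attribute is absent
         print(sorted(elem.keys()))
         print(sorted(elem.items()))
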
   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements. Raises :exc:`TypeError` if *subelement* is not an
      :class:`Element`.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`TypeError` if a subelement is not an :class:`Element`.

      .. versionadded:: 3.2

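      A brief sketch with arbitrary tag names, showing that anything other than
      an :class:`Element` is rejected:

      .. code-block:: python

         import xml.etree.ElementTree as ET

         parent = ET.Element('parent')
         parent.append(ET.Element('first'))
         parent.extend([ET.Element('second'), ET.Element('third')])
         print([child.tag for child in parent])

         # Non-Element arguments are rejected with TypeError.
         rejected = False
         try:
             parent.append('not an element')
         except TypeError as exc:
             rejected = True
             print('TypeError:', exc)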

   .. method:: find(match, namespaces=None)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or a :ref:`path <elementtree-xpath>`.  Returns an element instance
      or ``None``.  *namespaces* is an optional mapping from namespace prefix
      to full name.


   .. method:: findall(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns a list containing all matching
      elements in document order.  *namespaces* is an optional mapping from
      namespace prefix to full name.


   .. method:: findtext(match, default=None, namespaces=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or a :ref:`path <elementtree-xpath>`.  Returns the text content
      of the first matching element, or *default* if no element was found.
      Note that if the matching element has no text content, an empty string
      is returned.  *namespaces* is an optional mapping from namespace prefix
      to full name.

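   A sketch of the three lookup methods, including a *namespaces* mapping.
   The XML snippet, the ``ex`` prefix and the ``http://example.com/ns`` URI
   are made up for illustration::

      import xml.etree.ElementTree as ET

      root = ET.fromstring(
          '<root xmlns:ex="http://example.com/ns">'
          '<ex:item>first</ex:item><ex:item>second</ex:item><empty/>'
          '</root>')
      ns = {"ex": "http://example.com/ns"}   # prefix -> full namespace URI

      first = root.find("ex:item", ns)              # first match only
      items = root.findall("ex:item", ns)           # all matches, in order
      print(first.text, [i.text for i in items])    # first ['first', 'second']

      # findtext() returns *default* when nothing matches, and an empty
      # string when the match exists but has no text.
      print(repr(root.findtext("missing", default="n/a")))  # 'n/a'
      print(repr(root.findtext("empty")))                   # ''
      print(root.find("missing"))                           # None
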

   .. method:: getchildren()

      .. deprecated:: 3.2
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, subelement)

      Inserts *subelement* at the given position in this element.  Raises
      :exc:`TypeError` if *subelement* is not an :class:`Element`.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 3.2


   .. method:: iterfind(match, namespaces=None)

      Finds all matching subelements, by tag name or
      :ref:`path <elementtree-xpath>`.  Returns an iterable yielding all
      matching elements in document order.  *namespaces* is an optional mapping
      from namespace prefix to full name.

      .. versionadded:: 3.2


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and yields all inner text.

      .. versionadded:: 3.2

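   A short sketch of the iteration helpers on a hand-written snippet.  Note
   that :meth:`iter` includes the element itself, while :meth:`iterfind`
   with a plain tag name only matches direct children::

      import xml.etree.ElementTree as ET

      root = ET.fromstring("<a><b>x<c>y</c>z</b><c>w</c></a>")

      print([e.tag for e in root.iter()])          # ['a', 'b', 'c', 'c']
      print([e.tag for e in root.iter("c")])       # ['c', 'c']
      print([e.text for e in root.iterfind("c")])  # ['w']
      print("".join(root.itertext()))              # xyzw
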

   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

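   A sketch of :meth:`insert` and of the identity semantics of
   :meth:`remove`: the two ``<item>`` children below look identical, yet only
   the exact object passed in is removed.  The tag names are arbitrary::

      import xml.etree.ElementTree as ET

      root = ET.Element("root")
      first = ET.SubElement(root, "item")
      second = ET.SubElement(root, "item")
      root.insert(0, ET.Element("head"))      # becomes the first child

      root.remove(second)                     # identity, not equality
      print([child.tag for child in root])    # ['head', 'item']
      print(root[1] is first)                 # True
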

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
   :meth:`__setitem__`, :meth:`__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use a specific ``len(elem)`` or ``elem is
   None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")



.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match, namespaces=None)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match, namespaces=None)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None, namespaces=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 3.2
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the
      tag to look for (default is to return all elements).


   .. method:: iterfind(match, namespaces=None)

      Same as :meth:`Element.iterfind`, starting at the root of the tree.

      .. versionadded:: 3.2


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree.  *source* is a
      file name or :term:`file object`.  *parser* is an optional parser
      instance.  If not given, the standard :class:`XMLParser` parser is used.
      Returns the document root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml", *, \
                     short_empty_elements=True)

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      :term:`file object` opened for writing.  *encoding* [1]_ is the output
      encoding (default is US-ASCII).
      *xml_declaration* controls whether an XML declaration should be added to
      the file.  Use ``False`` for never, ``True`` for always, ``None``
      for only if not US-ASCII or UTF-8 or Unicode (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).
      The keyword-only *short_empty_elements* parameter controls the formatting
      of elements that contain no content.  If ``True`` (the default), they are
      emitted as a single self-closed tag, otherwise they are emitted as a pair
      of start/end tags.

      The output is either a string (:class:`str`) or binary (:class:`bytes`).
      This is controlled by the *encoding* argument.  If *encoding* is
      ``"unicode"``, the output is a string; otherwise, it's binary.  Note that
      this may conflict with the type of *file* if it's an open
      :term:`file object`; make sure you do not try to write a string to a
      binary stream and vice versa.

      .. versionadded:: 3.4
         The *short_empty_elements* parameter.

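   A sketch of how *encoding* selects the output type and how
   *short_empty_elements* changes the formatting of empty elements.
   In-memory streams from :mod:`io` stand in for real files::

      import io
      import xml.etree.ElementTree as ET

      tree = ET.ElementTree(ET.fromstring("<root><empty/></root>"))

      text_out = io.StringIO()             # encoding="unicode" -> str output
      tree.write(text_out, encoding="unicode")
      print(text_out.getvalue())           # <root><empty /></root>

      binary_out = io.BytesIO()            # any other encoding -> bytes output
      tree.write(binary_out, encoding="utf-8", xml_declaration=True,
                 short_empty_elements=False)
      print(binary_out.getvalue())         # declaration, then <empty></empty>
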

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    ...
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and *tag* is interpreted as a local
   name.  :class:`QName` instances are opaque.

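   A sketch of both constructor forms, and of a :class:`QName` used as an
   attribute value so that the serializer declares its namespace.  The URI
   is made up, and ``ns0`` is simply the prefix ElementTree generates when
   none has been registered for that URI::

      import xml.etree.ElementTree as ET

      uri = "http://example.com/ns"
      a = ET.QName("{%s}local" % uri)     # single-argument form
      b = ET.QName(uri, "local")          # URI + local name form
      print(a == b)                       # True

      root = ET.Element("root")
      root.set("type", ET.QName(uri, "value"))
      serialized = ET.tostring(root, encoding="unicode")
      print(serialized)   # declares xmlns:ns0 and writes type="ns0:value"
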

IncrementalParser Objects
^^^^^^^^^^^^^^^^^^^^^^^^^

.. class:: IncrementalParser(events=None, parser=None)

   An incremental, event-driven parser suitable for non-blocking applications.
   *events* is a sequence of events to report back.  The supported events are
   the strings ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"`` (the "ns"
   events are used to get detailed namespace information).  If *events* is
   omitted, only ``"end"`` events are reported.  *parser* is an optional
   parser instance.  If not given, the standard :class:`XMLParser` parser is
   used.

   .. method:: data_received(data)

      Feed the given bytes data to the incremental parser.

   .. method:: eof_received()

      Signal the incremental parser that the data stream is terminated.

   .. method:: events()

      Iterate over the events which have been encountered in the data fed
      to the parser.  This method yields ``(event, elem)`` pairs, where
      *event* is a string representing the type of event (e.g. ``"end"``)
      and *elem* is the encountered :class:`Element` object.  Events
      provided in a previous call to :meth:`events` will not be yielded
      again.

   .. note::

      :class:`IncrementalParser` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point.  The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

   .. versionadded:: 3.4

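   The sketch below illustrates the event flow described above, under an
   explicit assumption: it uses ``XMLPullParser``, the reworked form in which
   this incremental parser was eventually released in Python 3.4, where
   ``feed()``, ``close()`` and ``read_events()`` take the place of
   :meth:`data_received`, :meth:`eof_received` and :meth:`events`.  The
   document and chunk boundaries are made up::

      import xml.etree.ElementTree as ET

      # Assumption: XMLPullParser stands in for IncrementalParser here.
      parser = ET.XMLPullParser(events=("start", "end"))
      seen = []

      # Feed the document in two chunks, as a network client might.
      parser.feed(b"<root><item>1</item>")
      seen += [(event, elem.tag) for event, elem in parser.read_events()]
      # "</root>" has not arrived yet, so there is no "end" event for root.
      root_closed_early = ("end", "root") in seen
      print(root_closed_early)              # False

      parser.feed(b"</root>")
      parser.close()                        # signal end of the data stream
      seen += [(event, elem.tag) for event, elem in parser.read_events()]
      print(seen)
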

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder.  This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure.  You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.  *element_factory*, when given,
   must be a callable accepting two positional arguments: a tag and
   a dict of attributes.  It is expected to return a new element instance.

   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element.  Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element.  *data* is a string.


   .. method:: end(tag)

      Closes the current element.  *tag* is the element name.  Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
      containing element attributes.  Returns the opened element.

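   A sketch of driving a :class:`TreeBuilder` by hand, the way a custom
   parser would; the tag names and attribute are made up::

      import xml.etree.ElementTree as ET

      builder = ET.TreeBuilder()
      builder.start("root", {})
      builder.start("item", {"id": "1"})
      builder.data("hello")
      builder.end("item")
      builder.end("root")
      root = builder.close()          # the toplevel element

      print(ET.tostring(root, encoding="unicode"))
      # <root><item id="1">hello</item></root>
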

   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
      the public identifier.  *system* is the system identifier.  This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 3.2

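   A sketch of a custom target that adds :meth:`doctype` by subclassing
   :class:`TreeBuilder`.  The class name ``DoctypeRecorder`` and the sample
   document are hypothetical::

      import xml.etree.ElementTree as ET

      class DoctypeRecorder(ET.TreeBuilder):
          """A TreeBuilder that also remembers the doctype it was given."""
          doctype_info = None

          def doctype(self, name, pubid, system):
              self.doctype_info = (name, pubid, system)

      target = DoctypeRecorder()
      parser = ET.XMLParser(target=target)
      parser.feed('<!DOCTYPE note SYSTEM "note.dtd"><note>hi</note>')
      root = parser.close()           # the tree is still built as usual
      print(target.doctype_info)      # ('note', None, 'note.dtd')
      print(root.text)                # hi
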

.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser.  *html* requests predefined HTML entities; this flag is not
   supported by the current implementation.  *target* is the target object.
   If omitted, the builder uses an instance of the standard
   :class:`TreeBuilder` class.  *encoding* [1]_ is optional.  If given, the
   value overrides the encoding specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser.  Returns the result of calling the
      ``close()`` method of the *target*; with the default
      :class:`TreeBuilder` target, this is the toplevel document element.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 3.2
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser.  *data* is encoded data.

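   A sketch of feeding a document in arbitrary chunks with the default
   :class:`TreeBuilder` target; the chunk boundaries, chosen here for
   illustration, need not line up with tags::

      import xml.etree.ElementTree as ET

      parser = ET.XMLParser()
      for chunk in [b"<root><it", b"em>text</item>", b"</root>"]:
          parser.feed(chunk)
      root = parser.close()            # default target returns the root

      print(root.tag, root[0].tag, root[0].text)   # root item text
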

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data.  :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.  :class:`XMLParser` is not limited
to building a tree structure.  This is an example of counting the maximum
depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4

Exceptions
^^^^^^^^^^

.. class:: ParseError

   XML parse error, raised by the various parsing methods in this module when
   parsing fails.  The string representation of an instance of this exception
   will contain a user-friendly error message.  In addition, it will have
   the following attributes available:

   .. attribute:: code

      A numeric error code from the expat parser.  See the documentation of
      :mod:`xml.parsers.expat` for the list of error codes and their meanings.

   .. attribute:: position

      A tuple of *line*, *column* numbers, specifying where the error occurred.

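   A sketch of catching :exc:`ParseError` and inspecting its attributes.
   The malformed input is made up, and ``xml.parsers.expat.errors.codes`` is
   used to turn expat's error message into its numeric code for comparison::

      import xml.etree.ElementTree as ET
      from xml.parsers import expat

      error = None
      try:
          ET.fromstring("<root><item></root>")   # <item> is never closed
      except ET.ParseError as exc:
          error = exc                            # keep a reference

      print(error)                    # human-readable message
      line, column = error.position
      print(line)                     # 1
      mismatch = expat.errors.codes[expat.errors.XML_ERROR_TAG_MISMATCH]
      print(error.code == mismatch)   # True
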

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.