:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.

Each element has a number of properties associated with it:

* a tag which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.
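
As a minimal sketch of the properties listed above, the following builds a
small, made-up paragraph element and inspects its tag, attributes, text, tail
and children. The element names and strings are illustrative only, and
``print`` is called with parentheses so the sketch also runs on Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a paragraph holding one bold child element.
para = ET.Element("p", {"class": "intro"})   # tag and attribute dictionary
para.text = "Hello, "                        # text before the first child
bold = ET.SubElement(para, "b")              # child appended to para
bold.text = "world"
bold.tail = "!"                              # text after </b>, still inside <p>

print(para.tag)       # p
print(para.attrib)    # {'class': 'intro'}
print(len(para))      # 1 (number of child elements)
print(ET.tostring(para, encoding="utf-8").decode("utf-8"))
```

The tail string is what lets mixed content such as ``Hello, <b>world</b>!``
survive a round trip: the trailing ``!`` belongs to the ``b`` element's tail,
not to the parent's text.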

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page also hosts the development version of
:mod:`xml.etree.ElementTree`.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading from and writing to files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We have a number of ways to import the data. Reading the file from disk::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Reading the data from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions, such as
:func:`parse`, return an :class:`ElementTree` instead. Check each function's
documentation to be sure.
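
The difference in return types can be checked directly. This is a small sketch
under the assumption that an in-memory ``io.BytesIO`` object stands in for a
file on disk; the one-country document is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

xml_text = "<data><country name='Panama'/></data>"   # hypothetical sample

root = ET.fromstring(xml_text)                         # returns the root Element
tree = ET.parse(io.BytesIO(xml_text.encode("ascii")))  # returns an ElementTree

print(ET.iselement(root))    # True
print(ET.iselement(tree))    # False: the tree is a wrapper, not an element
print(tree.getroot().tag)    # data
```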

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...   print child.tag, child.attrib
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...   print neighbor.attrib
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a given tag that are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and the :attr:`Element.text` attribute holds the
element's text content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...   rank = country.find('rank').text
   ...   name = country.get('name')
   ...   print name, rank
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.
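
The sketch below contrasts the lookup methods on a trimmed-down, made-up
version of the sample data. The ``continent`` tag and ``capital`` attribute do
not exist in the data and are there only to show what a failed lookup returns.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Singapore'><rank>4</rank>"
    "<neighbor name='Malaysia'/></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

# findall() looks at direct children only; iter() walks the whole subtree.
direct = root.findall("neighbor")
nested = list(root.iter("neighbor"))

# find() returns None when nothing matches, so guard before using .text;
# findtext() and get() accept a default value instead.
missing = root.find("continent")
first_rank = root.findtext("country/rank", "n/a")
capital = root[0].get("capital", "unknown")

print(len(direct), len(nested))
print(missing, first_rank, capital)
```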

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).
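
The three kinds of manipulation named above can be sketched on a single
made-up element. The attribute names and values here are assumptions chosen
for illustration.

```python
import xml.etree.ElementTree as ET

country = ET.Element("country", {"name": "Panama"})   # hypothetical sample

country.text = "placeholder"                  # change a field directly
country.set("continent", "Americas")          # add a new attribute
country.set("name", "Republic of Panama")     # overwrite an existing one
country.append(ET.Element("rank"))            # add a new child at the end
country[0].text = "68"

print(sorted(country.keys()))       # ['continent', 'name']
print(country.get("name"))          # Republic of Panama
print(country.find("rank").text)    # 68
```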

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...   new_rank = int(rank.text) + 1
   ...   rank.text = str(new_rank)
   ...   rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...   rank = int(country.find('rank').text)
   ...   if rank > 50:
   ...     root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>
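
Building and writing can be combined. The following is a sketch only: it
assumes an in-memory ``io.BytesIO`` buffer in place of a real output file, and
the element names are borrowed from the tutorial's sample data.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("data")
country = ET.SubElement(root, "country", name="Singapore")  # keyword attribute
ET.SubElement(country, "rank").text = "4"

buffer = io.BytesIO()               # stands in for a file opened in binary mode
ET.ElementTree(root).write(buffer)  # wrap the root element, then serialize it
serialized = buffer.getvalue()
print(serialized)
```

Wrapping the root in an :class:`ElementTree` is only needed for
:meth:`ElementTree.write`; the elements themselves are built without it.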

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.

.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")
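
For readers who want to run some of these queries, here is a self-contained
sketch. It assumes a shortened copy of the sample document (two countries,
without the ``rank`` and ``gdppc`` elements), and ``names`` is a helper
introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the tutorial's sample data.
countrydata = """
<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
"""
root = ET.fromstring(countrydata)


def names(elements):
    """Return the 'name' attribute of each element (illustrative helper)."""
    return [e.get("name") for e in elements]


neighbors = names(root.findall("./country/neighbor"))
years = [e.text for e in root.findall(".//*[@name='Singapore']/year")]
second = names(root.findall(".//neighbor[2]"))

print(neighbors)   # ['Austria', 'Switzerland', 'Malaysia']
print(years)       # ['2011']
print(second)      # ['Switzerland']
```

Singapore has only one ``neighbor`` element, so ``.//neighbor[2]`` matches
nothing there; the position counts ``neighbor`` siblings, not all children.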

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, ``spam/egg`` selects all             |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.                          |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
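
The predicate forms in the table can be tried on a tiny document. The
``menu``/``dish`` vocabulary below is invented for this sketch; only the path
syntax comes from the table.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<menu>"
    "<dish id='1'><spam/></dish>"
    "<dish><egg/></dish>"
    "<dish id='3' kind='side'><spam/><egg/></dish>"
    "</menu>"
)

with_id = root.findall("dish[@id]")            # attribute is present
side = root.findall("dish[@kind='side']")      # attribute has a given value
with_egg = root.findall("dish[egg]")           # has an immediate <egg> child
last = root.findall("dish[last()]")            # final <dish> among its siblings
chained = root.findall("dish[@id][spam]")      # predicate preceded by a predicate

print([d.get("id") for d in with_id])    # ['1', '3']
print(len(side), len(with_egg))          # 1 2
print(last[0].get("id"))                 # 3
print(len(chained))                      # 2
```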

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.
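
A brief sketch of :func:`Comment` in use follows. The ``config`` and
``option`` element names and the comment wording are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("config")
root.append(ET.Comment(" generated file, do not edit "))  # comment as a child
ET.SubElement(root, "option", name="debug")

serialized = ET.tostring(root)
print(serialized)
```

The comment element is appended like any other child, and the serializer
writes it between ``<!--`` and ``-->``.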


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 2.7
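
As an illustration of :func:`fromstringlist`, the fragments below split one
small document at arbitrary points, including the middle of a tag. The
fragment boundaries and the data are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, for example chunks read from a socket.
fragments = ["<data><coun", "try name='Pan", "ama'/></data>"]

root = ET.fromstringlist(fragments)
print(root.tag)                # data
print(root[0].get("name"))     # Panama
```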


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back. If omitted, only "end"
   events are reported. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. Returns an
   :term:`iterator` providing ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.
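
The following sketch requests both "start" and "end" events and reads
``elem.text`` only on "end", in line with the note above. It assumes an
in-memory ``io.BytesIO`` object as the source; the document is made up.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<data><rank>1</rank><rank>4</rank></data>")

seen = []
for event, elem in ET.iterparse(source, events=("start", "end")):
    if event == "start":
        seen.append(("start", elem.tag))            # text may not be set yet
    else:
        seen.append(("end", elem.tag, elem.text))   # element is fully populated

for entry in seen:
    print(entry)
```

Each ``rank`` element ends before its parent does, so the "end" event for
``data`` arrives last.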


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.
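
A short sketch of :func:`ProcessingInstruction` follows. The target and
contents are hypothetical, modelled on a stylesheet instruction, and the
``doc`` root is invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("doc")
# Hypothetical target and contents.
root.append(ET.ProcessingInstruction("xml-stylesheet", 'href="style.css"'))

serialized = ET.tostring(root)
print(serialized)
```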


.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 2.7
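
To see the effect of a registered prefix, consider this sketch. The ``ex``
prefix and the ``http://example.com/ns`` URI are placeholders; because the
registry is global, the call affects every later serialization in the process.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"          # hypothetical namespace URI
ET.register_namespace("ex", NS)       # global: affects later serialization

item = ET.Element("{%s}item" % NS)    # qualified tag in {uri}local form
serialized = ET.tostring(item)
print(serialized)
```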


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.
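
The effect of *encoding* and *method* can be sketched as follows. The sample
elements are made up, and the comments describe the byte strings this sketch
is expected to produce with the default serializer.

```python
import xml.etree.ElementTree as ET

name = ET.Element("name")
name.text = u"caf\u00e9"                          # one non-ASCII character

as_ascii = ET.tostring(name)                      # character reference for e-acute
as_utf8 = ET.tostring(name, encoding="utf-8")     # e-acute as two UTF-8 bytes

para = ET.fromstring("<p>Hello <b>world</b>!</p>")
as_text = ET.tostring(para, method="text")        # markup dropped, text kept

print(as_ascii)
print(as_utf8)
print(as_text)
```

With the default US-ASCII encoding, characters outside ASCII are written as
numeric character references, so the output stays valid in any ASCII-compatible
context.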


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that ``"".join(tostringlist(element)) ==
   tostring(element)``.

   .. versionadded:: 2.7


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.
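
A minimal sketch of :func:`XMLID` follows. The ``doc``/``sec`` document is
invented; the third ``sec`` element has no ``id`` attribute and so does not
appear in the dictionary.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID("<doc><sec id='intro'/><sec id='usage'/><sec/></doc>")

print(root.tag)            # doc
print(sorted(ids))         # ['intro', 'usage']
print(ids["usage"].tag)    # sec
```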
Georg Brandl8ec7f652007-08-15 14:28:01 +0000496
497
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000498.. _elementtree-element-objects:
Georg Brandl8ec7f652007-08-15 14:28:01 +0000499
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000500Element Objects
Eli Bendersky6ee21872012-08-18 05:40:38 +0300501^^^^^^^^^^^^^^^
Georg Brandl8ec7f652007-08-15 14:28:01 +0000502
Florent Xiclunaa231e452010-03-13 20:30:15 +0000503.. class:: Element(tag, attrib={}, **extra)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000504
Florent Xicluna583302c2010-03-13 17:56:19 +0000505 Element class. This class defines the Element interface, and provides a
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000506 reference implementation of this interface.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000507
Florent Xicluna583302c2010-03-13 17:56:19 +0000508 The element name, attribute names, and attribute values can be either
509 bytestrings or Unicode strings. *tag* is the element name. *attrib* is
510 an optional dictionary, containing element attributes. *extra* contains
511 additional attributes, given as keyword arguments.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000512
513
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000514 .. attribute:: tag
Georg Brandl8ec7f652007-08-15 14:28:01 +0000515
Florent Xicluna583302c2010-03-13 17:56:19 +0000516 A string identifying what kind of data this element represents (the
517 element type, in other words).
Georg Brandl8ec7f652007-08-15 14:28:01 +0000518
519
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000520 .. attribute:: text
Georg Brandl8ec7f652007-08-15 14:28:01 +0000521
Florent Xicluna583302c2010-03-13 17:56:19 +0000522 The *text* attribute can be used to hold additional data associated with
523 the element. As the name implies this attribute is usually a string but
524 may be any application-specific object. If the element is created from
525 an XML file the attribute will contain any text found between the element
526 tags.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000527
528
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000529 .. attribute:: tail
Georg Brandl8ec7f652007-08-15 14:28:01 +0000530
Florent Xicluna583302c2010-03-13 17:56:19 +0000531 The *tail* attribute can be used to hold additional data associated with
532 the element. This attribute is usually a string but may be any
533 application-specific object. If the element is created from an XML file
534 the attribute will contain any text found after the element's end tag and
535 before the next tag.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000536
Georg Brandl8ec7f652007-08-15 14:28:01 +0000537
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000538 .. attribute:: attrib
Georg Brandl8ec7f652007-08-15 14:28:01 +0000539
Florent Xicluna583302c2010-03-13 17:56:19 +0000540 A dictionary containing the element's attributes. Note that while the
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000541 *attrib* value is always a real mutable Python dictionary, an ElementTree
Florent Xicluna583302c2010-03-13 17:56:19 +0000542 implementation may choose to use another internal representation, and
543 create the dictionary only if someone asks for it. To take advantage of
544 such implementations, use the dictionary methods below whenever possible.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000545
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000546 The following dictionary-like methods work on the element attributes.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000547
548
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000549 .. method:: clear()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000550
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000551 Resets an element. This function removes all subelements, clears all
552 attributes, and sets the text and tail attributes to None.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000553
Georg Brandl8ec7f652007-08-15 14:28:01 +0000554
Florent Xiclunaa231e452010-03-13 20:30:15 +0000555 .. method:: get(key, default=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000556
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000557 Gets the element attribute named *key*.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000558
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000559 Returns the attribute value, or *default* if the attribute was not found.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000560
561
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000562 .. method:: items()
563
Florent Xicluna583302c2010-03-13 17:56:19 +0000564 Returns the element attributes as a sequence of (name, value) pairs. The
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000565 attributes are returned in an arbitrary order.
566
567
568 .. method:: keys()
569
Florent Xicluna583302c2010-03-13 17:56:19 +0000570 Returns the elements attribute names as a list. The names are returned
571 in an arbitrary order.
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000572
573
574 .. method:: set(key, value)
575
576 Set the attribute *key* on the element to *value*.
577
578 The following methods work on the element's children (subelements).
579
580
581 .. method:: append(subelement)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000582
Florent Xicluna583302c2010-03-13 17:56:19 +0000583 Adds the element *subelement* to the end of this elements internal list
584 of subelements.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000585
586
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000587 .. method:: extend(subelements)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000588
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000589 Appends *subelements* from a sequence object with zero or more elements.
590 Raises :exc:`AssertionError` if a subelement is not a valid object.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000591
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000592 .. versionadded:: 2.7
Georg Brandl8ec7f652007-08-15 14:28:01 +0000593
594
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000595 .. method:: find(match)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000596
Florent Xicluna583302c2010-03-13 17:56:19 +0000597 Finds the first subelement matching *match*. *match* may be a tag name
598 or path. Returns an element instance or ``None``.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000599
600
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000601 .. method:: findall(match)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000602
Florent Xicluna583302c2010-03-13 17:56:19 +0000603 Finds all matching subelements, by tag name or path. Returns a list
604 containing all matching elements in document order.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000605
606
Florent Xiclunaa231e452010-03-13 20:30:15 +0000607 .. method:: findtext(match, default=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000608
Florent Xicluna583302c2010-03-13 17:56:19 +0000609 Finds text for the first subelement matching *match*. *match* may be
610 a tag name or path. Returns the text content of the first matching
611 element, or *default* if no element was found. Note that if the matching
612 element has no text content an empty string is returned.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000613
614
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000615 .. method:: getchildren()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000616
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000617 .. deprecated:: 2.7
618 Use ``list(elem)`` or iteration.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000619
620
Florent Xiclunaa231e452010-03-13 20:30:15 +0000621 .. method:: getiterator(tag=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000622
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000623 .. deprecated:: 2.7
624 Use method :meth:`Element.iter` instead.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000625
626
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000627 .. method:: insert(index, element)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000628
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000629 Inserts a subelement at the given position in this element.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000630
631
Florent Xiclunaa231e452010-03-13 20:30:15 +0000632 .. method:: iter(tag=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000633
Florent Xicluna583302c2010-03-13 17:56:19 +0000634 Creates a tree :term:`iterator` with the current element as the root.
635 The iterator iterates over this element and all elements below it, in
636 document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
637 elements whose tag equals *tag* are returned from the iterator. If the
638 tree structure is modified during iteration, the result is undefined.
639
Ezio Melottic54d97b2011-10-09 23:56:51 +0300640 .. versionadded:: 2.7
641
Florent Xicluna583302c2010-03-13 17:56:19 +0000642
643 .. method:: iterfind(match)
644
645 Finds all matching subelements, by tag name or path. Returns an iterable
646 yielding all matching elements in document order.
647
648 .. versionadded:: 2.7
649
650
651 .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method directly; use the :func:`SubElement` factory function
      instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements by instance identity, not by tag value or
      contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`__delitem__`, :meth:`__getitem__`,
   :meth:`__setitem__`, :meth:`__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print "element not found, or element has no subelements"

     if element is None:
         print "element not found"
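
The identity-based matching of :meth:`remove` and the explicit tests recommended in the caution above can be sketched as follows. This is only an illustration: the sample document and all variable names are invented for the example, and it avoids ``print`` statements so that it should run unchanged on Python 2.7 and later.

```python
import xml.etree.ElementTree as ET

# Invented sample document: one empty child and one child with text.
root = ET.fromstring("<root><foo/><bar>text</bar></root>")

foo = root.find("foo")          # present, but has no subelements
missing = root.find("missing")  # no such child, so find() returns None

# Distinguish "not found" from "found but empty" explicitly,
# instead of relying on the truth value of the element.
foo_found = foo is not None
foo_child_count = len(foo)
missing_found = missing is not None

# remove() matches by identity: a look-alike element that is not
# actually a child of root is rejected with ValueError.
lookalike = ET.Element("foo")
try:
    root.remove(lookalike)
    lookalike_removed = True
except ValueError:
    lookalike_removed = False

# Passing the real child object removes it.
root.remove(foo)
remaining_tags = [child.tag for child in root]
```

Here ``foo`` is found even though ``len(foo)`` is zero, which is exactly the case that a plain ``if not element`` test would misreport.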


.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match)

      Finds the first toplevel element matching *match*. *match* may be a tag
      name or path. Same as ``getroot().find(match)``. Returns the first
      matching element, or ``None`` if no element was found.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().findall(match)``. *match* may be a tag name or path. Returns
      a list containing all matching elements, in document order.


   .. method:: findtext(match, default=None)

      Finds the element text for the first toplevel element with the given tag.
      Same as ``getroot().findtext(match)``. *match* may be a tag name or path.
      *default* is the value to return if the element was not found. Returns
      the text content of the first matching element, or the default value if
      no element was found. Note that if the element is found, but has no text
      content, this method returns an empty string.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the
      tag to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().iterfind(match)``. Returns an iterable yielding all matching
      elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree. *source* is a
      file name or file object. *parser* is an optional parser instance. If
      not given, the standard :class:`XMLParser` parser is used. Returns the
      document root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml")

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [1]_ is the output encoding
      (default is US-ASCII). *xml_declaration* controls if an XML declaration
      should be added to the file. Use ``False`` for never, ``True`` for
      always, ``None`` for only if not US-ASCII or UTF-8 (default is
      ``None``). *default_namespace* sets the default XML namespace (for
      "xmlns"). *method* is either ``"xml"``, ``"html"`` or ``"text"``
      (default is ``"xml"``).

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")
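
A self-contained variant of the session above, which parses from a string and writes to an in-memory buffer rather than touching ``index.xhtml``, might look like the sketch below. The ``SOURCE`` string, the extra ``<hr/>`` element and all variable names are assumptions made for illustration; it also exercises the three possible outcomes of :meth:`findtext`.

```python
import io
import xml.etree.ElementTree as ET

# Invented stand-in for index.xhtml, with an extra empty <hr/> element.
SOURCE = (
    "<html><head><title>Example page</title></head>"
    "<body><p>Moved to <a href='http://example.org/'>example.org</a> "
    "or <a href='http://example.com/'>example.com</a>.</p><hr/></body></html>"
)

tree = ET.ElementTree(ET.fromstring(SOURCE))

# findtext(): found with text, found without text, and not found.
title = tree.findtext("head/title")
empty_text = tree.findtext("body/hr")
fallback = tree.findtext("body/table", default="n/a")

# Set target="blank" on every link in the first paragraph.
for link in tree.find("body/p").iter("a"):
    link.set("target", "blank")
targets = [a.get("target") for a in tree.iter("a")]

# iter() without a tag walks every element in document order.
all_tags = [elem.tag for elem in tree.iter()]

# Serialize to a bytes buffer instead of a named file.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()
```

Writing to a buffer keeps the example free of side effects; passing a file name to :meth:`write`, as in the session above, works the same way.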

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.
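
As a rough illustration of the two constructor forms and of the namespace handling on output, consider the sketch below. The namespace URI and the ``ref`` attribute are hypothetical, and the ``ns0`` prefix mentioned afterwards is simply what the serializer generates when no prefix has been registered for the URI.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns"  # hypothetical namespace URI

# Two equivalent ways to spell the same qualified name.
one_arg = ET.QName("{%s}item" % NS)
two_arg = ET.QName(NS, "item")
same_name = str(one_arg) == str(two_arg)

# Use a QName as the tag and as an attribute value; on output the
# serializer declares the namespace and substitutes a prefix in both places.
elem = ET.Element(two_arg)
elem.set("ref", ET.QName(NS, "other"))
serialized = ET.tostring(elem)
```

With no prefix registered, the serialized element is expected to carry an ``xmlns:ns0`` declaration, and the attribute value is written as ``ns0:other`` rather than as the raw ``{uri}local`` string.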


.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. If given, *element_factory* is
   called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7
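
The sequence of start, data, and end calls described above can be driven by hand, without any parser. The following sketch builds a two-element tree this way; the tag names, the attribute and the text are invented for the example.

```python
from xml.etree.ElementTree import TreeBuilder

builder = TreeBuilder()

# Each start() must be balanced by a matching end().
builder.start("root", {})
child = builder.start("child", {"id": "1"})  # start() returns the new element
builder.data("hello ")                       # consecutive data() calls are
builder.data("world")                        # joined into one text string
builder.end("child")
builder.end("root")

# close() flushes the builder and returns the toplevel element.
root = builder.close()
child_text = root[0].text
child_id = root[0].get("id")
```

A custom parser for an XML-like format would make the same calls as it recognizes opening tags, character data and closing tags.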


.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser. *html* requests predefined HTML entities; this flag is not
   supported by the current implementation. *target* is the target object. If
   omitted, the builder uses an instance of the standard :class:`TreeBuilder`
   class. *encoding* [1]_ is optional. If given, the value overrides the
   encoding specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data. :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` is not limited to building tree structures.
This example computes the maximum depth of an XML file::

   >>> from xml.etree.ElementTree import XMLParser
   >>> class MaxDepth:                     # The target object of the parser
   ...     maxDepth = 0
   ...     depth = 0
   ...     def start(self, tag, attrib):   # Called for each opening tag.
   ...         self.depth += 1
   ...         if self.depth > self.maxDepth:
   ...             self.maxDepth = self.depth
   ...     def end(self, tag):             # Called for each closing tag.
   ...         self.depth -= 1
   ...     def data(self, data):
   ...         pass            # We do not need to do anything with data.
   ...     def close(self):    # Called when all data has been parsed.
   ...         return self.maxDepth
   ...
   >>> target = MaxDepth()
   >>> parser = XMLParser(target=target)
   >>> exampleXml = """
   ... <a>
   ...   <b>
   ...   </b>
   ...   <b>
   ...     <c>
   ...       <d>
   ...       </d>
   ...     </c>
   ...   </b>
   ... </a>"""
   >>> parser.feed(exampleXml)
   >>> parser.close()
   4
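
When no *target* is supplied, the same :meth:`feed` and :meth:`close` calls build an ordinary element tree. The sketch below feeds a document in arbitrary pieces to show that chunk boundaries need not line up with tag boundaries; the chunk contents are invented for the example.

```python
from xml.etree.ElementTree import XMLParser

# Default target: a standard TreeBuilder that assembles Element objects.
parser = XMLParser()

# Data may arrive in arbitrary pieces, even split in the middle of text.
for chunk in ("<a><b>te", "xt</b>", "<c/></a>"):
    parser.feed(chunk)

# close() returns whatever the target's close() returns, here the root.
root = parser.close()
child_tags = [child.tag for child in root]
b_text = root.find("b").text
```

This incremental style suits data read from a socket or a large file, where the whole document is never held as a single string.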


.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.