:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


Each element has a number of properties associated with it:

* a tag, which is a string identifying what kind of data this element
  represents (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.
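
The short sketch below builds one element by hand so that each property in
the list above is visible. The tag names ``item`` and ``child`` and the
``id`` attribute are purely illustrative. The code avoids Python 2-only
syntax, so it runs on Python 2.7 and Python 3 alike; the ``decode`` call is
there because :func:`tostring` returns an encoded byte string.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Build <item id="7">hello<child />world</item> by hand.
   item = ET.Element("item", {"id": "7"})   # tag plus attribute dictionary
   item.text = "hello"                      # text before the first child
   child = ET.SubElement(item, "child")     # appended to item's children
   child.tail = "world"                     # text after </child>, inside <item>

   # tostring() returns an encoded byte string; decode it for display.
   print(ET.tostring(item).decode("ascii"))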

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We have a number of ways to import the data. Reading the file from disk::

    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

Reading the data from a string::

    root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions, such as
:func:`parse`, return an :class:`ElementTree` instead; check the reference
entry for each function to see which type it returns.
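
As a minimal sketch of that difference, the example below uses an in-memory
byte string in place of ``country_data.xml`` (an assumption made only so the
example is self-contained). :func:`parse` accepts a file object and returns
an :class:`ElementTree`, whose :meth:`~ElementTree.getroot` gives the same
kind of root element that :func:`fromstring` returns directly:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   xml_bytes = b"<data><country name='Panama'/></data>"

   # parse() takes a filename or file object and returns an ElementTree.
   tree = ET.parse(io.BytesIO(xml_bytes))
   root_from_tree = tree.getroot()

   # fromstring() skips the wrapper and returns the root Element itself.
   root = ET.fromstring(xml_bytes)

   print(root_from_tree.tag)
   print(root.tag)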

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

    >>> root.tag
    'data'
    >>> root.attrib
    {}

It also has child nodes over which we can iterate::

    >>> for child in root:
    ...     print child.tag, child.attrib
    ...
    country {'name': 'Liechtenstein'}
    country {'name': 'Singapore'}
    country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

    >>> root[0][1].text
    '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over
the entire subtree below it (its children, their children, and so on). For
example, :meth:`Element.iter`::

    >>> for neighbor in root.iter('neighbor'):
    ...     print neighbor.attrib
    ...
    {'name': 'Austria', 'direction': 'E'}
    {'name': 'Switzerland', 'direction': 'W'}
    {'name': 'Malaysia', 'direction': 'N'}
    {'name': 'Costa Rica', 'direction': 'W'}
    {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only those elements with the given tag that are
direct children of the current element. :meth:`Element.find` finds the *first*
child with a particular tag, and :attr:`Element.text` accesses the element's
text content. :meth:`Element.get` accesses the element's attributes::

    >>> for country in root.findall('country'):
    ...     rank = country.find('rank').text
    ...     name = country.get('name')
    ...     print name, rank
    ...
    Liechtenstein 1
    Singapore 4
    Panama 68
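
To make the difference in search depth concrete, here is a small sketch on a
trimmed copy of the sample data (the shortened document is an assumption for
brevity). :meth:`Element.findall` with a bare tag looks only at direct
children, whereas :meth:`Element.iter`, or a ``.//`` path, descends into the
whole subtree:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.fromstring(
       "<data>"
       "<country name='Liechtenstein'>"
       "<neighbor name='Austria'/><neighbor name='Switzerland'/>"
       "</country>"
       "<country name='Singapore'><neighbor name='Malaysia'/></country>"
       "</data>"
   )

   # No <neighbor> is a direct child of <data>, so this list is empty.
   direct = root.findall("neighbor")

   # iter() walks every level below root, in document order.
   anywhere = [n.get("name") for n in root.iter("neighbor")]

   # A ".//" path gives findall() the same reach.
   via_path = [n.get("name") for n in root.findall(".//neighbor")]

   print(direct)
   print(anywhere)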

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

    >>> for rank in root.iter('rank'):
    ...     new_rank = int(rank.text) + 1
    ...     rank.text = str(new_rank)
    ...     rank.set('updated', 'yes')
    ...
    >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

    >>> for country in root.findall('country'):
    ...     rank = int(country.find('rank').text)
    ...     if rank > 50:
    ...         root.remove(country)
    ...
    >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

    >>> a = ET.Element('a')
    >>> b = ET.SubElement(a, 'b')
    >>> c = ET.SubElement(a, 'c')
    >>> d = ET.SubElement(c, 'd')
    >>> ET.dump(a)
    <a><b /><c><d /></c></a>
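
:func:`SubElement` also accepts attributes as keyword arguments, and an
element's text can be assigned afterwards. The sketch below rebuilds one
``country`` entry from the sample data and serializes it with
:func:`tostring`; the decode step is only there because :func:`tostring`
returns an encoded byte string:

.. code-block:: python

   import xml.etree.ElementTree as ET

   country = ET.Element("country", name="Panama")        # attribute via **extra
   rank = ET.SubElement(country, "rank", updated="yes")
   rank.text = "69"
   year = ET.SubElement(country, "year")
   year.text = "2011"

   xml_text = ET.tostring(country).decode("ascii")
   print(xml_text)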

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.

.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

    import xml.etree.ElementTree as ET

    # countrydata is a string holding the sample document shown above
    root = ET.fromstring(countrydata)

    # Top-level elements
    root.findall(".")

    # All 'neighbor' grand-children of 'country' children of the top-level
    # elements
    root.findall("./country/neighbor")

    # Nodes with name='Singapore' that have a 'year' child
    root.findall(".//year/..[@name='Singapore']")

    # 'year' nodes that are children of nodes with name='Singapore'
    root.findall(".//*[@name='Singapore']/year")

    # All 'neighbor' nodes that are the second 'neighbor' child of their parent
    root.findall(".//neighbor[2]")
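
Because the snippet above relies on a ``countrydata`` string defined
elsewhere, here is a self-contained variant that can be run directly. It
embeds a shortened copy of the sample document (two countries, without
``rank`` and ``gdppc``), which is an assumption made for brevity, and keeps
the results of the same five queries:

.. code-block:: python

   import xml.etree.ElementTree as ET

   countrydata = """<data>
     <country name="Liechtenstein">
       <year>2008</year>
       <neighbor name="Austria" direction="E"/>
       <neighbor name="Switzerland" direction="W"/>
     </country>
     <country name="Singapore">
       <year>2011</year>
       <neighbor name="Malaysia" direction="N"/>
     </country>
   </data>"""

   root = ET.fromstring(countrydata)

   top = root.findall(".")                                    # [root] itself
   neighbors = root.findall("./country/neighbor")             # all three
   singapore = root.findall(".//year/..[@name='Singapore']")  # one <country>
   years = root.findall(".//*[@name='Singapore']/year")       # its <year>
   second = root.findall(".//neighbor[2]")                    # Switzerland

   for elem in second:
       print(elem.get("name"))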

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.                          |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position among siblings with the same tag. The       |
|                       | position can be either an integer (1 is the first    |
|                       | position), the expression ``last()`` (for the last   |
|                       | position), or a position relative to the last        |
|                       | position (e.g. ``last()-1``).                        |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
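
The sketch below exercises the position and attribute predicates on a small,
made-up ``country`` element (its contents are an assumption for illustration).
Note that ``neighbor[1]`` picks the first ``neighbor`` even though a ``rank``
element comes before it, because positions are counted among siblings with
the same tag:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.fromstring(
       "<country>"
       "<rank>1</rank>"
       "<neighbor name='A'/><neighbor name='B'/><neighbor name='C'/>"
       "</country>"
   )

   def names(path):
       """Return the 'name' attribute of every element matched by path."""
       return [e.get("name") for e in root.findall(path)]

   first = names("neighbor[1]")               # counts <neighbor> siblings only
   last = names("neighbor[last()]")
   before_last = names("neighbor[last()-1]")
   named = names("*[@name]")                  # any child with a name attribute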

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.
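
A comment created this way is stored as an ordinary child element, so it can
be appended like any other. In this sketch the ``config`` and ``debug`` tags
are hypothetical:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.Element("config")
   root.append(ET.Comment(" generated file, do not edit "))
   ET.SubElement(root, "debug").text = "off"

   # The serializer writes the comment element as <!--...-->.
   xml_text = ET.tostring(root).decode("ascii")
   print(xml_text)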


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This
   function should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 2.7


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back; the supported events are
   ``"start"``, ``"end"``, ``"start-ns"`` and ``"end-ns"``. If *events* is
   omitted, only ``"end"`` events are reported. *parser* is an optional parser
   instance. If not given, the standard :class:`XMLParser` parser is used.
   *parser* is not supported by ``cElementTree``. Returns an :term:`iterator`
   providing ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.
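
Following that advice, a common pattern is to act on each element at its
``"end"`` event and then call :meth:`Element.clear` so the finished subtree
does not stay in memory. The sketch below assumes a small in-memory document
in place of a large file, and also listens for ``"start"`` events, where only
the attributes are guaranteed to be available:

.. code-block:: python

   import io
   import xml.etree.ElementTree as ET

   source = io.BytesIO(
       b"<data>"
       b"<country name='Liechtenstein'><rank>1</rank></country>"
       b"<country name='Singapore'><rank>4</rank></country>"
       b"</data>"
   )

   started = []   # names seen on "start": attributes are already parsed
   ranks = {}     # name -> rank text, read once the element is complete
   for event, elem in ET.iterparse(source, events=("start", "end")):
       if event == "start" and elem.tag == "country":
           started.append(elem.get("name"))
       elif event == "end" and elem.tag == "country":
           ranks[elem.get("name")] = elem.findtext("rank")
           elem.clear()   # drop children, text and attributes we no longer need

   print(ranks)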


.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.
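
As a small illustration (the target ``app-hint`` and its contents are
hypothetical), the factory stores the target and contents together as the
element's text, and the serializer writes them back out between ``<?`` and
``?>``:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.Element("doc")
   root.append(ET.ProcessingInstruction("app-hint", "cache=off"))

   xml_text = ET.tostring(root).decode("ascii")
   print(xml_text)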


.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace URI. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 2.7
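
A minimal sketch, assuming a made-up namespace URI and the prefix ``inv``.
Namespaced tags are written in ``{uri}localname`` form, and after
registration the serializer uses ``inv`` instead of an automatically
generated prefix such as ``ns0``. Keep in mind that the registration is
global and outlives this snippet:

.. code-block:: python

   import xml.etree.ElementTree as ET

   NS = "http://example.com/ns/inventory"   # hypothetical namespace URI
   ET.register_namespace("inv", NS)

   root = ET.Element("{%s}stock" % NS)
   ET.SubElement(root, "{%s}item" % NS).text = "bolt"

   xml_text = ET.tostring(root).decode("ascii")
   print(xml_text)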


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that
   ``"".join(tostringlist(element)) == tostring(element)``.

   .. versionadded:: 2.7
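
A short sketch of both serializers on a made-up element. The way
:func:`tostringlist` splits its output is not specified, so the example only
relies on the joined result. It also shows the ``"text"`` method, which keeps
the character data and drops the markup:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root = ET.fromstring("<a><b>text</b><c/></a>")

   whole = ET.tostring(root)
   pieces = ET.tostringlist(root)
   joined = b"".join(pieces)          # equal to whole, however it was split

   text_only = ET.tostring(root, method="text")
   print(whole.decode("ascii"))
   print(text_only.decode("ascii"))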


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps the values of ``id`` attributes to their elements. *text* is a
   string containing XML data. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. Returns a tuple
   containing an :class:`Element` instance and a dictionary.
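
For example, with a hypothetical inventory document whose items carry ``id``
attributes, the returned dictionary gives direct access to each item by its
``id`` value:

.. code-block:: python

   import xml.etree.ElementTree as ET

   root, ids = ET.XMLID(
       "<inventory>"
       "<item id='a1'>bolt</item>"
       "<item id='b2'>nut</item>"
       "</inventory>"
   )

   print(root.tag)
   print(ids["b2"].text)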


.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies, this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file the attribute will contain any text found between the element
      tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file
      the attribute will contain any text found after the element's end tag and
      before the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Sets the attribute *key* on the element to *value*.
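
   A brief sketch of these attribute methods on a single ``neighbor`` element
   taken from the sample data; the ``population`` and ``border`` attribute
   names are made up for the example. Because :meth:`Element.keys` and
   :meth:`Element.items` return attributes in an arbitrary order, the sketch
   sorts them:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      neighbor = ET.fromstring('<neighbor name="Austria" direction="E"/>')

      name = neighbor.get("name")                  # "Austria"
      missing = neighbor.get("population", "n/a")  # default for absent attribute
      neighbor.set("border", "yes")                # add a new attribute

      attr_names = sorted(neighbor.keys())
      attr_pairs = sorted(neighbor.items())
      print(attr_names)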

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list
      of subelements.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`AssertionError` if a subelement is not a valid object.

      .. versionadded:: 2.7


   .. method:: find(match)

      Finds the first subelement matching *match*. *match* may be a tag name
      or path. Returns an element instance or ``None``.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path. Returns a list
      containing all matching elements in document order.


   .. method:: findtext(match, default=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or path. Returns the text content of the first matching
      element, or *default* if no element was found. Note that if the matching
      element has no text content, an empty string is returned.
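
   The sketch below, on a made-up ``country`` fragment, shows the three
   possible outcomes of :meth:`Element.findtext`: the text of a matching
   element, an empty string for a matching element without text, and the
   default (``None`` unless given) when nothing matches:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      country = ET.fromstring("<country><rank>1</rank><year/></country>")

      rank = country.findtext("rank")                     # "1"
      year = country.findtext("year")                     # "" (empty element)
      gdp = country.findtext("gdppc", default="unknown")  # no such element
      missing = country.findtext("gdppc")                 # None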


   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 2.7


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Returns an iterable
      yielding all matching elements in document order.

      .. versionadded:: 2.7


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7
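
   As a sketch of what "all inner text" means (the paragraph markup below is
   an arbitrary example), :meth:`Element.itertext` yields the element's own
   text together with the text and tail of every descendant, so joining the
   pieces gives the character data with the tags stripped:

   .. code-block:: python

      import xml.etree.ElementTree as ET

      para = ET.fromstring("<p>Hello, <b>brave</b> new <i>world</i>!</p>")

      pieces = list(para.itertext())
      plain = "".join(pieces)
      print(plain)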
Georg Brandl8ec7f652007-08-15 14:28:01 +0000668
669
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000670 .. method:: makeelement(tag, attrib)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000671
Florent Xicluna583302c2010-03-13 17:56:19 +0000672 Creates a new element object of the same type as this element. Do not
673 call this method, use the :func:`SubElement` factory function instead.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000674

   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods, this
      method compares elements based on instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

     element = root.find('foo')

     if not element:  # careful!
         print "element not found, or element has no subelements"

     if element is None:
         print "element not found"

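   A minimal sketch of the sequence methods and of :meth:`remove`, using an
   illustrative document.  Because :meth:`remove` compares by identity, it
   needs the child object itself; an equal-looking element is rejected::

       import xml.etree.ElementTree as ET

       root = ET.fromstring("<root><a/><b/><c/></root>")

       count = len(root)              # __len__: 3 direct children
       first_tag = root[0].tag        # __getitem__: 'a'
       root[1] = ET.Element("z")      # __setitem__: replace <b> with <z>
       del root[2]                    # __delitem__: drop <c>
       tags = [child.tag for child in root]         # ['a', 'z']

       # A new <a/> is a different object, so remove() raises ValueError.
       try:
           root.remove(ET.Element("a"))
           removed_lookalike = True
       except ValueError:
           removed_lookalike = False

       root.remove(root[0])           # removing the actual child object works
       remaining = [child.tag for child in root]    # ['z']

       # Explicit tests instead of relying on truthiness:
       z = root.find("z")
       is_missing = root.find("nope") is None       # True: not found
       has_children = len(z) > 0                    # False: found, but a leaf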

.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.

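      A short sketch of the lookup methods, assuming a tiny invented document
      held in an ``io.BytesIO`` object that stands in for a real file.  Paths
      are evaluated relative to the root element, so the root's own tag is not
      part of the path::

          import io
          import xml.etree.ElementTree as ET

          tree = ET.ElementTree(file=io.BytesIO(
              b"<doc><title>Notes</title><tag>a</tag><tag>b</tag></doc>"))

          title = tree.findtext("title")                       # 'Notes'
          author = tree.findtext("author", default="unknown")  # no match: 'unknown'
          tags = [t.text for t in tree.findall("tag")]         # ['a', 'b']
          first = tree.find("tag").text                        # 'a'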

   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path.  Same as
      ``getroot().iterfind(match)``.  Returns an iterable yielding all
      matching elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree.  *source* is a
      file name or file object.  *parser* is an optional parser instance.  If
      not given, the standard :class:`XMLParser` parser is used.  Returns the
      document root element.

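      A minimal sketch, assuming an in-memory ``io.BytesIO`` object in place
      of a file on disk (the document content is invented).  The returned
      element is the same object that :meth:`getroot` returns afterwards::

          import io
          import xml.etree.ElementTree as ET

          source = io.BytesIO(b"<config><item name='debug'>on</item></config>")

          tree = ET.ElementTree()
          root = tree.parse(source)               # parse() returns the root

          same_root = root is tree.getroot()      # True
          value = root.find("item").text          # 'on'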

   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml")

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      file object opened for writing.  *encoding* [1]_ is the output encoding
      (default is US-ASCII).  *xml_declaration* controls whether an XML
      declaration should be added to the file.  Use ``False`` for never,
      ``True`` for always, ``None`` for only if not US-ASCII or UTF-8 (default
      is ``None``).  *default_namespace* sets the default XML namespace (for
      "xmlns").  *method* is either ``"xml"``, ``"html"`` or ``"text"``
      (default is ``"xml"``).  The XML is written to *file*; the method itself
      returns ``None``.

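      The sketch below, using an invented one-element document, writes the
      same tree twice into in-memory ``io.BytesIO`` buffers: once with the
      default US-ASCII encoding, where the non-ASCII character becomes a
      character reference and no declaration is emitted, and once as UTF-8
      with *xml_declaration* forced on::

          import io
          import xml.etree.ElementTree as ET

          root = ET.Element("note")
          ET.SubElement(root, "body").text = u"caf\u00e9"
          tree = ET.ElementTree(root)

          # Default encoding: the e-acute is written as the reference &#233;
          ascii_buf = io.BytesIO()
          tree.write(ascii_buf)

          # Explicit UTF-8, with an XML declaration at the top.
          utf8_buf = io.BytesIO()
          tree.write(utf8_buf, encoding="UTF-8", xml_declaration=True)
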
The following XML file, saved as ``index.xhtml``, is manipulated in the
example below::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the ``target`` attribute of every link in the first
paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and *tag* is interpreted as a local
   name.  :class:`QName` instances are opaque.

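   A minimal sketch of the attribute use case, assuming the made-up namespace
   URI ``http://example.com/ns``.  Wrapping the attribute value in a
   :class:`QName` lets the serializer replace the URI with the same prefix it
   chose for the element's tag; the prefix name itself (``ns0`` here) is
   generated automatically::

       import xml.etree.ElementTree as ET

       NS = "http://example.com/ns"   # hypothetical namespace URI

       elem = ET.Element("{%s}item" % NS)
       # Two-argument form: URI first, local name second.
       elem.set("type", ET.QName(NS, "Thing"))

       # Serializes along the lines of:
       # <ns0:item xmlns:ns0="http://example.com/ns" type="ns0:Thing" />
       out = ET.tostring(elem)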

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder.  This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure.  You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.  *element_factory*, when given,
   is called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers and returns the toplevel document element,
      an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element.  *data* is a string, either a
      bytestring or a Unicode string.


   .. method:: end(tag)

      Closes the current element.  *tag* is the element name.  Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
      containing element attributes.  Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
      the public identifier.  *system* is the system identifier.  This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7

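   A minimal sketch of driving a :class:`TreeBuilder` by hand, the way a
   custom parser would; the tag names and attributes are illustrative.  Each
   :meth:`start` call is balanced by an :meth:`end` call with the same tag::

       import xml.etree.ElementTree as ET

       builder = ET.TreeBuilder()

       builder.start("root", {})
       builder.start("item", {"id": "1"})
       builder.data("first")          # text of the currently open <item>
       builder.end("item")
       builder.end("root")

       root = builder.close()         # the toplevel element: <root>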

.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser.  *html* refers to predefined HTML entities; this flag is not
   supported by the current implementation.  *target* is the target object.
   If omitted, the builder uses an instance of the standard
   :class:`TreeBuilder` class.  *encoding* [1]_ is optional.  If given, the
   value overrides the encoding specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser.  Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser.  *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data.  :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` is not limited to building a tree structure.
This is an example of counting the maximum depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4

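Because :meth:`XMLParser.feed` can be called repeatedly, input may also be
supplied in arbitrary chunks.  The following sketch uses the default
:class:`TreeBuilder` target; the document and the chunk boundaries,
including one that splits a tag name, are chosen only for illustration::

    import xml.etree.ElementTree as ET

    parser = ET.XMLParser()      # default target: a standard TreeBuilder

    for chunk in ["<log><en", "try level='warn'>disk", " full</entry></log>"]:
        parser.feed(chunk)

    root = parser.close()        # root of the finished tree: <log>
    entry = root.find("entry")   # level 'warn', text 'disk full'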

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.