:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


Each element has a number of properties associated with it:

* a tag which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
the xml.etree.ElementTree.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We have a number of ways to import the data. Reading the file from disk::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Reading the data from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.
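
The difference in return types can be checked directly. The following is a
minimal sketch, not part of the module's own examples: the inline
``xml_bytes`` document and the use of ``io.BytesIO`` as a stand-in for a file
on disk are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical inline document standing in for a file on disk.
xml_bytes = b"<data><country name='Panama'/></data>"

# parse() accepts a filename or file object and returns an ElementTree.
tree = ET.parse(io.BytesIO(xml_bytes))
root_from_parse = tree.getroot()

# fromstring() returns the root Element directly, with no ElementTree wrapper.
root_from_string = ET.fromstring(xml_bytes)

print(root_from_parse.tag)
print(root_from_string.tag)
```

Both variables end up referring to a root :class:`Element` with the same tag;
only :func:`parse` goes through an intermediate :class:`ElementTree`.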

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has children nodes over which we can iterate::

   >>> for child in root:
   ...     print child.tag, child.attrib
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print neighbor.attrib
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print name, rank
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write them to files.
The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the XPath of a
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print name.text
       for char in actor.findall('{http://characters.example.com}character'):
           print ' |-->', char.text


A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print name.text
       for char in actor.findall('role:character', ns):
           print ' |-->', char.text

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement

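The ``{uri}sometag`` expansion described in this section can be observed by
inspecting the tags of parsed elements. This is a small sketch using an
abbreviated copy of the actors document; the variable names are chosen for
illustration only.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the actors document from this section.
xml_text = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

root = ET.fromstring(xml_text)

# The default namespace URI is prepended to the unprefixed root tag.
print(root.tag)

# Prefixed tags are expanded the same way, using the URI bound to the prefix.
actor = root[0]
child_tags = [child.tag for child in actor]
print(child_tags)
```

Seeing the expanded tags makes it clear why a plain ``root.findall('actor')``
finds nothing in a namespaced document.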

Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.

.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.                          |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
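
A few of the predicate forms can be tried on a small document. The sketch
below assumes an abbreviated, hypothetical variant of the country data (only
``rank`` and ``neighbor`` children are kept); each predicate follows a tag
name, as required above.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical variant of the country data used in the tutorial.
countrydata = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria"/>
        <neighbor name="Switzerland"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia"/>
    </country>
</data>"""

root = ET.fromstring(countrydata)

# [@attrib='value']: countries whose name attribute is 'Singapore'.
by_attr = [c.get("name") for c in root.findall("country[@name='Singapore']")]

# [tag='text']: countries that have a rank child whose text is '1'.
by_text = [c.get("name") for c in root.findall("country[rank='1']")]

# [position]: the last neighbor child of each country.
last_neighbors = [n.get("name")
                  for n in root.findall("country/neighbor[last()]")]

print(by_attr)
print(by_text)
print(last_neighbors)
```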

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.

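As a brief sketch of how a comment element might be used (the ``config`` and
``option`` tag names and the comment text are made up for illustration), the
element returned by :func:`Comment` is appended like any other child and
appears as an XML comment when serialized:

```python
import xml.etree.ElementTree as ET

root = ET.Element("config")

# Comment() returns a special element; append it like any other child.
root.append(ET.Comment("generated file, do not edit"))
ET.SubElement(root, "option")

# tostring() returns an encoded string; decode it for display.
xml_text = ET.tostring(root).decode("ascii")
print(xml_text)
```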

.. function:: dump(elem)

   Writes an element tree or element structure to sys.stdout. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 2.7

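A short sketch of :func:`fromstringlist`, assuming hypothetical fragments such
as chunks read from a stream. The fragments do not need to end on tag
boundaries, since they are fed to the parser one after another:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, e.g. chunks received from a network stream.
# Note that the first chunk ends in the middle of a closing tag.
fragments = ["<data><item>1</it", "em><item>2</item>", "</data>"]

root = ET.fromstringlist(fragments)
item_texts = [item.text for item in root.findall("item")]
print(item_texts)
```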

.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back. If omitted, only "end"
   events are reported. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. *parser* is not
   supported by ``cElementTree``. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

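The event stream produced by :func:`iterparse` can be sketched as follows. The
in-memory ``source`` is an assumption standing in for a real file, and the
loop deliberately reads ``elem.text`` only on "end" events, in line with the
note above:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for a file on disk.
source = io.BytesIO(b"<data><item>1</item><item>2</item></data>")

seen = []
for event, elem in ET.iterparse(source, events=("start", "end")):
    # On "start" only the tag and attributes are reliable; read text on "end".
    if event == "end":
        seen.append((event, elem.tag, elem.text))
    else:
        seen.append((event, elem.tag, None))

for entry in seen:
    print(entry)
```

Each element yields a "start" event when its opening tag has been seen and an
"end" event once it is complete, so nested elements finish before their
parent does.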

.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.

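A minimal sketch of a processing instruction inside a document. The
``xml-stylesheet`` target, its contents, and the ``doc`` root tag are
illustrative values, not anything the module requires:

```python
import xml.etree.ElementTree as ET

# Hypothetical stylesheet instruction; target and text are illustrative.
pi = ET.ProcessingInstruction("xml-stylesheet",
                              'type="text/xsl" href="style.xsl"')

root = ET.Element("doc")
root.append(pi)

# The PI is written as <?target text?> when the tree is serialized.
xml_text = ET.tostring(root).decode("ascii")
print(xml_text)
```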

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 2.7

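The effect of :func:`register_namespace` on serialization can be sketched like
this. The ``ex`` prefix and the ``http://example.com/ns`` URI are hypothetical;
because the registry is global, the mapping stays in effect for the rest of
the process.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and URI; note that the registry is global.
ET.register_namespace("ex", "http://example.com/ns")

# Tags in that namespace are written using the {uri}tag form in code...
elem = ET.Element("{http://example.com/ns}item")

# ...and serialized with the registered prefix.
xml_text = ET.tostring(elem).decode("ascii")
print(xml_text)
```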

.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.

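A small sketch showing how *attrib* and *extra* combine in :func:`SubElement`;
the tag and attribute names are borrowed from the tutorial data purely for
illustration:

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")

# attrib supplies a dictionary; extra keyword arguments add more attributes.
country = ET.SubElement(root, "country", {"name": "Panama"}, rank="68")

# The new element is already attached to its parent.
print(len(root))
print(sorted(country.attrib.items()))
```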

.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that ``"".join(tostringlist(element)) ==
   tostring(element)``.

   .. versionadded:: 2.7

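The *method* argument and the relationship between :func:`tostring` and
:func:`tostringlist` can be sketched as follows. The sample markup is made up,
and ``b"".join`` is used so that the snippet joins encoded strings; how the
output is split into fragments is not something to rely on.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<p>Hello <b>world</b>!</p>")

as_xml = ET.tostring(elem)                  # full markup
as_text = ET.tostring(elem, method="text")  # text content only

# Joining the fragments from tostringlist() reproduces tostring().
fragments = ET.tostringlist(elem)
rejoined = b"".join(fragments)

print(as_xml.decode("ascii"))
print(as_text.decode("ascii"))
```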

.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.

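A brief sketch of :func:`XMLID`, assuming a hypothetical document whose
elements carry ``id`` attributes. The returned dictionary is keyed by those
attribute values:

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose elements carry "id" attributes.
root, ids = ET.XMLID('<doc><sec id="intro">Hi</sec><sec id="usage"/></doc>')

print(sorted(ids))
print(ids["intro"].text)
```

Elements without an ``id`` attribute, such as the ``doc`` root here, do not
appear in the dictionary.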
579
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000580.. _elementtree-element-objects:
Georg Brandl8ec7f652007-08-15 14:28:01 +0000581
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000582Element Objects
Eli Bendersky6ee21872012-08-18 05:40:38 +0300583^^^^^^^^^^^^^^^
Georg Brandl8ec7f652007-08-15 14:28:01 +0000584
Florent Xiclunaa231e452010-03-13 20:30:15 +0000585.. class:: Element(tag, attrib={}, **extra)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000586
Florent Xicluna583302c2010-03-13 17:56:19 +0000587 Element class. This class defines the Element interface, and provides a
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000588 reference implementation of this interface.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000589
Florent Xicluna583302c2010-03-13 17:56:19 +0000590 The element name, attribute names, and attribute values can be either
591 bytestrings or Unicode strings. *tag* is the element name. *attrib* is
592 an optional dictionary, containing element attributes. *extra* contains
593 additional attributes, given as keyword arguments.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000594
595
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000596 .. attribute:: tag
Georg Brandl8ec7f652007-08-15 14:28:01 +0000597
Florent Xicluna583302c2010-03-13 17:56:19 +0000598 A string identifying what kind of data this element represents (the
599 element type, in other words).
Georg Brandl8ec7f652007-08-15 14:28:01 +0000600
601
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000602 .. attribute:: text
Ned Deilyc9a5a192015-08-17 22:10:47 -0400603 tail
Georg Brandl8ec7f652007-08-15 14:28:01 +0000604
Ned Deilyc9a5a192015-08-17 22:10:47 -0400605 These attributes can be used to hold additional data associated with
606 the element. Their values are usually strings but may be any
607 application-specific object. If the element is created from
608 an XML file, the *text* attribute holds either the text between
609 the element's start tag and its first child or end tag, or ``None``, and
610 the *tail* attribute holds either the text between the element's
611 end tag and the next tag, or ``None``. For the XML data
Georg Brandl8ec7f652007-08-15 14:28:01 +0000612
Ned Deilyc9a5a192015-08-17 22:10:47 -0400613 .. code-block:: xml
Georg Brandl8ec7f652007-08-15 14:28:01 +0000614
Ned Deilyc9a5a192015-08-17 22:10:47 -0400615 <a><b>1<c>2<d/>3</c></b>4</a>
Georg Brandl8ec7f652007-08-15 14:28:01 +0000616
Ned Deilyc9a5a192015-08-17 22:10:47 -0400617 the *a* element has ``None`` for both *text* and *tail* attributes,
618 the *b* element has *text* ``"1"`` and *tail* ``"4"``,
619 the *c* element has *text* ``"2"`` and *tail* ``None``,
620 and the *d* element has *text* ``None`` and *tail* ``"3"``.
621
622 To collect the inner text of an element, see :meth:`itertext`, for
623 example ``"".join(element.itertext())``.
624
625 Applications may store arbitrary objects in these attributes.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000626
Georg Brandl8ec7f652007-08-15 14:28:01 +0000627
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000628 .. attribute:: attrib
Georg Brandl8ec7f652007-08-15 14:28:01 +0000629
Florent Xicluna583302c2010-03-13 17:56:19 +0000630 A dictionary containing the element's attributes. Note that while the
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000631 *attrib* value is always a real mutable Python dictionary, an ElementTree
Florent Xicluna583302c2010-03-13 17:56:19 +0000632 implementation may choose to use another internal representation, and
633 create the dictionary only if someone asks for it. To take advantage of
634 such implementations, use the dictionary methods below whenever possible.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000635
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000636 The following dictionary-like methods work on the element attributes.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000637
638
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000639 .. method:: clear()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000640
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000641 Resets an element. This function removes all subelements, clears all
642 attributes, and sets the text and tail attributes to None.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000643
Georg Brandl8ec7f652007-08-15 14:28:01 +0000644
Florent Xiclunaa231e452010-03-13 20:30:15 +0000645 .. method:: get(key, default=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000646
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000647 Gets the element attribute named *key*.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000648
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000649 Returns the attribute value, or *default* if the attribute was not found.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000650
651
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000652 .. method:: items()
653
Florent Xicluna583302c2010-03-13 17:56:19 +0000654 Returns the element attributes as a sequence of (name, value) pairs. The
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000655 attributes are returned in an arbitrary order.
656
657
658 .. method:: keys()
659
Florent Xicluna583302c2010-03-13 17:56:19 +0000660 Returns the elements attribute names as a list. The names are returned
661 in an arbitrary order.
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000662
663
664 .. method:: set(key, value)
665
666 Set the attribute *key* on the element to *value*.
667
668 The following methods work on the element's children (subelements).
669
670
671 .. method:: append(subelement)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000672
Florent Xicluna583302c2010-03-13 17:56:19 +0000673 Adds the element *subelement* to the end of this elements internal list
674 of subelements.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000675
676
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000677 .. method:: extend(subelements)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000678
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000679 Appends *subelements* from a sequence object with zero or more elements.
680 Raises :exc:`AssertionError` if a subelement is not a valid object.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000681
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000682 .. versionadded:: 2.7
Georg Brandl8ec7f652007-08-15 14:28:01 +0000683
684
   .. method:: find(match)

      Finds the first subelement matching *match*.  *match* may be a tag name
      or path.  Returns an element instance or ``None``.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path.  Returns a list
      containing all matching elements in document order.


   .. method:: findtext(match, default=None)

      Finds text for the first subelement matching *match*.  *match* may be
      a tag name or path.  Returns the text content of the first matching
      element, or *default* if no element was found.  Note that if the matching
      element has no text content an empty string is returned.


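A minimal sketch of the three search methods, assuming a small made-up catalog document. It also shows how ``findtext()`` distinguishes a missing element from an element that has no text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
root = ET.fromstring(
    "<catalog>"
    "<book><title>First</title><note/></book>"
    "<book><title>Second</title></book>"
    "</catalog>"
)

first_book = root.find("book")           # first direct child named "book"
titles = [t.text for t in root.findall("book/title")]  # path match
missing = root.find("magazine")          # no match, so None

# No matching element: *default* is returned.
no_element = root.findtext("book/price", default="n/a")
# Matching element without text content: an empty string is returned.
empty_text = root.findtext("book/note")
```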

   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order.  If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator.  If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 2.7


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path.  Returns an iterable
      yielding all matching elements in document order.

      .. versionadded:: 2.7


   .. method:: itertext()

      Creates a text iterator.  The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7


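The iteration methods can be illustrated with a short sketch. The paragraph markup below is an assumption chosen for the example; any element structure would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with two links, used only for illustration.
p = ET.fromstring(
    '<p>Moved to <a href="http://example.org/">example.org</a>'
    ' or <a href="http://example.com/">example.com</a>.</p>'
)

all_tags = [elem.tag for elem in p.iter()]    # p itself, then its descendants
hrefs = [a.get("href") for a in p.iter("a")]  # only elements tagged "a"
text = "".join(p.itertext())                  # inner text in document order
```

Note that ``iter()`` includes the element it is called on, while ``itertext()`` also yields the text that follows each subelement.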

   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element.  Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element.  Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``.  This behavior
   will change in future versions.  Use an explicit ``len(elem)`` or
   ``elem is None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print "element not found, or element has no subelements"

      if element is None:
          print "element not found"


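As a rough sketch of the child-handling methods and the sequence protocol described above, the following builds a small tree from made-up tags. It also uses the explicit ``is None`` and ``len()`` tests recommended in the caution.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags, used only for illustration.
root = ET.Element("list")
first = ET.Element("item")
second = ET.Element("item")   # same tag as first, but a different object

root.append(first)
root.extend([second, ET.Element("footer")])
root.insert(0, ET.Element("header"))      # becomes the new first child
order = [child.tag for child in root]

root.remove(first)                        # matches by identity, not by tag
second_still_there = root[1] is second

del root[0]                               # sequence protocol: drops "header"
remaining = len(root)

leaf = root.find("footer")
found = leaf is not None                  # explicit test for "was it found?"
childless = len(leaf) == 0                # explicit test for "has children?"
```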

.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class.  This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element.  The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree.  This discards the current
      contents of the tree, and replaces it with the given element.  Use with
      care.  *element* is an element instance.


   .. method:: find(match)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element.  The iterator
      loops over all elements in this tree, in document order.  *tag* is the
      tag to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path.  Same as
      ``getroot().iterfind(match)``.  Returns an iterable yielding all matching
      elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree.  *source* is a
      file name or file object.  *parser* is an optional parser instance.  If
      not given, the standard :class:`XMLParser` parser is used.  Returns the
      root element of the parsed document.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml")

      Writes the element tree to a file, as XML.  *file* is a file name, or a
      file object opened for writing.  *encoding* [1]_ is the output encoding
      (default is US-ASCII).  *xml_declaration* controls whether an XML
      declaration is added to the file: use ``False`` for never, ``True`` for
      always, and ``None`` to add one only if the encoding is not US-ASCII or
      UTF-8 (default is ``None``).  *default_namespace* sets the default XML
      namespace (for "xmlns").  *method* is either ``"xml"``, ``"html"`` or
      ``"text"`` (default is ``"xml"``).

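The effect of *encoding* and *xml_declaration* can be sketched by writing the same tree twice. The in-memory ``io.BytesIO`` buffers stand in for real files, and the element content is an assumption made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical element containing one non-ASCII character.
root = ET.Element("greeting", {"lang": "en"})
root.text = u"caf\u00e9"
tree = ET.ElementTree(root)

# Defaults: US-ASCII output and no XML declaration.  Characters outside
# the encoding are written as character references.
ascii_buf = io.BytesIO()
tree.write(ascii_buf)
ascii_xml = ascii_buf.getvalue()

# Explicit encoding, with the declaration forced on.
utf8_buf = io.BytesIO()
tree.write(utf8_buf, encoding="utf-8", xml_declaration=True)
utf8_xml = utf8_buf.getvalue()
```

With the defaults the accented character appears as ``&#233;``, whereas the UTF-8 output starts with an XML declaration and stores the character as encoded bytes.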

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

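For readers without an ``index.xhtml`` at hand, the same steps can be tried on in-memory data. This is only a sketch: the ``io.BytesIO`` objects are stand-ins for the input and output files, and the markup is a compact copy of the file shown above.

```python
import io
from xml.etree.ElementTree import ElementTree

# In-memory stand-in for index.xhtml (an assumption of this sketch).
source = io.BytesIO(
    b"<html><head><title>Example page</title></head>"
    b'<body><p>Moved to <a href="http://example.org/">example.org</a>'
    b' or <a href="http://example.com/">example.com</a>.</p></body></html>'
)

tree = ElementTree()
root = tree.parse(source)                # returns the root element
for link in tree.find("body/p").iter("a"):
    link.set("target", "blank")          # attributes only; structure unchanged

output = io.BytesIO()                    # stand-in for output.xhtml
tree.write(output)

targets = [a.get("target") for a in tree.iter("a")]
title = tree.findtext("head/title")
```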

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper.  This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output.  *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName.  If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name.  :class:`QName` instances are opaque.


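A minimal sketch of both constructor forms, and of wrapping an attribute value so that the serializer emits a namespace prefix for it. The namespace URI and names are hypothetical, and the ``ns0`` prefix seen in the output is generated by the serializer rather than chosen here.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns"     # hypothetical namespace URI

# Both spellings describe the same qualified name.
combined = ET.QName("{%s}kind" % NS)
split = ET.QName(NS, "kind")

# Wrapping the attribute *value* tells the serializer that it is a
# qualified name, so a prefix declaration is generated on output.
elem = ET.Element("item")
elem.set("type", split)
xml_text = ET.tostring(elem).decode("ascii")
```

Without the wrapper, the value would be written out as the plain string it was given.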

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder.  This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure.  You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format.  If *element_factory* is given,
   it is called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element.  Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element.  *data* is a string.  This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element.  *tag* is the element name.  Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element.  *tag* is the element name.  *attrs* is a dictionary
      containing element attributes.  Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration.  *name* is the doctype name.  *pubid* is
      the public identifier.  *system* is the system identifier.  This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7


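The call sequence a parser would make can also be driven by hand. The sketch below uses made-up tag names and feeds a :class:`TreeBuilder` directly, which is roughly what a custom parser for an XML-like format would do.

```python
from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()

# Each start() must be balanced by an end(); data() adds text to
# whichever element is currently open.
builder.start("greeting", {"lang": "en"})
builder.data("Hello, ")
builder.start("b", {})
builder.data("world")
builder.end("b")
builder.end("greeting")

root = builder.close()           # the toplevel element
xml_bytes = tostring(root)
```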

.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser.  *html* refers to predefined HTML entities; this flag is not
   supported by the current implementation.  *target* is the target object.
   If omitted, the builder uses an instance of the standard
   :class:`TreeBuilder` class.  *encoding* [1]_ is optional.  If given, the
   value overrides the encoding specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser.  Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser.  *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method for each opening
tag and its :meth:`end` method for each closing tag; character data is passed
to its :meth:`data` method.  :meth:`XMLParser.close` calls *target*\'s
:meth:`close` method.  :class:`XMLParser` can therefore be used for more than
building a tree structure.  This is an example of counting the maximum depth
of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4


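When no *target* is given, the same ``feed()`` and ``close()`` calls build an ordinary element tree. The following sketch feeds a document in arbitrary pieces; the chunk boundaries and the log markup are assumptions chosen to show that data need not arrive in whole tags.

```python
from xml.etree.ElementTree import XMLParser

# Hypothetical chunks, split in the middle of text and of a tag.
chunks = [
    "<log><entry id='1'>sta",
    "rted</entry><entry id",
    "='2'>done</entry></log>",
]

parser = XMLParser()         # no target: a standard TreeBuilder is used
for chunk in chunks:
    parser.feed(chunk)
root = parser.close()        # the finished element structure

entries = [(entry.get("id"), entry.text) for entry in root]
```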

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards.  For example, "UTF-8" is valid, but "UTF8" is
   not.  See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.