:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


Each element has a number of properties associated with it:

* a tag which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.
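As a rough illustration of these properties, the following sketch builds a small element by hand and serializes it through :class:`ElementTree`. The ``note`` and ``b`` tags and the ``lang`` attribute are made up for this example, and ``io.BytesIO`` merely stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

# Build <note lang="en">Hello<b>world</b>!</note> by hand.
note = ET.Element('note', {'lang': 'en'})   # tag plus an attribute dictionary
note.text = 'Hello'                         # text before the first child
bold = ET.SubElement(note, 'b')             # SubElement appends to its parent
bold.text = 'world'
bold.tail = '!'                             # text after </b>, still inside <note>

# Wrap the element in an ElementTree to write it out as a document.
buf = io.BytesIO()
ET.ElementTree(note).write(buf)
serialized = buf.getvalue()
```

Here ``note.text`` holds the text before the first child, while ``bold.tail`` holds the text that follows the end tag of ``b``.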

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose:
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We have a number of ways to import the data. Reading the file from disk::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Reading the data from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.
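The difference in return types can be checked directly. A minimal sketch, assuming a tiny made-up document and using ``io.BytesIO`` as a stand-in for a file on disk:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b'<data><country name="Panama"/></data>'

# parse() reads from a filename or file object and returns an ElementTree;
# getroot() is needed to reach the root Element.
tree = ET.parse(io.BytesIO(xml_bytes))
root_from_file = tree.getroot()

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_bytes)
```

Both ``root_from_file`` and ``root_from_string`` are :class:`Element` instances with the tag ``data``; only ``tree`` offers document-level methods such as :meth:`ElementTree.write`.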

As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print child.tag, child.attrib
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over
the whole sub-tree below it (its children, their children, and so on). For
example, :meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print neighbor.attrib
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print name, rank
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.
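When a child may be missing, :meth:`Element.find` returns ``None`` and reading ``.text`` from the result fails. The sketch below shows one way to handle that with :meth:`Element.findtext`; the ``Atlantis`` entry and the ``'n/a'`` default are invented for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Atlantis"/>'   # hypothetical entry with no <rank> child
    '</data>')

ranks = {}
for country in root.findall('country'):
    # find('rank') would return None for Atlantis, so .text would raise
    # AttributeError; findtext() returns the supplied default instead.
    ranks[country.get('name')] = country.findtext('rank', default='n/a')
```

After the loop, ``ranks`` maps ``'Liechtenstein'`` to ``'1'`` and ``'Atlantis'`` to ``'n/a'``.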

Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>
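One caveat, offered as a sketch rather than as part of the tutorial data: removing children while iterating directly over the parent can lead to skipped elements, much like mutating a list during iteration. The removal loop above avoids this because :meth:`Element.findall` returns a list of matches before anything is removed. The ``c`` tags and ``rank`` attributes below are invented for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><c rank="60"/><c rank="70"/><c rank="10"/></data>')

# findall() collects the matches into a list first, so removing
# children from root does not disturb the loop.
for c in root.findall('c'):
    if int(c.get('rank')) > 50:
        root.remove(c)

remaining = [c.get('rank') for c in root]
```

Only the element with rank ``10`` is left in ``remaining``.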

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*.
Also, if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the path passed to
:meth:`~Element.find` or :meth:`~Element.findall`::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print name.text
       for char in actor.findall('{http://characters.example.com}character'):
           print ' |-->', char.text


A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search functions::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print name.text
       for char in actor.findall('role:character', ns):
           print ' |-->', char.text

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement
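To see the expanded names that these searches match against, it can help to list the tags of a parsed tree. A minimal sketch, using a shortened copy of the document above; the ``role`` prefix is an arbitrary choice, as in the dictionary example:

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<actors xmlns:fictional="http://characters.example.com"'
    ' xmlns="http://people.example.com">'
    '<actor><name>John Cleese</name>'
    '<fictional:character>Lancelot</fictional:character></actor>'
    '</actors>')
root = ET.fromstring(xml_text)

# Every tag carries its namespace URI in braces, whether the namespace
# was declared with a prefix or as the default.
tags = [elem.tag for elem in root.iter()]

# The prefix in the search dictionary is local to the query; only the
# URI has to match the document.
ns = {'role': 'http://characters.example.com'}
by_uri = [c.text for c in root.iter('{http://characters.example.com}character')]
by_prefix = [c.text for c in root.findall('.//role:character', ns)]
```

Both ``by_uri`` and ``by_prefix`` contain the single entry ``'Lancelot'``, and the first item of ``tags`` is ``'{http://people.example.com}actors'``.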


Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.

.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # The current node (here, the root element itself)
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.                          |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.
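The predicate forms in the table can be tried on a trimmed version of the country data. This is only a sketch: the document below drops the ``year`` and ``gdppc`` elements for brevity, and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria"/><neighbor name="Switzerland"/></country>'
    '<country name="Singapore"><rank>4</rank>'
    '<neighbor name="Malaysia"/></country>'
    '</data>')

# [@attrib]: children of root that carry a 'name' attribute
with_attr = [c.get('name') for c in root.findall("country[@name]")]

# [@attrib='value']: match on the attribute value
by_value = [c.find('rank').text
            for c in root.findall("country[@name='Singapore']")]

# [tag='text']: match on the text of an immediate child
by_child_text = [c.get('name') for c in root.findall("country[rank='1']")]

# [position]: each predicate follows a tag name, as required
last_neighbors = [n.get('name')
                  for n in root.findall("country/neighbor[last()]")]
second_neighbors = [n.get('name') for n in root.findall(".//neighbor[2]")]
```

Singapore has a single ``neighbor``, so it contributes to ``last_neighbors`` but not to ``second_neighbors``.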

Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This
   function should be used for debugging only.

   The exact output format is implementation-dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 2.7


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back. If omitted, only "end"
   events are reported. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. *parser* is not
   supported by ``cElementTree``. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.
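A short sketch of the pattern the note recommends: request both event types but read element content only on "end". The ``rank`` data is made up, and ``io.BytesIO`` stands in for a file.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b'<data><rank>1</rank><rank>4</rank></data>')

seen = []
for event, elem in ET.iterparse(source, events=('start', 'end')):
    if event == 'end' and elem.tag == 'rank':
        # On "end" the element is complete, so .text is safe to read.
        seen.append(elem.text)
        elem.clear()   # optional: drop the content once it has been used
```

After the loop ``seen`` holds ``'1'`` and ``'4'`` in document order.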
500
Georg Brandl8ec7f652007-08-15 14:28:01 +0000501
Florent Xiclunaa231e452010-03-13 20:30:15 +0000502.. function:: parse(source, parser=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000503
Florent Xicluna583302c2010-03-13 17:56:19 +0000504 Parses an XML section into an element tree. *source* is a filename or file
505 object containing XML data. *parser* is an optional parser instance. If
506 not given, the standard :class:`XMLParser` parser is used. Returns an
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000507 :class:`ElementTree` instance.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000508
509
Florent Xiclunaa231e452010-03-13 20:30:15 +0000510.. function:: ProcessingInstruction(target, text=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000511
Florent Xicluna583302c2010-03-13 17:56:19 +0000512 PI element factory. This factory function creates a special element that
513 will be serialized as an XML processing instruction. *target* is a string
514 containing the PI target. *text* is a string containing the PI contents, if
515 given. Returns an element instance, representing a processing instruction.
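The special elements created by this factory and by :func:`Comment` are appended like ordinary children. In the sketch below the comment text, the ``app`` target and its contents are all hypothetical values chosen for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('root')
root.append(ET.Comment('generated file'))                    # hypothetical text
root.append(ET.ProcessingInstruction('app', 'cache="off"'))  # hypothetical PI
ET.SubElement(root, 'item')

# The serializer emits the comment and the PI in document order,
# alongside the regular <item> element.
xml_bytes = ET.tostring(root)
```

The resulting string contains ``<!--generated file-->`` followed by ``<?app cache="off"?>``.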
Georg Brandl8ec7f652007-08-15 14:28:01 +0000516
517
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000518.. function:: register_namespace(prefix, uri)
519
Florent Xicluna583302c2010-03-13 17:56:19 +0000520 Registers a namespace prefix. The registry is global, and any existing
521 mapping for either the given prefix or the namespace URI will be removed.
522 *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and
523 attributes in this namespace will be serialized with the given prefix, if at
524 all possible.
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000525
526 .. versionadded:: 2.7
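A minimal sketch of the effect on serialization, reusing the ``fictional`` prefix and URI from the tutorial. Because the registry is global, the registration also affects any other code that serializes this namespace afterwards.

```python
import xml.etree.ElementTree as ET

uri = 'http://characters.example.com'
ET.register_namespace('fictional', uri)

# Tags are stored in {uri}tag form; the prefix is only chosen on output.
elem = ET.Element('{%s}character' % uri)
elem.text = 'Lancelot'

xml_bytes = ET.tostring(elem)
```

Without the registration the serializer would fall back to a generated prefix such as ``ns0``.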


.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that ``"".join(tostringlist(element)) ==
   tostring(element)``.

   .. versionadded:: 2.7
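The relationship between the two serializers, and the effect of *method*, can be sketched as follows. The document is made up; ``b"".join`` is used so that the same code also runs where the encoded strings are ``bytes`` objects.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<a><b>text</b><c/></a>')

whole = ET.tostring(elem)
pieces = ET.tostringlist(elem)

# The fragments carry no particular meaning on their own, but joining
# them reproduces the output of tostring().
rejoined = b"".join(pieces)

# method="text" keeps only the text content and drops the markup.
text_only = ET.tostring(elem, method='text')
```

Code that streams output can write the items of ``pieces`` one at a time instead of building ``whole`` first.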


.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.
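A brief sketch of the returned pair, assuming a made-up document whose elements carry plain ``id`` attributes:

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<doc><p id="intro">Hi</p><p id="body">Text</p><p>No id</p></doc>')

# The dictionary is keyed by the value of each element's "id" attribute;
# elements without one are simply left out.
intro_text = ids['intro'].text
known_ids = sorted(ids)
```

The values in ``ids`` are the same :class:`Element` objects that appear in the tree under ``root``.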
Georg Brandl8ec7f652007-08-15 14:28:01 +0000578
579
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000580.. _elementtree-element-objects:
Georg Brandl8ec7f652007-08-15 14:28:01 +0000581
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000582Element Objects
Eli Bendersky6ee21872012-08-18 05:40:38 +0300583^^^^^^^^^^^^^^^
Georg Brandl8ec7f652007-08-15 14:28:01 +0000584
Florent Xiclunaa231e452010-03-13 20:30:15 +0000585.. class:: Element(tag, attrib={}, **extra)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000586
Florent Xicluna583302c2010-03-13 17:56:19 +0000587 Element class. This class defines the Element interface, and provides a
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000588 reference implementation of this interface.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000589
Florent Xicluna583302c2010-03-13 17:56:19 +0000590 The element name, attribute names, and attribute values can be either
591 bytestrings or Unicode strings. *tag* is the element name. *attrib* is
592 an optional dictionary, containing element attributes. *extra* contains
593 additional attributes, given as keyword arguments.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000594
595
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000596 .. attribute:: tag
Georg Brandl8ec7f652007-08-15 14:28:01 +0000597
Florent Xicluna583302c2010-03-13 17:56:19 +0000598 A string identifying what kind of data this element represents (the
599 element type, in other words).
Georg Brandl8ec7f652007-08-15 14:28:01 +0000600
601
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000602 .. attribute:: text
Georg Brandl8ec7f652007-08-15 14:28:01 +0000603
Florent Xicluna583302c2010-03-13 17:56:19 +0000604 The *text* attribute can be used to hold additional data associated with
605 the element. As the name implies this attribute is usually a string but
606 may be any application-specific object. If the element is created from
607 an XML file the attribute will contain any text found between the element
608 tags.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000609
610
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000611 .. attribute:: tail
Georg Brandl8ec7f652007-08-15 14:28:01 +0000612
Florent Xicluna583302c2010-03-13 17:56:19 +0000613 The *tail* attribute can be used to hold additional data associated with
614 the element. This attribute is usually a string but may be any
615 application-specific object. If the element is created from an XML file
616 the attribute will contain any text found after the element's end tag and
617 before the next tag.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000618
Georg Brandl8ec7f652007-08-15 14:28:01 +0000619
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000620 .. attribute:: attrib
Georg Brandl8ec7f652007-08-15 14:28:01 +0000621
Florent Xicluna583302c2010-03-13 17:56:19 +0000622 A dictionary containing the element's attributes. Note that while the
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000623 *attrib* value is always a real mutable Python dictionary, an ElementTree
Florent Xicluna583302c2010-03-13 17:56:19 +0000624 implementation may choose to use another internal representation, and
625 create the dictionary only if someone asks for it. To take advantage of
626 such implementations, use the dictionary methods below whenever possible.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000627
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000628 The following dictionary-like methods work on the element attributes.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000629
630
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000631 .. method:: clear()
Georg Brandl8ec7f652007-08-15 14:28:01 +0000632
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000633 Resets an element. This function removes all subelements, clears all
634 attributes, and sets the text and tail attributes to None.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000635
Georg Brandl8ec7f652007-08-15 14:28:01 +0000636
Florent Xiclunaa231e452010-03-13 20:30:15 +0000637 .. method:: get(key, default=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000638
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000639 Gets the element attribute named *key*.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000640
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000641 Returns the attribute value, or *default* if the attribute was not found.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000642
643
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000644 .. method:: items()
645
Florent Xicluna583302c2010-03-13 17:56:19 +0000646 Returns the element attributes as a sequence of (name, value) pairs. The
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000647 attributes are returned in an arbitrary order.
648
649
650 .. method:: keys()
651
Florent Xicluna583302c2010-03-13 17:56:19 +0000652 Returns the elements attribute names as a list. The names are returned
653 in an arbitrary order.
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000654
655
656 .. method:: set(key, value)
657
658 Set the attribute *key* on the element to *value*.
659
660 The following methods work on the element's children (subelements).
661
662
663 .. method:: append(subelement)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000664
Florent Xicluna583302c2010-03-13 17:56:19 +0000665 Adds the element *subelement* to the end of this elements internal list
666 of subelements.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000667
668
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000669 .. method:: extend(subelements)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000670
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000671 Appends *subelements* from a sequence object with zero or more elements.
672 Raises :exc:`AssertionError` if a subelement is not a valid object.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000673
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000674 .. versionadded:: 2.7
Georg Brandl8ec7f652007-08-15 14:28:01 +0000675
676
   .. method:: find(match)

      Finds the first subelement matching *match*. *match* may be a tag name
      or path. Returns an element instance or ``None``.


   .. method:: findall(match)

      Finds all matching subelements, by tag name or path. Returns a list
      containing all matching elements in document order.


   .. method:: findtext(match, default=None)

      Finds text for the first subelement matching *match*. *match* may be
      a tag name or path. Returns the text content of the first matching
      element, or *default* if no element was found. Note that if the matching
      element has no text content an empty string is returned.


   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 2.7


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Returns an iterable
      yielding all matching elements in document order.

      .. versionadded:: 2.7


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print("element not found, or element has no subelements")

      if element is None:
          print("element not found")

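As a rough illustration of the child-handling and search methods described
for :class:`Element`, the following sketch builds a small tree by hand and
then queries it. The tag names (``library``, ``book``, ``magazine``,
``index``, ``dvd``) and the ``id`` values are invented for this example and
carry no special meaning.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and ids, chosen only for illustration.
root = ET.Element("library")
first = ET.SubElement(root, "book", {"id": "b1"})
first.text = "Dune"

second = ET.Element("book", {"id": "b2"})
second.text = "Emma"
root.append(second)                      # add one child at the end

extras = [ET.Element("magazine"), ET.Element("book", {"id": "b3"})]
root.extend(extras)                      # add several children at once
root.insert(0, ET.Element("index"))      # place a child at position 0

root.remove(extras[0])                   # removes that exact object (identity)

child_tags = [child.tag for child in root]
first_book = root.find("book")           # first matching child, or None
book_ids = [b.get("id") for b in root.findall("book")]
missing_text = root.findtext("dvd", default="n/a")   # no such child
empty_text = root.findtext("index")      # child exists but has no text
all_tags = [elem.tag for elem in root.iter()]        # root first, then children
all_text = "".join(root.itertext())
```

Note that ``remove`` is given the object that was added earlier rather than
a freshly created element with the same tag, since a new object would not
match by identity.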

.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().iterfind(match)``. Returns an iterable yielding all matching
      elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML document into this element tree. *source* is a file
      name or file object. *parser* is an optional parser instance. If not
      given, the standard :class:`XMLParser` parser is used. Returns the root
      element of the parsed document.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml")

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [1]_ is the output encoding
      (default is US-ASCII). *xml_declaration* controls if an XML declaration
      should be added to the file. Use ``False`` for never, ``True`` for
      always, ``None`` for only if not US-ASCII or UTF-8 (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

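The *encoding* and *xml_declaration* arguments of :meth:`ElementTree.write`
can be tried without touching the file system by writing to an in-memory
file object. The following is a minimal sketch under that assumption; the
``config`` and ``option`` tags and the ``name`` attribute are made up for
the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are illustrative only.
root = ET.Element("config")
ET.SubElement(root, "option", name="verbose").text = "yes"
tree = ET.ElementTree(root)            # wrap the element structure

# Write to an in-memory bytes buffer instead of a named file.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
data = buf.getvalue()

# Load the serialized bytes into a fresh tree; parse() returns the root.
reloaded = ET.ElementTree()
new_root = reloaded.parse(io.BytesIO(data))

same_root = reloaded.getroot() is new_root
option_text = reloaded.findtext("option")
```

Passing ``xml_declaration=True`` forces the declaration even though UTF-8
output would omit it by default.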

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and *tag* is interpreted as a local name.
   :class:`QName` instances are opaque.

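The two calling conventions of :class:`QName` can be compared with a short
sketch. The namespace URI ``http://example.org/types`` and the names
``item``, ``type`` and ``string`` are placeholders invented for this
example. The prefix chosen for the namespace on output is picked by the
serializer, so the example only checks that a prefixed value appears.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, used only for illustration.
TYPES_NS = "http://example.org/types"

one_arg = ET.QName("{%s}string" % TYPES_NS)   # full "{uri}local" form
two_arg = ET.QName(TYPES_NS, "string")        # URI and local name separately

elem = ET.Element("item")
elem.set("type", two_arg)       # QName used as an attribute value

# On output the attribute value is written as "prefix:string" and a
# matching xmlns declaration for the prefix is added to the element.
xml_bytes = ET.tostring(elem)
```

Both spellings describe the same qualified name, so the two instances
compare equal.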

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. The *element_factory*, when
   given, is called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7

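A :class:`TreeBuilder` can also be driven by hand, which shows how the
start, data and end calls map onto the resulting element structure. The
sketch below assumes the default element factory; the ``greeting`` and
``em`` tags and the ``lang`` attribute are invented for the example.

```python
from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()
builder.start("greeting", {"lang": "en"})   # opens <greeting lang="en">
builder.data("Hello, ")                     # text may arrive in pieces
builder.data("world")
builder.start("em", {})                     # opens a nested <em>
builder.data("!")
builder.end("em")
builder.end("greeting")
root = builder.close()                      # the top-level Element

serialized = tostring(root)
```

The two consecutive ``data`` calls end up joined in the ``text`` of the
``greeting`` element.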

.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser. The *html* flag (predefined HTML entities) is not supported by
   the current implementation. *target* is the target object. If omitted, the
   builder uses an instance of the standard :class:`TreeBuilder` class.
   *encoding* [1]_ is optional. If given, the value overrides the encoding
   specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and its :meth:`data` method for character data. :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` is not limited to building a tree structure.
This is an example of counting the maximum depth of an XML file::

   >>> from xml.etree.ElementTree import XMLParser
   >>> class MaxDepth:                     # The target object of the parser
   ...     maxDepth = 0
   ...     depth = 0
   ...     def start(self, tag, attrib):   # Called for each opening tag.
   ...         self.depth += 1
   ...         if self.depth > self.maxDepth:
   ...             self.maxDepth = self.depth
   ...     def end(self, tag):             # Called for each closing tag.
   ...         self.depth -= 1
   ...     def data(self, data):
   ...         pass            # We do not need to do anything with data.
   ...     def close(self):    # Called when all data has been parsed.
   ...         return self.maxDepth
   ...
   >>> target = MaxDepth()
   >>> parser = XMLParser(target=target)
   >>> exampleXml = """
   ... <a>
   ...   <b>
   ...   </b>
   ...   <b>
   ...     <c>
   ...       <d>
   ...       </d>
   ...     </c>
   ...   </b>
   ... </a>"""
   >>> parser.feed(exampleXml)
   >>> parser.close()
   4

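When no *target* is given, the same feed and close calls build an ordinary
element structure, and the input does not have to arrive in one piece. The
following sketch feeds a document in arbitrary chunks; the ``log`` and
``entry`` tags and the chunk boundaries are chosen only for illustration.

```python
from xml.etree.ElementTree import XMLParser

# Hypothetical input, split at arbitrary points (even inside tags and text).
chunks = ["<log><entry>sta", "rt</entry><ent", "ry>stop</entry></log>"]

parser = XMLParser()          # no target: a standard TreeBuilder is used
for chunk in chunks:
    parser.feed(chunk)        # may be called any number of times
root = parser.close()         # returns the root of the built structure

entries = [entry.text for entry in root.findall("entry")]
```

This pattern can be useful when data is read from a socket or another
source that delivers it in blocks.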

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.