:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

**Source code:** :source:`Lib/xml/etree/ElementTree.py`

--------------

The :class:`Element` type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.


.. warning::

   The :mod:`xml.etree.ElementTree` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


Each element has a number of properties associated with it:

* a tag, which is a string identifying what kind of data this element
  represents (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.

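The properties listed above can be seen on a small hand-built element. This is
a minimal sketch; the tag names and the ``class`` attribute are made up for
illustration, and the code is written to run unchanged on Python 2.7 and 3.

```python
import xml.etree.ElementTree as ET

# Build <p class="intro">Hello <b>world</b>!</p> by hand.
p = ET.Element('p', {'class': 'intro'})   # tag plus attribute dictionary
p.text = 'Hello '                         # text before the first child
b = ET.SubElement(p, 'b')                 # appended to p's sequence of children
b.text = 'world'
b.tail = '!'                              # text after </b>, still inside <p>

# tostring() returns an encoded string, so decode it for comparison.
serialized = ET.tostring(p).decode('ascii')
```

Note that the ``'!'`` is stored on the child as its tail, not on the parent.
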
The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. versionchanged:: 2.7
   The ElementTree API is updated to 1.3. For more information, see
   `Introducing ElementTree 1.3
   <http://effbot.org/zone/elementtree-13-intro.htm>`_.

Tutorial
--------

This is a short tutorial for using :mod:`xml.etree.ElementTree` (``ET`` in
short). The goal is to demonstrate some of the building blocks and basic
concepts of the module.

XML tree and elements
^^^^^^^^^^^^^^^^^^^^^

XML is an inherently hierarchical data format, and the most natural way to
represent it is with a tree. ``ET`` has two classes for this purpose -
:class:`ElementTree` represents the whole XML document as a tree, and
:class:`Element` represents a single node in this tree. Interactions with
the whole document (reading and writing to/from files) are usually done
on the :class:`ElementTree` level. Interactions with a single XML element
and its sub-elements are done on the :class:`Element` level.

.. _elementtree-parsing-xml:

Parsing XML
^^^^^^^^^^^

We'll be using the following XML document as the sample data for this section:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank>1</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank>4</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank>68</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We have a number of ways to import the data. Reading the file from disk::

   import xml.etree.ElementTree as ET
   tree = ET.parse('country_data.xml')
   root = tree.getroot()

Reading the data from a string::

   root = ET.fromstring(country_data_as_string)

:func:`fromstring` parses XML from a string directly into an :class:`Element`,
which is the root element of the parsed tree. Other parsing functions may
create an :class:`ElementTree`. Check the documentation to be sure.

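The difference in return types can be checked directly. The sketch below
assumes an in-memory :class:`io.BytesIO` object as a stand-in for a file on
disk, and a one-country document invented for the purpose.

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b'<data><country name="Panama"/></data>'

# parse() takes a filename or file object and returns an ElementTree.
tree = ET.parse(io.BytesIO(xml_bytes))
root_from_tree = tree.getroot()

# fromstring() takes the XML text itself and returns the root Element.
root_from_string = ET.fromstring(xml_bytes)
```

Both routes end at an equivalent root element; only :func:`parse` gives you
the wrapping :class:`ElementTree` needed for :meth:`ElementTree.write`.
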
As an :class:`Element`, ``root`` has a tag and a dictionary of attributes::

   >>> root.tag
   'data'
   >>> root.attrib
   {}

It also has child nodes over which we can iterate::

   >>> for child in root:
   ...     print child.tag, child.attrib
   ...
   country {'name': 'Liechtenstein'}
   country {'name': 'Singapore'}
   country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index::

   >>> root[0][1].text
   '2008'

Finding interesting elements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`Element` has some useful methods that help iterate recursively over all
the sub-tree below it (its children, their children, and so on). For example,
:meth:`Element.iter`::

   >>> for neighbor in root.iter('neighbor'):
   ...     print neighbor.attrib
   ...
   {'name': 'Austria', 'direction': 'E'}
   {'name': 'Switzerland', 'direction': 'W'}
   {'name': 'Malaysia', 'direction': 'N'}
   {'name': 'Costa Rica', 'direction': 'W'}
   {'name': 'Colombia', 'direction': 'E'}

:meth:`Element.findall` finds only elements with a tag which are direct
children of the current element. :meth:`Element.find` finds the *first* child
with a particular tag, and :attr:`Element.text` accesses the element's text
content. :meth:`Element.get` accesses the element's attributes::

   >>> for country in root.findall('country'):
   ...     rank = country.find('rank').text
   ...     name = country.get('name')
   ...     print name, rank
   ...
   Liechtenstein 1
   Singapore 4
   Panama 68

More sophisticated specification of which elements to look for is possible by
using :ref:`XPath <elementtree-xpath>`.

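When a child or attribute may be absent, :meth:`Element.find` returns ``None``
and calling ``.text`` on the result fails. A possible way around this is
sketched below, using a cut-down document in which Singapore has no ``rank``;
the ``'n/a'`` and ``'unknown'`` defaults and the ``capital`` and ``continent``
names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>1</rank></country>'
    '<country name="Singapore"/>'
    '</data>'
)

ranks = {}
for country in root.findall('country'):
    # findtext() falls back to its second argument when no child matches.
    ranks[country.get('name')] = country.findtext('rank', 'n/a')

missing = root.find('continent')              # no such child, so None
capital = root[0].get('capital', 'unknown')   # missing attribute, so default
```
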
Modifying an XML File
^^^^^^^^^^^^^^^^^^^^^

:class:`ElementTree` provides a simple way to build XML documents and write
them to files. The :meth:`ElementTree.write` method serves this purpose.

Once created, an :class:`Element` object may be manipulated by directly changing
its fields (such as :attr:`Element.text`), adding and modifying attributes
(:meth:`Element.set` method), as well as adding new children (for example
with :meth:`Element.append`).

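The three kinds of change named above can be sketched on a single element.
The ``continent`` attribute and the ``flag`` child are hypothetical additions
chosen only to show the calls.

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Panama"><rank>68</rank></country>')

country.find('rank').text = '69'          # change a field directly
country.set('continent', 'America')       # add or overwrite an attribute

year = ET.Element('year')
year.text = '2011'
country.append(year)                      # becomes the last child
country.insert(0, ET.Element('flag'))     # becomes the first child

child_tags = [child.tag for child in country]
```
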
Let's say we want to add one to each country's rank, and add an ``updated``
attribute to the rank element::

   >>> for rank in root.iter('rank'):
   ...     new_rank = int(rank.text) + 1
   ...     rank.text = str(new_rank)
   ...     rank.set('updated', 'yes')
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
       <country name="Panama">
           <rank updated="yes">69</rank>
           <year>2011</year>
           <gdppc>13600</gdppc>
           <neighbor name="Costa Rica" direction="W"/>
           <neighbor name="Colombia" direction="E"/>
       </country>
   </data>

We can remove elements using :meth:`Element.remove`. Let's say we want to
remove all countries with a rank higher than 50::

   >>> for country in root.findall('country'):
   ...     rank = int(country.find('rank').text)
   ...     if rank > 50:
   ...         root.remove(country)
   ...
   >>> tree.write('output.xml')

Our XML now looks like this:

.. code-block:: xml

   <?xml version="1.0"?>
   <data>
       <country name="Liechtenstein">
           <rank updated="yes">2</rank>
           <year>2008</year>
           <gdppc>141100</gdppc>
           <neighbor name="Austria" direction="E"/>
           <neighbor name="Switzerland" direction="W"/>
       </country>
       <country name="Singapore">
           <rank updated="yes">5</rank>
           <year>2011</year>
           <gdppc>59900</gdppc>
           <neighbor name="Malaysia" direction="N"/>
       </country>
   </data>

Building XML documents
^^^^^^^^^^^^^^^^^^^^^^

The :func:`SubElement` function also provides a convenient way to create new
sub-elements for a given element::

   >>> a = ET.Element('a')
   >>> b = ET.SubElement(a, 'b')
   >>> c = ET.SubElement(a, 'c')
   >>> d = ET.SubElement(c, 'd')
   >>> ET.dump(a)
   <a><b /><c><d /></c></a>

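Attributes and text can be supplied while building as well. The following
sketch rebuilds a fragment of the country data from scratch; it passes
attributes once as a keyword argument and once as a dictionary, which are the
two forms the :func:`SubElement` signature allows.

```python
import xml.etree.ElementTree as ET

data = ET.Element('data')
country = ET.SubElement(data, 'country', name='Singapore')   # keyword form
rank = ET.SubElement(country, 'rank')
rank.text = '4'
ET.SubElement(country, 'neighbor', {'name': 'Malaysia'})     # dictionary form

# tostring() returns an encoded string, so decode it for comparison.
xml_text = ET.tostring(data).decode('ascii')
```
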
Parsing XML with Namespaces
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the XML input has `namespaces
<https://en.wikipedia.org/wiki/XML_namespace>`__, tags and attributes
with prefixes in the form ``prefix:sometag`` get expanded to
``{uri}sometag`` where the *prefix* is replaced by the full *URI*. Also,
if there is a `default namespace
<http://www.w3.org/TR/2006/REC-xml-names-20060816/#defaulting>`__,
that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the
prefix "fictional" and the other serving as the default namespace:

.. code-block:: xml

   <?xml version="1.0"?>
   <actors xmlns:fictional="http://characters.example.com"
           xmlns="http://people.example.com">
       <actor>
           <name>John Cleese</name>
           <fictional:character>Lancelot</fictional:character>
           <fictional:character>Archie Leach</fictional:character>
       </actor>
       <actor>
           <name>Eric Idle</name>
           <fictional:character>Sir Robin</fictional:character>
           <fictional:character>Gunther</fictional:character>
           <fictional:character>Commander Clement</fictional:character>
       </actor>
   </actors>

One way to search and explore this XML example is to manually add the
URI to every tag or attribute in the xpath of a *find()* or *findall()*::

   root = ET.fromstring(xml_text)
   for actor in root.findall('{http://people.example.com}actor'):
       name = actor.find('{http://people.example.com}name')
       print name.text
       for char in actor.findall('{http://characters.example.com}character'):
           print ' |-->', char.text

Another way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search::

   ns = {'real_person': 'http://people.example.com',
         'role': 'http://characters.example.com'}

   for actor in root.findall('real_person:actor', ns):
       name = actor.find('real_person:name', ns)
       print name.text
       for char in actor.findall('role:character', ns):
           print ' |-->', char.text

These two approaches both output::

   John Cleese
    |--> Lancelot
    |--> Archie Leach
   Eric Idle
    |--> Sir Robin
    |--> Gunther
    |--> Commander Clement


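The tag expansion described at the start of this section can be observed on
the parsed elements themselves. The sketch below uses a shortened copy of the
actors document with a single actor, and reuses the ``ns`` prefix dictionary
from the second approach.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<actors xmlns:fictional="http://characters.example.com"'
    ' xmlns="http://people.example.com">'
    '<actor><name>John Cleese</name>'
    '<fictional:character>Lancelot</fictional:character></actor>'
    '</actors>'
)
root = ET.fromstring(xml_text)

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}

# The default namespace URI is prepended to the unprefixed root tag.
expanded_tag = root.tag

# Prefixes in a path are looked up in ns, not in the document.
names = [e.text for e in root.findall('real_person:actor/real_person:name', ns)]
roles = [e.text for e in root.findall('.//role:character', ns)]
```

The prefixes in ``ns`` need not match those used in the document; only the
URIs have to agree.
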
Additional resources
^^^^^^^^^^^^^^^^^^^^

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs.

.. _elementtree-xpath:

XPath support
-------------

This module provides limited support for
`XPath expressions <http://www.w3.org/TR/xpath>`_ for locating elements in a
tree. The goal is to support a small subset of the abbreviated syntax; a full
XPath engine is outside the scope of the module.

Example
^^^^^^^

Here's an example that demonstrates some of the XPath capabilities of the
module. We'll be using the ``countrydata`` XML document from the
:ref:`Parsing XML <elementtree-parsing-xml>` section::

   import xml.etree.ElementTree as ET

   root = ET.fromstring(countrydata)

   # Top-level elements
   root.findall(".")

   # All 'neighbor' grand-children of 'country' children of the top-level
   # elements
   root.findall("./country/neighbor")

   # Nodes with name='Singapore' that have a 'year' child
   root.findall(".//year/..[@name='Singapore']")

   # 'year' nodes that are children of nodes with name='Singapore'
   root.findall(".//*[@name='Singapore']/year")

   # All 'neighbor' nodes that are the second 'neighbor' child of their parent
   root.findall(".//neighbor[2]")

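To see what these expressions return, here is a self-contained variant. It
assumes a trimmed version of the country data (two countries, no ``rank`` or
``gdppc``), and ``names`` is a small helper introduced only for this sketch.

```python
import xml.etree.ElementTree as ET

countrydata = """
<data>
  <country name="Liechtenstein">
    <year>2008</year>
    <neighbor name="Austria"/>
    <neighbor name="Switzerland"/>
  </country>
  <country name="Singapore">
    <year>2011</year>
    <neighbor name="Malaysia"/>
  </country>
</data>
"""
root = ET.fromstring(countrydata)


def names(elements):
    """Return the 'name' attribute of each element (helper for this sketch)."""
    return [e.get('name') for e in elements]


top = root.findall('.')
all_neighbors = names(root.findall('./country/neighbor'))
singapore = names(root.findall(".//year/..[@name='Singapore']"))
singapore_years = [e.text for e in root.findall(".//*[@name='Singapore']/year")]
second_neighbors = names(root.findall('.//neighbor[2]'))
```

Here ``second_neighbors`` contains only Switzerland: Singapore has a single
``neighbor``, so it contributes nothing.
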
Supported XPath syntax
^^^^^^^^^^^^^^^^^^^^^^

.. tabularcolumns:: |l|L|

+-----------------------+------------------------------------------------------+
| Syntax                | Meaning                                              |
+=======================+======================================================+
| ``tag``               | Selects all child elements with the given tag.       |
|                       | For example, ``spam`` selects all child elements     |
|                       | named ``spam``, and ``spam/egg`` selects all         |
|                       | grandchildren named ``egg`` in all children named    |
|                       | ``spam``.                                            |
+-----------------------+------------------------------------------------------+
| ``*``                 | Selects all child elements. For example, ``*/egg``   |
|                       | selects all grandchildren named ``egg``.             |
+-----------------------+------------------------------------------------------+
| ``.``                 | Selects the current node. This is mostly useful      |
|                       | at the beginning of the path, to indicate that it's  |
|                       | a relative path.                                     |
+-----------------------+------------------------------------------------------+
| ``//``                | Selects all subelements, on all levels beneath the   |
|                       | current element. For example, ``.//egg`` selects     |
|                       | all ``egg`` elements in the entire tree.             |
+-----------------------+------------------------------------------------------+
| ``..``                | Selects the parent element.                          |
+-----------------------+------------------------------------------------------+
| ``[@attrib]``         | Selects all elements that have the given attribute.  |
+-----------------------+------------------------------------------------------+
| ``[@attrib='value']`` | Selects all elements for which the given attribute   |
|                       | has the given value. The value cannot contain        |
|                       | quotes.                                              |
+-----------------------+------------------------------------------------------+
| ``[tag]``             | Selects all elements that have a child named         |
|                       | ``tag``. Only immediate children are supported.      |
+-----------------------+------------------------------------------------------+
| ``[tag='text']``      | Selects all elements that have a child named         |
|                       | ``tag`` whose complete text content, including       |
|                       | descendants, equals the given ``text``.              |
+-----------------------+------------------------------------------------------+
| ``[position]``        | Selects all elements that are located at the given   |
|                       | position. The position can be either an integer      |
|                       | (1 is the first position), the expression ``last()`` |
|                       | (for the last position), or a position relative to   |
|                       | the last position (e.g. ``last()-1``).               |
+-----------------------+------------------------------------------------------+

Predicates (expressions within square brackets) must be preceded by a tag
name, an asterisk, or another predicate. ``position`` predicates must be
preceded by a tag name.

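A few of the predicates from the table, applied to a made-up ``menu``
document. This is only a sketch: the element and attribute names are
invented, and each predicate follows a tag name or an asterisk as the rule
above requires.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<menu>'
    '<spam id="1"/><egg/><spam id="2"><egg/></spam><spam id="3"/>'
    '</menu>'
)


def ids(path):
    """Return the 'id' attribute of every element matched by path."""
    return [e.get('id') for e in root.findall(path)]


first = ids('spam[1]')              # positions count from 1
last = ids('spam[last()]')
before_last = ids('spam[last()-1]')
with_egg = ids('spam[egg]')         # spam elements that have an egg child
with_id = ids('*[@id]')             # any child carrying an id attribute
```

Positions are counted among siblings with the same tag, so the ``egg``
between the first two ``spam`` elements does not shift their positions.
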
Reference
---------

.. _elementtree-functions:

Functions
^^^^^^^^^


.. function:: Comment(text=None)

   Comment element factory. This factory function creates a special element
   that will be serialized as an XML comment by the standard serializer. The
   comment string can be either a bytestring or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance
   representing a comment.


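A minimal sketch of :func:`Comment` in use; the ``config`` and ``option``
tags and the comment wording are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element('config')
# The comment is an element like any other and is appended as a child.
root.append(ET.Comment(' generated file, do not edit '))
ET.SubElement(root, 'option')

xml_text = ET.tostring(root).decode('ascii')
```
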
.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This
   function should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text*
   is a string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence, parser=None)

   Parses an XML document from a sequence of string fragments. *sequence* is a
   list or other sequence containing XML data fragments. *parser* is an
   optional parser instance. If not given, the standard :class:`XMLParser`
   parser is used. Returns an :class:`Element` instance.

   .. versionadded:: 2.7


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source, events=None, parser=None)

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML
   data. *events* is a list of events to report back. If omitted, only "end"
   events are reported. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. *parser* is not
   supported by ``cElementTree``. Returns an :term:`iterator` providing
   ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.


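The advice in the note can be sketched as follows: request both event types,
but read an element's content only on its "end" event. The in-memory
:class:`io.BytesIO` source and the two-country document are assumptions for
the example.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(
    b'<data>'
    b'<country name="Liechtenstein"><rank>1</rank></country>'
    b'<country name="Singapore"><rank>4</rank></country>'
    b'</data>'
)

seen = []
for event, elem in ET.iterparse(source, events=('start', 'end')):
    if event == 'end' and elem.tag == 'country':
        # The element is fully populated here, so its children are readable.
        seen.append((elem.get('name'), elem.findtext('rank')))
        elem.clear()   # drop content that is no longer needed
```

Calling :meth:`Element.clear` on processed elements is optional; it is a
common way to limit memory use on large inputs.
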
.. function:: parse(source, parser=None)

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If
   not given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.


.. function:: ProcessingInstruction(target, text=None)

   PI element factory. This factory function creates a special element that
   will be serialized as an XML processing instruction. *target* is a string
   containing the PI target. *text* is a string containing the PI contents, if
   given. Returns an element instance, representing a processing instruction.


.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing
   mapping for either the given prefix or the namespace URI will be removed.
   *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and
   attributes in this namespace will be serialized with the given prefix, if at
   all possible.

   .. versionadded:: 2.7


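A short sketch of the effect of :func:`register_namespace` on serialization.
The ``role`` prefix and the URI are borrowed from the tutorial's namespace
example; because the registry is global, the mapping stays in place for the
rest of the process.

```python
import xml.etree.ElementTree as ET

ET.register_namespace('role', 'http://characters.example.com')

# Tags in a namespace are written in {uri}tag form.
elem = ET.Element('{http://characters.example.com}character')
elem.text = 'Lancelot'

xml_text = ET.tostring(elem).decode('ascii')
```
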
.. function:: SubElement(parent, tag, attrib={}, **extra)

   Subelement factory. This function creates an element instance, and appends
   it to an existing element.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *parent* is the parent element. *tag* is
   the subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword
   arguments. Returns an element instance.


.. function:: tostring(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns an encoded string
   containing the XML data.


.. function:: tostringlist(element, encoding="us-ascii", method="xml")

   Generates a string representation of an XML element, including all
   subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is
   the output encoding (default is US-ASCII). *method* is either ``"xml"``,
   ``"html"`` or ``"text"`` (default is ``"xml"``). Returns a list of encoded
   strings containing the XML data. It does not guarantee any specific
   sequence, except that ``"".join(tostringlist(element)) ==
   tostring(element)``.

   .. versionadded:: 2.7


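The relationship between :func:`tostring` and :func:`tostringlist`, and the
effect of *method*, can be sketched on a small fragment (the markup is made
up for the example). The ``b''`` literals keep the comparison valid for the
encoded strings these functions return.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<p>Hello <b>world</b>!</p>')

as_xml = ET.tostring(elem)                   # default method="xml"
as_text = ET.tostring(elem, method='text')   # text content only, no markup
chunks = ET.tostringlist(elem)               # same data, as a list of pieces
```

How the data is split across ``chunks`` is not specified; only the joined
result is guaranteed to equal the output of :func:`tostring`.
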
.. function:: XML(text, parser=None)

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text, parser=None)

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element IDs to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.


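A minimal sketch of :func:`XMLID`, assuming that the IDs are taken from
``id`` attributes, as in this made-up document; the third ``sec`` has no
``id`` and so does not appear in the dictionary.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID('<doc><sec id="intro"/><sec id="usage"/><sec/></doc>')

# The dictionary values are the elements from the parsed tree.
id_names = sorted(ids)
intro = ids['intro']
```
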
.. _elementtree-element-objects:

Element Objects
^^^^^^^^^^^^^^^

.. class:: Element(tag, attrib={}, **extra)

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either
   bytestrings or Unicode strings. *tag* is the element name. *attrib* is
   an optional dictionary, containing element attributes. *extra* contains
   additional attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the
      element type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with
      the element. As the name implies, this attribute is usually a string but
      may be any application-specific object. If the element is created from
      an XML file, the attribute will contain any text found between the
      element tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with
      the element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file,
      the attribute will contain any text found after the element's end tag and
      before the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and
      create the dictionary only if someone asks for it. To take advantage of
      such implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key, default=None)

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned
      in an arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

657 The following methods work on the element's children (subelements).
658
659
660 .. method:: append(subelement)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000661
Florent Xicluna583302c2010-03-13 17:56:19 +0000662 Adds the element *subelement* to the end of this elements internal list
663 of subelements.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000664
665
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000666 .. method:: extend(subelements)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000667
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000668 Appends *subelements* from a sequence object with zero or more elements.
669 Raises :exc:`AssertionError` if a subelement is not a valid object.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000670
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000671 .. versionadded:: 2.7
Georg Brandl8ec7f652007-08-15 14:28:01 +0000672
673
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000674 .. method:: find(match)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000675
Florent Xicluna583302c2010-03-13 17:56:19 +0000676 Finds the first subelement matching *match*. *match* may be a tag name
677 or path. Returns an element instance or ``None``.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000678
679
Florent Xicluna3e8c1892010-03-11 14:36:19 +0000680 .. method:: findall(match)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000681
Florent Xicluna583302c2010-03-13 17:56:19 +0000682 Finds all matching subelements, by tag name or path. Returns a list
683 containing all matching elements in document order.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000684
685
Florent Xiclunaa231e452010-03-13 20:30:15 +0000686 .. method:: findtext(match, default=None)
Georg Brandl8ec7f652007-08-15 14:28:01 +0000687
Florent Xicluna583302c2010-03-13 17:56:19 +0000688 Finds text for the first subelement matching *match*. *match* may be
689 a tag name or path. Returns the text content of the first matching
690 element, or *default* if no element was found. Note that if the matching
691 element has no text content an empty string is returned.
Georg Brandl8ec7f652007-08-15 14:28:01 +0000692
693
   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter(tag=None)

      Creates a tree :term:`iterator` with the current element as the root.
      The iterator iterates over this element and all elements below it, in
      document (depth first) order. If *tag* is not ``None`` or ``'*'``, only
      elements whose tag equals *tag* are returned from the iterator. If the
      tree structure is modified during iteration, the result is undefined.

      .. versionadded:: 2.7


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Returns an iterable
      yielding all matching elements in document order.

      .. versionadded:: 2.7


   .. method:: itertext()

      Creates a text iterator. The iterator loops over this element and all
      subelements, in document order, and returns all inner text.

      .. versionadded:: 2.7


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not
      call this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the find\* methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   :class:`Element` objects also support the following sequence type methods
   for working with subelements: :meth:`~object.__delitem__`,
   :meth:`~object.__getitem__`, :meth:`~object.__setitem__`,
   :meth:`~object.__len__`.

   Caution: Elements with no subelements will test as ``False``. This behavior
   will change in future versions. Use an explicit ``len(elem)`` or ``elem is
   None`` test instead. ::

      element = root.find('foo')

      if not element:  # careful!
          print "element not found, or element has no subelements"

      if element is None:
          print "element not found"

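As a minimal sketch of the :class:`Element` methods described above, the
following example searches and edits a small tree. The sample document and its
tag names are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
root = ET.fromstring(
    "<library>"
    "<book id='b1'><title>First</title><note/></book>"
    "<book id='b2'><title>Second</title></book>"
    "</library>"
)

# find() returns the first match (or None); findall() returns a list.
first_book = root.find("book")
all_titles = [t.text for t in root.findall("book/title")]

# findtext() returns the default only when nothing matches; a matching
# element without text yields an empty string instead.
missing = root.findtext("book/price", default="n/a")
empty = root.findtext("book/note", default="n/a")

# iter() walks the element itself and all descendants in document order.
tags = [elem.tag for elem in root.iter()]

# itertext() yields the inner text of the element and its subelements.
text = "".join(root.itertext())

# insert() and extend() add children; remove() deletes by identity.
root.insert(0, ET.Element("preface"))
root.extend([ET.Element("appendix"), ET.Element("index")])
root.remove(first_book)
child_tags = [child.tag for child in root]

# Test for a missing element with "is None", not with truthiness.
has_price = root.find("price") is not None
```

Note that ``empty`` is an empty string while ``missing`` is the supplied
default, which is the distinction :meth:`findtext` draws between an element
without text and no element at all.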

.. _elementtree-elementtree-objects:

ElementTree Objects
^^^^^^^^^^^^^^^^^^^


.. class:: ElementTree(element=None, file=None)

   ElementTree wrapper class. This class represents an entire element
   hierarchy, and adds some extra support for serialization to and from
   standard XML.

   *element* is the root element. The tree is initialized with the contents
   of the XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(match)

      Same as :meth:`Element.find`, starting at the root of the tree.


   .. method:: findall(match)

      Same as :meth:`Element.findall`, starting at the root of the tree.


   .. method:: findtext(match, default=None)

      Same as :meth:`Element.findtext`, starting at the root of the tree.


   .. method:: getiterator(tag=None)

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter(tag=None)

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: iterfind(match)

      Finds all matching subelements, by tag name or path. Same as
      ``getroot().iterfind(match)``. Returns an iterable yielding all matching
      elements in document order.

      .. versionadded:: 2.7


   .. method:: parse(source, parser=None)

      Loads an external XML section into this element tree. *source* is a file
      name or file object. *parser* is an optional parser instance. If not
      given, the standard :class:`XMLParser` parser is used. Returns the section
      root element.


   .. method:: write(file, encoding="us-ascii", xml_declaration=None, \
                     default_namespace=None, method="xml")

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [1]_ is the output encoding
      (default is US-ASCII). *xml_declaration* controls if an XML declaration
      should be added to the file. Use ``False`` for never, ``True`` for always,
      ``None`` for only if not US-ASCII or UTF-8 (default is ``None``).
      *default_namespace* sets the default XML namespace (for "xmlns").
      *method* is either ``"xml"``, ``"html"`` or ``"text"`` (default is
      ``"xml"``).

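The following is a minimal sketch of a round trip through
:class:`ElementTree`, assuming an in-memory ``io.BytesIO`` buffer in place of a
real file. The element names and text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, built in memory instead of read from disk.
root = ET.Element("html")
body = ET.SubElement(root, "body")
ET.SubElement(body, "p").text = "Moved to example.org"

tree = ET.ElementTree(root)

# getroot() hands back the wrapped element; find() searches from it.
same_root = tree.getroot() is root
paragraph = tree.find("body/p")

# iter() visits every element of the tree, root included.
tags = [elem.tag for elem in tree.iter()]

# write() accepts a file name or a file object opened for writing.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
serialized = buffer.getvalue()

# parse() accepts a file name or file object and returns the new root.
reloaded = ET.ElementTree()
new_root = reloaded.parse(io.BytesIO(serialized))
```

Passing the encoding explicitly avoids relying on the default, and
``xml_declaration=True`` forces the declaration to be written.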

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at 0xb77e6fac>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 0xb77ec26c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

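The *method* argument of :meth:`ElementTree.write` can be illustrated with a
variation of the example above. This is only a sketch: it builds the page from
a string rather than from ``index.xhtml``, and ``render`` is a hypothetical
helper introduced here.

```python
import io
import xml.etree.ElementTree as ET

page = ET.fromstring(
    "<html><body><p>Moved to "
    "<a href='http://example.org/'>example.org</a> or "
    "<a href='http://example.com/'>example.com</a>.</p></body></html>"
)
tree = ET.ElementTree(page)

# Same edit as above, using iterfind() with a path instead of iter().
for link in tree.iterfind("body/p/a"):
    link.set("target", "blank")


def render(method):
    # Serialize the tree into an in-memory buffer using the given method.
    out = io.BytesIO()
    tree.write(out, encoding="utf-8", method=method)
    return out.getvalue().decode("utf-8")


as_xml = render("xml")    # full markup, including the new attributes
as_text = render("text")  # text content only, tags dropped
```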

.. _elementtree-qname-objects:

QName Objects
^^^^^^^^^^^^^


.. class:: QName(text_or_uri, tag=None)

   QName wrapper. This can be used to wrap a QName attribute value, in order
   to get proper namespace handling on output. *text_or_uri* is a string
   containing the QName value, in the form ``{uri}local``, or, if the *tag*
   argument is given, the URI part of a QName. If *tag* is given, the first
   argument is interpreted as a URI, and this argument is interpreted as a
   local name. :class:`QName` instances are opaque.

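A short sketch of the two ways to construct a :class:`QName`, assuming a
made-up namespace URI. Both forms are intended to name the same qualified tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
NS = "http://example.org/ns"

# Both forms describe the same qualified name.
combined = ET.QName("{%s}item" % NS)
split = ET.QName(NS, "item")

# A QName can be used wherever a tag is expected.
root = ET.Element(combined)
ET.SubElement(root, ET.QName(NS, "child"))

# On output, the namespace is declared and a prefix is generated.
xml_bytes = ET.tostring(root)
```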

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
^^^^^^^^^^^^^^^^^^^


.. class:: TreeBuilder(element_factory=None)

   Generic element structure builder. This builder converts a sequence of
   start, data, and end method calls to a well-formed element structure. You
   can use this class to build an element structure using a custom XML parser,
   or a parser for some other XML-like format. If given, *element_factory* is
   called to create new :class:`Element` instances.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either a bytestring, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the
      closed element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is
      the public identifier. *system* is the system identifier. This method
      does not exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7

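As a minimal sketch, a :class:`TreeBuilder` can be driven by hand with the
same calls a parser would make. The tag names and text below are invented for
illustration.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

# Each start() must be balanced by an end() with the same tag.
builder.start("greeting", {"lang": "en"})
builder.data("Hello, ")
builder.start("name", {})
builder.data("world")
builder.end("name")
builder.end("greeting")

# close() returns the top-level element that was built.
root = builder.close()
```

Text passed to :meth:`data` before the first child becomes the ``text`` of the
open element, so ``root.text`` here is ``"Hello, "`` and the nested ``name``
element carries ``"world"``.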

.. _elementtree-xmlparser-objects:

XMLParser Objects
^^^^^^^^^^^^^^^^^


.. class:: XMLParser(html=0, target=None, encoding=None)

   :class:`Element` structure builder for XML source data, based on the expat
   parser. *html* is a flag for predefined HTML entities; it is not supported
   by the current implementation. *target* is the target object. If omitted,
   the builder uses an instance of the standard :class:`TreeBuilder` class.
   *encoding* [1]_ is optional. If given, the value overrides the encoding
   specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

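The following sketch feeds a document to an :class:`XMLParser` in several
pieces, as might happen when data arrives over a network. The chunk boundaries
and the ``log``/``entry`` tags are arbitrary choices made for this example.

```python
from xml.etree.ElementTree import XMLParser

parser = XMLParser()  # no target given, so a standard TreeBuilder is used

# Hypothetical chunks, standing in for data arriving piece by piece.
chunks = ["<log><entry>sta", "rted</entry><ent", "ry>stopped</entry></log>"]
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root element.
root = parser.close()
entries = [entry.text for entry in root.findall("entry")]
```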

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag, its :meth:`end` method for each closing tag,
and passes character data to its :meth:`data` method. :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` can be used for more than building a tree structure.
This is an example of counting the maximum depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4

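A target can also make use of the :meth:`data` callback. The sketch below
collects the words found in a document; ``TextCollector`` is a hypothetical
class written for this illustration, not part of the module.

```python
from xml.etree.ElementTree import XMLParser


class TextCollector(object):  # hypothetical target, for illustration only
    def __init__(self):
        self.parts = []

    def start(self, tag, attrib):
        pass  # Tags are ignored; only character data is of interest.

    def end(self, tag):
        pass

    def data(self, data):
        # May be called more than once for a single run of text.
        self.parts.append(data)

    def close(self):
        # The return value becomes the result of XMLParser.close().
        return " ".join(self.parts).split()


parser = XMLParser(target=TextCollector())
parser.feed("<doc><p>alpha beta</p><p>gamma</p></doc>")
words = parser.close()
```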

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets/character-sets.xhtml.