:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

The Element type is a flexible container object, designed to store hierarchical
data structures in memory. The type can be described as a cross between a list
and a dictionary.

Each element has a number of properties associated with it:

* a tag which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :class:`Element` constructor or the
:func:`SubElement` factory function.

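As a minimal sketch of the properties listed above, the following builds a small
element by hand. The tag names ``doc`` and ``para`` and the attribute ``kind``
are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build <doc kind="note"><para>Hello</para> world</doc> by hand.
doc = ET.Element("doc", {"kind": "note"})   # tag plus attribute dictionary
para = ET.SubElement(doc, "para")           # created and appended to doc
para.text = "Hello"                         # text inside <para>...</para>
para.tail = " world"                        # text after </para>, before </doc>

# The element behaves like a sequence of its children.
child_tags = [child.tag for child in doc]

# tostring() returns an encoded string; decode it for display.
serialized = ET.tostring(doc).decode("ascii")
# serialized == '<doc kind="note"><para>Hello</para> world</doc>'
```

Note that the text following ``</para>`` is stored on the child's ``tail``, not
on the parent.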
The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.

See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh's page is also the location of the development version of
:mod:`xml.etree.ElementTree`.

.. _elementtree-functions:

Functions
---------


.. function:: Comment([text])

   Comment element factory. This factory function creates a special element that
   will be serialized as an XML comment by the standard serializer. The comment
   string can be either an 8-bit ASCII string or a Unicode string. *text* is a
   string containing the comment string. Returns an element instance representing
   a comment.

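A short sketch of :func:`Comment` in use. The element names and the comment text
are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.Element("config")
# A comment element is appended like any other child.
root.append(ET.Comment("generated file, do not edit"))
ET.SubElement(root, "option", name="verbose")

# The standard serializer writes the comment as <!--...-->.
xml_bytes = ET.tostring(root)
```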

.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it's
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: fromstring(text)

   Parses an XML section from a string constant. Same as :func:`XML`. *text* is a
   string containing XML data. Returns an :class:`Element` instance.


.. function:: fromstringlist(sequence[, parser])

   Parses an XML document from a sequence of string fragments. *sequence* is a list
   or other sequence containing XML data fragments. *parser* is an optional parser
   instance. If not given, the standard :class:`XMLParser` parser is used.
   Returns an :class:`Element` instance.

   .. versionadded:: 2.7

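The two parsing functions above can be compared side by side. In this sketch the
document and the points at which it is split into fragments are arbitrary.

```python
import xml.etree.ElementTree as ET

# Parse a complete document held in one string.
root_a = ET.fromstring("<items><item>1</item><item>2</item></items>")

# The same document split at arbitrary points, for example as it might
# arrive in chunks from a network connection.
fragments = ["<items><it", "em>1</item>", "<item>2</item></items>"]
root_b = ET.fromstringlist(fragments)

texts_a = [item.text for item in root_a]
texts_b = [item.text for item in root_b]
```

Both calls return the root element, so ``texts_a`` and ``texts_b`` hold the same
values.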

.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.


.. function:: iterparse(source[, events[, parser]])

   Parses an XML section into an element tree incrementally, and reports what's
   going on to the user. *source* is a filename or file object containing XML data.
   *events* is a list of events to report back. If omitted, only "end" events are
   reported. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :term:`iterator`
   providing ``(event, elem)`` pairs.

   .. note::

      :func:`iterparse` only guarantees that it has seen the ">"
      character of a starting tag when it emits a "start" event, so the
      attributes are defined, but the contents of the text and tail attributes
      are undefined at that point. The same applies to the element children;
      they may or may not be present.

      If you need a fully populated element, look for "end" events instead.

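The following sketch shows :func:`iterparse` with both "start" and "end" events,
reading element text only on "end" as the note above recommends. An in-memory
``io.BytesIO`` object stands in for a real file, and the ``log``/``entry`` tags
are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<log><entry>first</entry><entry>second</entry></log>")

seen = []
for event, elem in ET.iterparse(source, events=("start", "end")):
    if event == "end" and elem.tag == "entry":
        # On "end" the element is complete, so elem.text is reliable.
        seen.append(elem.text)
        # Optional: drop content that has already been processed.
        elem.clear()
```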

.. function:: parse(source[, parser])

   Parses an XML section into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLParser` parser is used. Returns an
   :class:`ElementTree` instance.

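A minimal sketch of :func:`parse`. A file object is passed here instead of a
file name so the example is self-contained; the catalog data is made up.

```python
import io
import xml.etree.ElementTree as ET

# A file-like object standing in for a file on disk.
source = io.BytesIO(b"<catalog><book id='b1'/><book id='b2'/></catalog>")

tree = ET.parse(source)     # returns an ElementTree, not an Element
root = tree.getroot()       # the root Element of the parsed document
book_ids = [book.get("id") for book in root.findall("book")]
```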

.. function:: ProcessingInstruction(target[, text])

   PI element factory. This factory function creates a special element that will
   be serialized as an XML processing instruction. *target* is a string containing
   the PI target. *text* is a string containing the PI contents, if given. Returns
   an element instance, representing a processing instruction.

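As a small illustration, a processing instruction can be appended to a parent
element and serialized with it. The target ``render`` and its contents are
hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.Element("page")
# Target "render" with contents "fast"; both are invented for this sketch.
pi = ET.ProcessingInstruction("render", "fast")
root.append(pi)

xml_bytes = ET.tostring(root)
```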

.. function:: register_namespace(prefix, uri)

   Registers a namespace prefix. The registry is global, and any existing mapping
   for either the given prefix or the namespace URI will be removed. *prefix* is a
   namespace prefix. *uri* is a namespace URI. Tags and attributes in this namespace
   will be serialized with the given prefix, if at all possible.

   .. versionadded:: 2.7

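A sketch of how a registered prefix shows up in serialized output. The namespace
URI and the prefix ``inv`` are assumptions chosen for the example; because the
registry is global, the registration affects all later serialization in the
process.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/inventory"   # hypothetical namespace URI
ET.register_namespace("inv", NS)      # global registration

# Namespaced tags use the {uri}local form.
item = ET.Element("{%s}item" % NS)
xml_bytes = ET.tostring(item)
```

Without the registration, the serializer would fall back to a generated prefix.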

.. function:: SubElement(parent, tag[, attrib[, **extra]])

   Subelement factory. This function creates an element instance, and appends it
   to an existing element.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *parent* is the parent element. *tag* is the
   subelement name. *attrib* is an optional dictionary, containing element
   attributes. *extra* contains additional attributes, given as keyword arguments.
   Returns an element instance.


.. function:: tostring(element[, encoding])

   Generates a string representation of an XML element, including all subelements.
   *element* is an :class:`Element` instance. *encoding* is the output encoding
   (default is US-ASCII). Returns an encoded string containing the XML data.


.. function:: tostringlist(element[, encoding])

   Generates a string representation of an XML element, including all subelements.
   *element* is an :class:`Element` instance. *encoding* is the output encoding
   (default is US-ASCII). Returns a sequence object containing the XML data.

   .. versionadded:: 2.7

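The effect of the *encoding* argument can be sketched with a non-ASCII character.
With the default US-ASCII encoding the character is written as a numeric
character reference; with UTF-8 it is written as encoded bytes. The ``drink``
element is illustrative.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("drink")
elem.text = u"caf\u00e9"                    # contains one non-ASCII character

ascii_bytes = ET.tostring(elem)             # default encoding: US-ASCII
utf8_bytes = ET.tostring(elem, "utf-8")     # explicit UTF-8

# tostringlist() yields the same data as a sequence of fragments.
pieces = ET.tostringlist(elem)
```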

.. function:: XML(text[, parser])

   Parses an XML section from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML data.
   *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns an :class:`Element` instance.


.. function:: XMLID(text[, parser])

   Parses an XML section from a string constant, and also returns a dictionary
   which maps from element ids to elements. *text* is a string containing XML
   data. *parser* is an optional parser instance. If not given, the standard
   :class:`XMLParser` parser is used. Returns a tuple containing an
   :class:`Element` instance and a dictionary.

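A brief sketch of :func:`XML` and :func:`XMLID`. The documents and the ``id``
values are invented; elements without an ``id`` attribute do not appear in the
dictionary.

```python
import xml.etree.ElementTree as ET

# XML() embeds a literal document and returns its root element.
settings = ET.XML("<settings><debug>on</debug></settings>")
debug_value = settings.findtext("debug")

# XMLID() additionally maps each "id" attribute value to its element.
root, ids = ET.XMLID(
    '<doc><p id="intro">Hello</p><p id="end">Bye</p><p>anonymous</p></doc>'
)
intro_text = ids["intro"].text
```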

.. _elementtree-element-objects:

Element Objects
---------------


.. class:: Element(tag[, attrib[, **extra]])

   Element class. This class defines the Element interface, and provides a
   reference implementation of this interface.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
   optional dictionary, containing element attributes. *extra* contains additional
   attributes, given as keyword arguments.


   .. attribute:: tag

      A string identifying what kind of data this element represents (the element
      type, in other words).


   .. attribute:: text

      The *text* attribute can be used to hold additional data associated with the
      element. As the name implies, this attribute is usually a string but may be
      any application-specific object. If the element is created from an XML file,
      the attribute will contain any text found between the element tags.


   .. attribute:: tail

      The *tail* attribute can be used to hold additional data associated with the
      element. This attribute is usually a string but may be any
      application-specific object. If the element is created from an XML file, the
      attribute will contain any text found after the element's end tag and before
      the next tag.


   .. attribute:: attrib

      A dictionary containing the element's attributes. Note that while the
      *attrib* value is always a real mutable Python dictionary, an ElementTree
      implementation may choose to use another internal representation, and create
      the dictionary only if someone asks for it. To take advantage of such
      implementations, use the dictionary methods below whenever possible.

   The following dictionary-like methods work on the element attributes.


   .. method:: clear()

      Resets an element. This function removes all subelements, clears all
      attributes, and sets the text and tail attributes to ``None``.


   .. method:: get(key[, default])

      Gets the element attribute named *key*.

      Returns the attribute value, or *default* if the attribute was not found.


   .. method:: items()

      Returns the element attributes as a sequence of (name, value) pairs. The
      attributes are returned in an arbitrary order.


   .. method:: keys()

      Returns the element's attribute names as a list. The names are returned in an
      arbitrary order.


   .. method:: set(key, value)

      Set the attribute *key* on the element to *value*.

   The following methods work on the element's children (subelements).


   .. method:: append(subelement)

      Adds the element *subelement* to the end of this element's internal list of
      subelements.


   .. method:: extend(subelements)

      Appends *subelements* from a sequence object with zero or more elements.
      Raises :exc:`AssertionError` if a subelement is not a valid object.

      .. versionadded:: 2.7


   .. method:: find(match)

      Finds the first subelement matching *match*. *match* may be a tag name or path.
      Returns an element instance or ``None``.


   .. method:: findall(match)

      Finds all subelements matching *match*. *match* may be a tag name or path.
      Returns an iterable yielding all matching elements in document order.


   .. method:: findtext(condition[, default])

      Finds text for the first subelement matching *condition*. *condition* may be
      a tag name or path. Returns the text content of the first matching element,
      or *default* if no element was found. Note that if the matching element has
      no text content, an empty string is returned.


   .. method:: getchildren()

      .. deprecated:: 2.7
         Use ``list(elem)`` or iteration.


   .. method:: getiterator([tag])

      .. deprecated:: 2.7
         Use method :meth:`Element.iter` instead.


   .. method:: insert(index, element)

      Inserts a subelement at the given position in this element.


   .. method:: iter([tag])

      Creates a tree iterator with the current element as the root. The iterator
      iterates over this element and all elements below it, in document (depth
      first) order. If *tag* is not ``None`` or ``'*'``, only elements whose tag
      equals *tag* are returned from the iterator. If the tree structure is
      modified during iteration, the result is undefined.


   .. method:: makeelement(tag, attrib)

      Creates a new element object of the same type as this element. Do not call
      this method; use the :func:`SubElement` factory function instead.


   .. method:: remove(subelement)

      Removes *subelement* from the element. Unlike the ``find*`` methods, this
      method compares elements based on the instance identity, not on tag value
      or contents.

   Element objects also support the following sequence type methods for working
   with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
   :meth:`__len__`.

   Caution: Because Element objects do not define a :meth:`__nonzero__` method,
   elements with no subelements will test as ``False``. ::

     element = root.find('foo')

     if not element: # careful!
         print "element not found, or element has no subelements"

     if element is None:
         print "element not found"

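The attribute and child methods of :class:`Element` can be sketched together on
a small tree. The ``shelf``/``book`` data below is invented for illustration,
and the comparison with ``None`` follows the caution above.

```python
import xml.etree.ElementTree as ET

root = ET.XML(
    "<shelf label='top'>"
    "<book lang='en'><title>First</title></book>"
    "<book lang='fr'><title>Second</title></book>"
    "<empty/>"
    "</shelf>"
)

# Dictionary-like access to attributes.
root.set("room", "study")
label = root.get("label")
missing = root.get("colour", "n/a")     # default for an absent attribute
attr_names = sorted(root.keys())        # sorted, since order is arbitrary

# Searching children by tag name or simple path.
first_title = root.findtext("book/title")
all_titles = [book.findtext("title") for book in root.findall("book")]
tags_in_document_order = [elem.tag for elem in root.iter()]

# remove() compares by identity, so pass the object returned by find().
empty = root.find("empty")
if empty is not None:                   # reliable "was it found?" test
    root.remove(empty)
remaining = [child.tag for child in root]
```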

.. _elementtree-elementtree-objects:

ElementTree Objects
-------------------


.. class:: ElementTree([element,] [file])

   ElementTree wrapper class. This class represents an entire element hierarchy,
   and adds some extra support for serialization to and from standard XML.

   *element* is the root element. The tree is initialized with the contents of the
   XML *file* if given.


   .. method:: _setroot(element)

      Replaces the root element for this tree. This discards the current
      contents of the tree, and replaces it with the given element. Use with
      care. *element* is an element instance.


   .. method:: find(path)

      Finds the first toplevel element with the given tag. Same as
      ``getroot().find(path)``. *path* is the element to look for. Returns the
      first matching element, or ``None`` if no element was found.


   .. method:: findall(path)

      Finds all toplevel elements with the given tag. Same as
      ``getroot().findall(path)``. *path* is the element to look for. Returns a
      list or :term:`iterator` containing all matching elements, in document
      order.


   .. method:: findtext(path[, default])

      Finds the element text for the first toplevel element with the given tag.
      Same as ``getroot().findtext(path)``. *path* is the toplevel element to look
      for. *default* is the value to return if the element was not
      found. Returns the text content of the first matching element, or the
      default value if no element was found. Note that if the element is
      found, but has no text content, this method returns an empty string.


   .. method:: getiterator([tag])

      .. deprecated:: 2.7
         Use method :meth:`ElementTree.iter` instead.


   .. method:: getroot()

      Returns the root element for this tree.


   .. method:: iter([tag])

      Creates and returns a tree iterator for the root element. The iterator
      loops over all elements in this tree, in document order. *tag* is the tag
      to look for (default is to return all elements).


   .. method:: parse(source[, parser])

      Loads an external XML section into this element tree. *source* is a file
      name or file object. *parser* is an optional parser instance. If not
      given, the standard :class:`XMLParser` parser is used. Returns the section
      root element.


   .. method:: write(file[, encoding[, xml_declaration]])

      Writes the element tree to a file, as XML. *file* is a file name, or a
      file object opened for writing. *encoding* [1]_ is the output encoding
      (default is US-ASCII). *xml_declaration* controls if an XML declaration
      should be added to the file. Use ``False`` for never, ``True`` for always,
      and ``None`` for only if the encoding is not US-ASCII or UTF-8. ``None`` is
      the default.

This is the XML file that is going to be manipulated::

    <html>
        <head>
            <title>Example page</title>
        </head>
        <body>
            <p>Moved to <a href="http://example.org/">example.org</a>
            or <a href="http://example.com/">example.com</a>.</p>
        </body>
    </html>

Example of changing the attribute "target" of every link in the first paragraph::

    >>> from xml.etree.ElementTree import ElementTree
    >>> tree = ElementTree()
    >>> tree.parse("index.xhtml")
    <Element 'html' at b7d3f1ec>
    >>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
    >>> p
    <Element 'p' at 8416e0c>
    >>> links = list(p.iter("a"))   # Returns list of all links
    >>> links
    [<Element 'a' at b7d4f9ec>, <Element 'a' at b7d4fb0c>]
    >>> for i in links:             # Iterates through all found links
    ...     i.attrib["target"] = "blank"
    >>> tree.write("output.xhtml")

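The *encoding* and *xml_declaration* arguments of :meth:`ElementTree.write` can
be sketched without touching the file system. Here an ``io.BytesIO`` buffer
stands in for a file object opened for writing, and the small tree is built in
memory rather than parsed from ``index.xhtml``.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("html")
body = ET.SubElement(root, "body")
ET.SubElement(body, "p").text = "Moved"
tree = ET.ElementTree(root)

# Equivalent to tree.getroot().find("body/p").
first_p = tree.find("body/p")

buffer = io.BytesIO()                   # stand-in for a writable file
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()
```

With ``xml_declaration=True`` the output begins with an XML declaration even
though UTF-8 would not require one by default.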
.. _elementtree-qname-objects:

QName Objects
-------------


.. class:: QName(text_or_uri[, tag])

   QName wrapper. This can be used to wrap a QName attribute value, in order to
   get proper namespace handling on output. *text_or_uri* is a string containing
   the QName value, in the form ``{uri}local``, or, if the *tag* argument is given,
   the URI part of a QName. If *tag* is given, the first argument is interpreted as
   a URI, and this argument is interpreted as a local name. :class:`QName` instances
   are opaque.

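A sketch of the two constructor forms. The namespace URI is hypothetical, and
using the :class:`QName` as an element tag (rather than as an attribute value)
is a choice made for this example. The prefix chosen by the serializer is not
relied upon.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/qname-demo"    # hypothetical namespace URI

one_arg = ET.QName("{%s}item" % NS)     # full {uri}local form
two_arg = ET.QName(NS, "item")          # URI and local name given separately

# A QName used as the element tag; the serializer declares the namespace.
elem = ET.Element(two_arg)
xml_bytes = ET.tostring(elem)
```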

.. _elementtree-treebuilder-objects:

TreeBuilder Objects
-------------------


.. class:: TreeBuilder([element_factory])

   Generic element structure builder. This builder converts a sequence of start,
   data, and end method calls to a well-formed element structure. You can use this
   class to build an element structure using a custom XML parser, or a parser for
   some other XML-like format. The *element_factory* is called to create new
   :class:`Element` instances when given.


   .. method:: close()

      Flushes the builder buffers, and returns the toplevel document
      element. Returns an :class:`Element` instance.


   .. method:: data(data)

      Adds text to the current element. *data* is a string. This should be
      either an 8-bit string containing ASCII text, or a Unicode string.


   .. method:: end(tag)

      Closes the current element. *tag* is the element name. Returns the closed
      element.


   .. method:: start(tag, attrs)

      Opens a new element. *tag* is the element name. *attrs* is a dictionary
      containing element attributes. Returns the opened element.


   In addition, a custom :class:`TreeBuilder` object can provide the
   following method:

   .. method:: doctype(name, pubid, system)

      Handles a doctype declaration. *name* is the doctype name. *pubid* is the
      public identifier. *system* is the system identifier. This method does not
      exist on the default :class:`TreeBuilder` class.

      .. versionadded:: 2.7

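The sequence of start, data, and end calls can be sketched by driving a
:class:`TreeBuilder` by hand, as a custom parser might. The ``list`` and
``item`` tags are illustrative.

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()
builder.start("list", {})               # open <list>
builder.start("item", {"n": "1"})       # open <item n="1">
builder.data("one")                     # text inside <item>
builder.end("item")                     # close </item>
builder.end("list")                     # close </list>
root = builder.close()                  # returns the toplevel element

xml_bytes = ET.tostring(root)
```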

.. _elementtree-xmlparser-objects:

XMLParser Objects
-----------------


.. class:: XMLParser([html [, target[, encoding]]])

   Element structure builder for XML source data, based on the expat parser. *html*
   is a flag for predefined HTML entities. It is not supported by the current
   implementation. *target* is the target object. If omitted, the builder uses an
   instance of the standard :class:`TreeBuilder` class. *encoding* [1]_ is optional.
   If given, the value overrides the encoding specified in the XML file.


   .. method:: close()

      Finishes feeding data to the parser. Returns an element structure.


   .. method:: doctype(name, pubid, system)

      .. deprecated:: 2.7
         Define the :meth:`TreeBuilder.doctype` method on a custom TreeBuilder
         target.


   .. method:: feed(data)

      Feeds data to the parser. *data* is encoded data.

:meth:`XMLParser.feed` calls *target*\'s :meth:`start` method
for each opening tag and its :meth:`end` method for each closing tag,
and passes character data to its :meth:`data` method. :meth:`XMLParser.close`
calls *target*\'s :meth:`close` method.
:class:`XMLParser` is not limited to building a tree structure.
This is an example of counting the maximum depth of an XML file::

    >>> from xml.etree.ElementTree import XMLParser
    >>> class MaxDepth:                     # The target object of the parser
    ...     maxDepth = 0
    ...     depth = 0
    ...     def start(self, tag, attrib):   # Called for each opening tag.
    ...         self.depth += 1
    ...         if self.depth > self.maxDepth:
    ...             self.maxDepth = self.depth
    ...     def end(self, tag):             # Called for each closing tag.
    ...         self.depth -= 1
    ...     def data(self, data):
    ...         pass            # We do not need to do anything with data.
    ...     def close(self):    # Called when all data has been parsed.
    ...         return self.maxDepth
    ...
    >>> target = MaxDepth()
    >>> parser = XMLParser(target=target)
    >>> exampleXml = """
    ... <a>
    ...   <b>
    ...   </b>
    ...   <b>
    ...     <c>
    ...       <d>
    ...       </d>
    ...     </c>
    ...   </b>
    ... </a>"""
    >>> parser.feed(exampleXml)
    >>> parser.close()
    4

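For comparison, a sketch of the default behaviour: with no *target* given, the
parser builds a tree, and data may be fed in arbitrary chunks. The ``order``
document and the chunk boundaries are invented for the example.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()                 # default target builds a tree

# Chunks may split the document anywhere, even inside a tag.
for chunk in ("<order><line qty", "='2'>widget</li", "ne></order>"):
    parser.feed(chunk)

root = parser.close()                   # the toplevel element
line = root[0]
```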

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets.