:mod:`xml.etree.ElementTree` --- The ElementTree XML API
========================================================

.. module:: xml.etree.ElementTree
   :synopsis: Implementation of the ElementTree API.
.. moduleauthor:: Fredrik Lundh <fredrik@pythonware.com>


.. versionadded:: 2.5

The Element type is a flexible container object, designed to store hierarchical
data structures in memory. The type can be described as a cross between a list
and a dictionary.

Each element has a number of properties associated with it:

* a tag, which is a string identifying what kind of data this element represents
  (the element type, in other words).

* a number of attributes, stored in a Python dictionary.

* a text string.

* an optional tail string.

* a number of child elements, stored in a Python sequence.

To create an element instance, use the :func:`Element` or :func:`SubElement`
factory functions.

The :class:`ElementTree` class can be used to wrap an element structure, and
convert it from and to XML.

A C implementation of this API is available as :mod:`xml.etree.cElementTree`.
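
The following minimal sketch parses a small document and reads each of the properties listed above from it. The document and its tag names are invented for illustration. The code uses only functions documented on this page and contains no print statements, so it should also run under Python 3.

```python
import xml.etree.ElementTree as ET

# A made-up document: <item> has an attribute, some text, and one child,
# and the child is followed by tail text.
root = ET.fromstring('<item kind="demo">head<child/>tail</item>')

tag = root.tag                  # the element type
attributes = dict(root.attrib)  # the attribute dictionary
text = root.text                # text before the first child
children = list(root)           # the child elements, in document order
tail = children[0].tail         # text after </child>, before </item>
```

Note that ``"tail"`` is stored on the child element, not on ``<item>``.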


.. _elementtree-functions:

Functions
---------


.. function:: Comment([text])

   Comment element factory. This factory function creates a special element that
   will be serialized as an XML comment. The comment string can be either an 8-bit
   ASCII string or a Unicode string. *text* is a string containing the comment
   string. Returns an element instance representing a comment.
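
You can append a comment element to an ordinary element like any other child. When the tree is serialized, the comment is written as ``<!--...-->``. In this sketch the element and attribute names are made up. The ``.decode("ascii")`` call is needed because :func:`tostring` returns bytes on Python 3.

```python
import xml.etree.ElementTree as ET

root = ET.Element("config")
# Comment() returns an element whose tag is the Comment factory itself;
# the serializer recognizes it and writes <!--text--> instead of a tag.
root.append(ET.Comment(" generated file, do not edit "))
ET.SubElement(root, "option", name="debug")

# tostring() returns bytes on Python 3 (str on Python 2); decode for display.
xml_text = ET.tostring(root).decode("ascii")
```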


.. function:: dump(elem)

   Writes an element tree or element structure to ``sys.stdout``. This function
   should be used for debugging only.

   The exact output format is implementation dependent. In this version, it is
   written as an ordinary XML file.

   *elem* is an element tree or an individual element.


.. function:: Element(tag[, attrib][, **extra])

   Element factory. This function returns an object implementing the standard
   Element interface. The exact class or type of that object is implementation
   dependent, but it will always be compatible with the ``_ElementInterface`` class
   in this module.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *tag* is the element name. *attrib* is an
   optional dictionary containing element attributes. *extra* contains additional
   attributes, given as keyword arguments. Returns an element instance.
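
Attributes can be supplied through *attrib*, through keyword arguments, or through both. The sketch below uses both, and the two sources end up merged in a single attribute dictionary. The tag and attribute names are illustrative only.

```python
import xml.etree.ElementTree as ET

# "id" comes from the attrib dictionary, "color" from a keyword argument.
elem = ET.Element("item", {"id": "42"}, color="red")

merged = dict(elem.attrib)
has_children = len(elem) > 0   # a freshly created element has no subelements
```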


.. function:: fromstring(text)

   Parses an XML document from a string constant. Same as :func:`XML`. *text* is
   a string containing XML data. Returns an Element instance.
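
Note that :func:`fromstring` returns the root Element itself, not an :class:`ElementTree` wrapper, so you can query it directly. The catalog document below is invented for this sketch.

```python
import xml.etree.ElementTree as ET

# fromstring() returns the root *Element*, not an ElementTree wrapper.
root = ET.fromstring("<catalog><book>Dune</book><book>Emma</book></catalog>")

titles = [book.text for book in root.findall("book")]
```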


.. function:: iselement(element)

   Checks if an object appears to be a valid element object. *element* is an
   element instance. Returns a true value if this is an element object.
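
A typical use is telling element objects apart from plain values such as strings. The sketch relies only on these two obvious cases. The helper ``describe`` is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

def describe(value):
    # Hypothetical helper: distinguish element objects from other values.
    if ET.iselement(value):
        return "element <%s>" % value.tag
    return "not an element: %r" % (value,)

results = [describe(ET.Element("node")), describe("node")]
```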


.. function:: iterparse(source[, events])

   Parses an XML document into an element tree incrementally, and reports what is
   happening to the caller as parsing proceeds. *source* is a filename or file
   object containing XML data. *events* is a list of events to report back. If
   omitted, only ``"end"`` events are reported. Returns an iterator providing
   ``(event, elem)`` pairs.
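
The sketch below requests both ``"start"`` and ``"end"`` events and records the order in which they arrive. It wraps an in-memory byte string in ``io.BytesIO`` to serve as the file object. That choice is an assumption of the sketch: the :mod:`io` module exists in Python 2.6 and later and in Python 3, but not in 2.5.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<a><b/><c/></a>")

# Each item is an (event, element) pair; keep just the event name and tag.
seen = [(event, elem.tag) for event, elem in ET.iterparse(source, ["start", "end"])]
```

Each element produces a ``"start"`` event when its start tag is read and an ``"end"`` event once it is complete. An element's children are therefore only reliably available at its ``"end"`` event.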


.. function:: parse(source[, parser])

   Parses an XML document into an element tree. *source* is a filename or file
   object containing XML data. *parser* is an optional parser instance. If not
   given, the standard :class:`XMLTreeBuilder` parser is used. Returns an
   :class:`ElementTree` instance.
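
Unlike :func:`fromstring`, :func:`parse` returns an :class:`ElementTree` wrapper, and you reach the root element through ``getroot()``. This sketch reads from an in-memory file object created with ``io.BytesIO``, which is an assumption for illustration. A filename would work the same way.

```python
import io
import xml.etree.ElementTree as ET

# Any object with a read() method can stand in for a real file here.
source = io.BytesIO(b"<config><name>demo</name></config>")

tree = ET.parse(source)      # an ElementTree wrapper, not an Element
root = tree.getroot()        # the Element at the top of the document
name = root.findtext("name")
```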


.. function:: ProcessingInstruction(target[, text])

   PI element factory. This factory function creates a special element that will
   be serialized as an XML processing instruction. *target* is a string containing
   the PI target. *text* is a string containing the PI contents, if given. Returns
   an element instance representing a processing instruction.
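
When the element is serialized, the target and the optional contents are combined into a single ``<?...?>`` instruction. The stylesheet target and file name in this sketch are illustrative. The bytes returned by :func:`tostring` on Python 3 are decoded for display.

```python
import xml.etree.ElementTree as ET

root = ET.Element("page")
# The target and the optional contents end up in a single <?...?> instruction.
root.append(ET.ProcessingInstruction("xml-stylesheet", 'href="style.css"'))

xml_text = ET.tostring(root).decode("ascii")
```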


.. function:: SubElement(parent, tag[, attrib[, **extra]])

   Subelement factory. This function creates an element instance, and appends it
   to an existing element.

   The element name, attribute names, and attribute values can be either 8-bit
   ASCII strings or Unicode strings. *parent* is the parent element. *tag* is the
   subelement name. *attrib* is an optional dictionary containing element
   attributes. *extra* contains additional attributes, given as keyword arguments.
   Returns an element instance.
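
Because :func:`SubElement` both creates the child and attaches it, building a small structure needs no separate ``append`` calls. The menu structure below is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# SubElement both creates the child and appends it to the parent.
menu = ET.Element("menu")
ET.SubElement(menu, "entry", {"key": "o"}, label="Open")
ET.SubElement(menu, "entry", label="Quit")

labels = [entry.get("label") for entry in menu]
```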


.. function:: tostring(element[, encoding])

   Generates a string representation of an XML element, including all subelements.
   *element* is an Element instance. *encoding* is the output encoding (default is
   US-ASCII). Returns an encoded string containing the XML data.
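
The choice of encoding matters as soon as the tree contains non-ASCII text. The sketch shows the behavior on Python 3, where the result is a bytes object. With the default US-ASCII encoding, characters that cannot be represented are written as numeric character references. With UTF-8, they are encoded directly.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("p")
elem.text = u"caf\u00e9"  # one character (U+00E9) outside US-ASCII

# Default encoding (US-ASCII): U+00E9 becomes a character reference.
ascii_bytes = ET.tostring(elem)                     # b'<p>caf&#233;</p>'

# UTF-8: the character is encoded as the two bytes C3 A9.
utf8_bytes = ET.tostring(elem, encoding="utf-8")    # b'<p>caf\xc3\xa9</p>'
```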


.. function:: XML(text)

   Parses an XML document from a string constant. This function can be used to
   embed "XML literals" in Python code. *text* is a string containing XML data.
   Returns an Element instance.


.. function:: XMLID(text)

   Parses an XML document from a string constant, and also returns a dictionary
   which maps from element ids to elements. *text* is a string containing XML
   data. Returns a tuple containing an Element instance and a dictionary.
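
In the following sketch, only elements that carry an ``id`` attribute appear in the returned dictionary, and its values are the same element objects that make up the parsed tree. The document is invented for illustration.

```python
import xml.etree.ElementTree as ET

root, ids = ET.XMLID(
    '<doc><sec id="intro">Hello</sec><sec id="end">Bye</sec><sec>No id</sec></doc>'
)

# Only elements that carry an "id" attribute appear in the dictionary.
intro_text = ids["intro"].text
```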


.. _elementtree-element-interface:

The Element Interface
---------------------

Element objects returned by :func:`Element` or :func:`SubElement` have the
following methods and attributes.


.. attribute:: Element.tag

   A string identifying what kind of data this element represents (the element
   type, in other words).


.. attribute:: Element.text

   The *text* attribute can be used to hold additional data associated with the
   element. As the name implies, this attribute is usually a string but may be any
   application-specific object. If the element is created from an XML file, the
   attribute will contain any text found between the element's start tag and its
   first child or end tag.


.. attribute:: Element.tail

   The *tail* attribute can be used to hold additional data associated with the
   element. This attribute is usually a string but may be any application-specific
   object. If the element is created from an XML file, the attribute will contain
   any text found after the element's end tag and before the next tag.


.. attribute:: Element.attrib

   A dictionary containing the element's attributes. Note that while the *attrib*
   value is always a real mutable Python dictionary, an ElementTree implementation
   may choose to use another internal representation, and create the dictionary
   only if someone asks for it. To take advantage of such implementations, use the
   dictionary methods below whenever possible.

The following dictionary-like methods work on the element attributes.


.. method:: Element.clear()

   Resets an element. This method removes all subelements, clears all
   attributes, and sets the text and tail attributes to ``None``.


.. method:: Element.get(key[, default=None])

   Gets the element attribute named *key*.

   Returns the attribute value, or *default* if the attribute was not found.


.. method:: Element.items()

   Returns the element attributes as a sequence of (name, value) pairs. The
   attributes are returned in an arbitrary order.


.. method:: Element.keys()

   Returns the element's attribute names as a list. The names are returned in an
   arbitrary order.


.. method:: Element.set(key, value)

   Sets the attribute *key* on the element to *value*.
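
The sketch below exercises the attribute methods on a single element. Because the documentation above describes attribute order as arbitrary, the results of ``keys()`` and ``items()`` are sorted before use. The link element and its values are illustrative.

```python
import xml.etree.ElementTree as ET

link = ET.Element("a", href="index.html")
link.set("title", "Home")               # add or overwrite an attribute

href = link.get("href")                  # "index.html"
target = link.get("target", "_self")     # missing attribute: default returned
names = sorted(link.keys())              # sorted, since the order is arbitrary
pairs = sorted(link.items())

link.clear()                             # drops attributes, children, text, tail
remaining = list(link.keys())
```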

The following methods work on the element's children (subelements).


.. method:: Element.append(subelement)

   Adds the element *subelement* to the end of this element's internal list of
   subelements.


.. method:: Element.find(match)

   Finds the first subelement matching *match*. *match* may be a tag name or path.
   Returns an element instance or ``None``.


.. method:: Element.findall(match)

   Finds all subelements matching *match*. *match* may be a tag name or path.
   Returns an iterable yielding all matching elements in document order.


.. method:: Element.findtext(condition[, default=None])

   Finds text for the first subelement matching *condition*. *condition* may be a
   tag name or path. Returns the text content of the first matching element, or
   *default* if no element was found. Note that if the matching element has no
   text content, an empty string is returned.
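
This sketch applies the three lookup methods to a small, made-up document. A plain tag name matches direct children only. The slash-separated path ``"chapter/title"`` descends one level per step.

```python
import xml.etree.ElementTree as ET

book = ET.XML(
    "<book>"
    "<title>Guide</title>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>Usage</title><note/></chapter>"
    "</book>"
)

first_title = book.find("title").text                  # direct child only
chapters = book.findall("chapter")                     # both chapters, in order
chapter_titles = [t.text for t in book.findall("chapter/title")]
missing = book.find("appendix")                        # None: no such child
note_text = book.findtext("chapter/note")              # "" (found, but no text)
fallback = book.findtext("appendix", "n/a")            # default when not found
```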


.. method:: Element.getchildren()

   Returns all subelements. The elements are returned in document order.


.. method:: Element.getiterator([tag=None])

   Creates a tree iterator with the current element as the root. The iterator
   iterates over this element and all elements below it that match the given tag.
   If *tag* is ``None`` or ``'*'``, then all elements are iterated over. Returns an
   iterable that provides element objects in document (depth-first) order.


.. method:: Element.insert(index, element)

   Inserts a subelement at the given position in this element.


.. method:: Element.makeelement(tag, attrib)

   Creates a new element object of the same type as this element. Do not call this
   method; use the :func:`SubElement` factory function instead.


.. method:: Element.remove(subelement)

   Removes *subelement* from the element. Unlike the :meth:`find` family of
   methods, this method compares elements based on instance identity, not on tag
   value or contents.

Element objects also support the following sequence type methods for working
with subelements: :meth:`__delitem__`, :meth:`__getitem__`, :meth:`__setitem__`,
:meth:`__len__`.
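
The sketch below combines the child-manipulation methods with sequence-style access. It deliberately creates two elements with the same tag and attributes, showing that :meth:`remove` picks out the exact object it is given. All tag names are illustrative.

```python
import xml.etree.ElementTree as ET

steps = ET.Element("steps")
first = ET.Element("step", n="1")
twin = ET.Element("step", n="1")     # same tag and attributes, different object

steps.append(first)                  # goes to the end
steps.append(twin)
steps.insert(0, ET.Element("intro")) # goes to position 0

steps.remove(twin)                   # removes this exact object, by identity

count = len(steps)                              # __len__
tags = [steps[i].tag for i in range(count)]     # __getitem__
del steps[0]                                    # __delitem__: drop <intro>
```

If :meth:`remove` compared by value, it would have removed ``first``, the earlier equal-looking child, instead of ``twin``.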

Caution: Because Element objects define :meth:`__len__` but do not define a
:meth:`__nonzero__` method, elements with no subelements will test as
``False``. ::

   from xml.etree import ElementTree

   root = ElementTree.XML("<root><foo/></root>")
   element = root.find('foo')

   if not element:  # careful!
       print "element not found, or element has no subelements"

   if element is None:
       print "element not found"


.. _elementtree-elementtree-objects:

ElementTree Objects
-------------------


.. class:: ElementTree([element,] [file])

   ElementTree wrapper class. This class represents an entire element hierarchy,
   and adds some extra support for serialization to and from standard XML.

   *element* is the root element. The tree is initialized with the contents of the
   XML *file* if given.


.. method:: ElementTree._setroot(element)

   Replaces the root element for this tree. This discards the current contents of
   the tree, and replaces it with the given element. Use with care. *element* is
   an element instance.


.. method:: ElementTree.find(path)

   Finds the first toplevel element with the given tag. Same as
   ``getroot().find(path)``. *path* is the element to look for. Returns the first
   matching element, or ``None`` if no element was found.


.. method:: ElementTree.findall(path)

   Finds all toplevel elements with the given tag. Same as
   ``getroot().findall(path)``. *path* is the element to look for. Returns a list
   or iterator containing all matching elements, in document order.


.. method:: ElementTree.findtext(path[, default])

   Finds the element text for the first toplevel element with the given tag. Same
   as ``getroot().findtext(path)``. *path* is the toplevel element to look for.
   *default* is the value to return if the element was not found. Returns the text
   content of the first matching element, or the default value if no element was
   found. Note that if the element is found but has no text content, this method
   returns an empty string.


.. method:: ElementTree.getiterator([tag])

   Creates and returns a tree iterator for the root element. The iterator loops
   over all elements in this tree, in document order. *tag* is the tag to look for
   (default is to return all elements).


.. method:: ElementTree.getroot()

   Returns the root element for this tree.


.. method:: ElementTree.parse(source[, parser])

   Loads an external XML document into this element tree. *source* is a file name
   or file object. *parser* is an optional parser instance. If not given, the
   standard :class:`XMLTreeBuilder` parser is used. Returns the document root
   element.


.. method:: ElementTree.write(file[, encoding])

   Writes the element tree to a file, as XML. *file* is a file name, or a file
   object opened for writing. *encoding* is the output encoding (default is
   US-ASCII).
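
The following round-trip sketch wraps an element in an :class:`ElementTree`, writes it out, and loads it back through the *file* constructor argument. It assumes ``io.BytesIO`` as the file object, which is available in Python 2.6+ and Python 3. The settings document is made up.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("settings")
ET.SubElement(root, "theme").text = "dark"
tree = ET.ElementTree(root)          # wrap the element structure

out = io.BytesIO()                   # stands in for a file opened for writing
tree.write(out)                      # default encoding: US-ASCII
data = out.getvalue()                # b'<settings><theme>dark</theme></settings>'

# Reading it back: ElementTree.findtext is shorthand for getroot().findtext().
reloaded = ET.ElementTree(file=io.BytesIO(data))
theme = reloaded.findtext("theme")
```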


.. _elementtree-qname-objects:

QName Objects
-------------


.. class:: QName(text_or_uri[, tag])

   QName wrapper. This can be used to wrap a QName attribute value, in order to
   get proper namespace handling on output. *text_or_uri* is a string containing
   the QName value, in the form ``{uri}local``, or, if the *tag* argument is given,
   the URI part of a QName. If *tag* is given, the first argument is interpreted as
   a URI, and this argument is interpreted as a local name. :class:`QName`
   instances are opaque.
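
This sketch shows the two ways to spell the same qualified name, and then uses one as an attribute value, which is the case described above. The namespace URI is hypothetical. The serializer generates the prefix that appears in the output. Current Python 3 versions produce ``ns0`` for the first unregistered namespace, but the exact prefix should be treated as an implementation detail.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:types"   # hypothetical namespace URI

# Two spellings of the same qualified name.
q1 = ET.QName("{%s}string" % NS)
q2 = ET.QName(NS, "string")
same = (q1.text == q2.text)          # both are "{urn:example:types}string"

# Used as an attribute value, the QName is rewritten to prefix:local and the
# matching xmlns declaration is added to the output.
value = ET.Element("value")
value.set("type", q2)
xml_text = ET.tostring(value).decode("ascii")
```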


.. _elementtree-treebuilder-objects:

TreeBuilder Objects
-------------------


.. class:: TreeBuilder([element_factory])

   Generic element structure builder. This builder converts a sequence of start,
   data, and end method calls to a well-formed element structure. You can use this
   class to build an element structure using a custom XML parser, or a parser for
   some other XML-like format. If given, *element_factory* is called to create new
   Element instances.


.. method:: TreeBuilder.close()

   Flushes the parser buffers, and returns the toplevel document element. Returns
   an Element instance.


.. method:: TreeBuilder.data(data)

   Adds text to the current element. *data* is a string. This should be either an
   8-bit string containing ASCII text, or a Unicode string.


.. method:: TreeBuilder.end(tag)

   Closes the current element. *tag* is the element name. Returns the closed
   element.


.. method:: TreeBuilder.start(tag, attrs)

   Opens a new element. *tag* is the element name. *attrs* is a dictionary
   containing element attributes. Returns the opened element.
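
The sketch below drives a :class:`TreeBuilder` by hand, the way a custom parser would. Each element gets one start/end pair, and ``data`` calls supply the text in between. Any data that arrives after a child's ``end`` call becomes that child's tail. The greeting document is made up.

```python
import xml.etree.ElementTree as ET

# One start/end pair per element, with data() calls for the text in between.
builder = ET.TreeBuilder()
builder.start("greeting", {"lang": "en"})
builder.data("Hello, ")
builder.start("name", {})
builder.data("world")
builder.end("name")
builder.data("!")              # after </name>, so it becomes name's tail
builder.end("greeting")

root = builder.close()         # the toplevel element
xml_text = ET.tostring(root).decode("ascii")
```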


.. _elementtree-xmltreebuilder-objects:

XMLTreeBuilder Objects
----------------------


.. class:: XMLTreeBuilder([html,] [target])

   Element structure builder for XML source data, based on the expat parser.
   *html* is a flag for predefined HTML entities. This flag is not supported by
   the current implementation. *target* is the target object. If omitted, the
   builder uses an instance of the standard :class:`TreeBuilder` class.


.. method:: XMLTreeBuilder.close()

   Finishes feeding data to the parser. Returns an element structure.


.. method:: XMLTreeBuilder.doctype(name, pubid, system)

   Handles a doctype declaration. *name* is the doctype name. *pubid* is the public
   identifier. *system* is the system identifier.


.. method:: XMLTreeBuilder.feed(data)

   Feeds data to the parser. *data* is encoded data.
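
The following sketch parses incrementally: data is passed to ``feed`` in chunks, and ``close`` returns the finished structure. The class documented here is named ``XMLTreeBuilder``, but Python 3 provides only ``XMLParser``. For portability, this sketch therefore looks up ``XMLTreeBuilder`` and falls back to ``XMLParser``. That lookup is an assumption of the sketch, not part of the API above. The log document is invented.

```python
import xml.etree.ElementTree as ET

# Use XMLTreeBuilder where it exists; Python 3 only has XMLParser.
ParserClass = getattr(ET, "XMLTreeBuilder", None) or ET.XMLParser

parser = ParserClass()
# Data may arrive in arbitrary chunks, even splitting a tag in two.
for chunk in ["<log><ent", "ry>one</entry><entry>two", "</entry></log>"]:
    parser.feed(chunk)

root = parser.close()      # returns the root element once input is complete
entries = [e.text for e in root]
```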