| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 1 |  | 
:mod:`xml.dom.minidom` --- Lightweight DOM implementation
=========================================================

.. module:: xml.dom.minidom
   :synopsis: Lightweight Document Object Model (DOM) implementation.
.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>


:mod:`xml.dom.minidom` is a lightweight implementation of the Document Object
Model interface.  It is intended to be simpler than the full DOM and also
significantly smaller.

DOM applications typically start by parsing some XML into a DOM.  With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file[, parser[, bufsize]])

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
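
As a minimal sketch of passing a preconfigured parser, the snippet below builds
a SAX2 parser with :func:`xml.sax.make_parser` and hands it to :func:`parse`.
The sample document and the variable names are illustrative assumptions, and an
in-memory byte stream stands in for a real file.

```python
import io
import xml.sax
from xml.dom.minidom import parse

# Hypothetical sample data; any well-formed XML would do.
source = io.BytesIO(b"<catalog><item/><item/></catalog>")

# Any extra configuration (for example an entity resolver) belongs here,
# before the parser is handed over; parse() itself replaces the document
# handler and activates namespace support.
sax_parser = xml.sax.make_parser()

doc = parse(source, sax_parser)
root_tag = doc.documentElement.tagName
item_count = len(doc.getElementsByTagName("item"))
```

When no special parser configuration is needed, omitting *parser* is simpler.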

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string[, parser])

   Return a :class:`Document` that represents the *string*. This function creates a
   :class:`StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.
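
To illustrate what the returned :class:`Document` contains, the following
sketch parses the mixed-content string used earlier and inspects the children
of the root element. The variable names are chosen for illustration only.

```python
from xml.dom.minidom import parseString

doc = parseString("<myxml>Some data<empty/> some more data</myxml>")
root = doc.documentElement

# Mixed content yields a text node, an element node, and another text node.
kinds = [child.nodeType for child in root.childNodes]

# Collect the character data of the text nodes only.
texts = [child.data for child in root.childNodes
         if child.nodeType == child.TEXT_NODE]
```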

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree.  The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces.  The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object.  You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Using the implementation from the
:mod:`xml.dom.minidom` module will always return a :class:`Document` instance
from the minidom implementation, while the version from :mod:`xml.dom` may
provide an alternate implementation (this is likely if you have the `PyXML
package <http://pyxml.sourceforge.net/>`_ installed).  Once you have a
:class:`Document`, you can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.  These properties are defined in
the DOM specification.  The main property of the document object is the
:attr:`documentElement` property.  It gives you the main element in the XML
document: the one that holds all others.  Here is an example program::

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

When you are finished with a DOM, you should clean it up.  This is necessary
because some versions of Python do not support garbage collection of objects
that refer to each other in a cycle.  Until this restriction is removed from all
versions of Python, it is safest to write your code as if cycles would not be
cleaned up.

The way to clean up a DOM is to call its :meth:`unlink` method::

   dom1.unlink()
   dom2.unlink()
   dom3.unlink()

:meth:`unlink` is a :mod:`xml.dom.minidom`\ -specific extension to the DOM API.
After calling :meth:`unlink` on a node, the node and its descendants are
essentially useless.
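
One way to make sure the cleanup always happens is to wrap the work in
``try``/``finally``. This is only a sketch of that pattern; the final check on
``childNodes`` reflects how the current CPython implementation empties the tree
and is an observation, not a documented guarantee.

```python
from xml.dom.minidom import parseString

doc = parseString("<myxml>Some data</myxml>")
try:
    # Copy out whatever is needed while the tree is still intact.
    tag = doc.documentElement.tagName
finally:
    # Break the internal references even if the code above raises.
    doc.unlink()

# After unlink() the document no longer holds its children.
children_after_unlink = len(doc.childNodes)
```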


.. seealso::

   `Document Object Model (DOM) Level 1 Specification <http://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation.  This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC.  Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice.  This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.


.. method:: Node.writexml(writer[, indent=""[, addindent=""[, newl=""[, encoding=""]]]])

   Write XML to the writer object.  The writer should have a :meth:`write` method
   which matches that of the file object interface.  The *indent* parameter is the
   indentation of the current node.  The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one.  The *newl* parameter
   specifies the string to use to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can be
   used to specify the encoding field of the XML header.
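
A short sketch of :meth:`writexml` writing into an in-memory text buffer
follows. The buffer, the two-space *addindent*, and the sample document are
assumptions made for the example; any object with a compatible :meth:`write`
method could be used instead.

```python
import io
from xml.dom.minidom import parseString

doc = parseString("<a><b/></a>")

buffer = io.StringIO()  # stands in for an open text file
# On the Document node, encoding only sets the field in the XML header;
# the writer still receives text.
doc.writexml(buffer, indent="", addindent="  ", newl="\n", encoding="utf-8")
output = buffer.getvalue()
```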


.. method:: Node.toxml([encoding])

   Return the XML that the DOM represents as a string.

   With no argument, the XML header does not specify an encoding, and the result is
   a Unicode string if the default encoding cannot represent all characters in the
   document. Encoding this string in an encoding other than UTF-8 is likely
   incorrect, since UTF-8 is the default encoding of XML.

   With an explicit *encoding* [1]_ argument, the result is a byte string in the
   specified encoding. It is recommended that this argument always be specified. To
   avoid :exc:`UnicodeError` exceptions in case of unrepresentable text data, the
   encoding argument should be specified as "utf-8".
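
The difference between the two call styles can be sketched as follows. The
example assumes Python 3, where a text result is a ``str`` and an encoded
result is a ``bytes`` object; the sample document is made up.

```python
from xml.dom.minidom import parseString

doc = parseString("<greeting>caf\u00e9</greeting>")

# No encoding: a text string whose XML header names no encoding.
as_text = doc.toxml()

# Explicit encoding: a byte string, with the encoding named in the header.
as_bytes = doc.toxml(encoding="utf-8")
```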


.. method:: Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tab character; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   There's also an *encoding* argument; see :meth:`toxml`.
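
For comparison, this sketch serializes the same small, made-up document with
:meth:`toxml` and with :meth:`toprettyxml`, overriding the default indentation
with two spaces.

```python
from xml.dom.minidom import parseString

doc = parseString("<a><b/><c/></a>")

compact = doc.toxml()                             # everything on one line
pretty = doc.toprettyxml(indent="  ", newl="\n")  # one node per line
```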


.. _dom-example:

DOM Example
-----------

This example program is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward.  The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only ``in``
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API.  They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
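
A few of these mapping rules can be seen in a short sketch. It assumes a
current Python 3, where :class:`NodeList` offers both the list protocol and the
DOM-style ``item()`` and ``length`` accessors; the sample document is made up.

```python
from xml.dom.minidom import Node, parseString

doc = parseString('<list><item id="1"/><item id="2"/></list>')
items = doc.getElementsByTagName("item")

# A NodeList supports the usual list operations ...
count = len(items)
ids = [item.getAttribute("id") for item in items]

# ... as well as the accessors named in the DOM specification.
same_node = items.item(0) is items[0]
dom_length = items.length

# IDL constants are plain class attributes on Node.
is_element = items[0].nodeType == Node.ELEMENT_NODE
```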

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`DocumentType`

* :class:`DOMImplementation`

* :class:`CharacterData`

* :class:`CDATASection`

* :class:`Notation`

* :class:`Entity`

* :class:`EntityReference`

* :class:`DocumentFragment`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [#] The encoding string included in XML output should conform to the
   appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
   not. See http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and http://www.iana.org/assignments/character-sets .