:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages. It is intended
to be simpler than the full DOM and also significantly smaller. Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM. With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.

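As a hedged illustration of the file-object form, the sketch below parses XML
held in an in-memory :class:`io.BytesIO` buffer, which stands in for any object
with a ``read()`` method. The document content and the variable names are made
up for this example::

   import io
   from xml.dom.minidom import parse

   # An in-memory binary stream standing in for an open file (illustrative data).
   buffer = io.BytesIO(b'<inventory><item>bolt</item><item>nut</item></inventory>')

   dom = parse(buffer)

   # Collect the text content of every <item> element.
   items = dom.getElementsByTagName('item')
   names = [item.firstChild.data for item in items]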

.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.

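The following is a minimal sketch of supplying your own SAX parser through the
*parser* argument. It assumes the parser returned by
:func:`xml.sax.make_parser` is suitable; the variable names and the choice to
switch off external general entities are illustrative only::

   import xml.sax
   import xml.sax.handler
   from xml.dom.minidom import parseString

   # Configure the SAX parser in advance; parseString() only replaces the
   # content handler and turns on namespace support.
   sax_parser = xml.sax.make_parser()
   sax_parser.setFeature(xml.sax.handler.feature_external_ges, False)

   dom = parseString('<myxml>Some data</myxml>', parser=sax_parser)

   # The tree is complete as soon as parseString() returns.
   root_name = dom.documentElement.tagName

Any other parser setup, such as an entity resolver, would go in the same place,
before the parser is handed over.
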
You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object. You can get this object either by calling the
:func:`getDOMImplementation` function in the :mod:`xml.dom` package or the
:mod:`xml.dom.minidom` module. Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

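A slightly larger sketch along the same lines adds a child element with an
attribute and serializes the result. The tag names, attribute and text are
invented for illustration::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()
   doc = impl.createDocument(None, 'catalog', None)
   root = doc.documentElement

   # Nodes are created by the document's factory methods, then attached.
   book = doc.createElement('book')
   book.setAttribute('id', 'b1')
   book.appendChild(doc.createTextNode('An example title'))
   root.appendChild(book)

   # Serialize just the root element (no XML declaration).
   xml_text = root.toxml()

Creating nodes through the document, rather than instantiating node classes
directly, keeps each node tied to the document that owns it.
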
Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
:attr:`documentElement` property. It gives you the main element in the XML
document: the one that holds all others. Here is an example program::

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"

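Starting from :attr:`documentElement`, the rest of the tree can be reached with
the standard DOM properties. The sketch below is illustrative only and reuses
the sample string from the parsing example::

   from xml.dom.minidom import Node, parseString

   dom = parseString('<myxml>Some data<empty/> some more data</myxml>')
   root = dom.documentElement

   # childNodes holds text and element children in document order.
   kinds = [child.nodeType for child in root.childNodes]

   # Text nodes expose their content through the data attribute.
   first_text = root.firstChild.data

   # getElementsByTagName() searches all descendants of the node.
   empty_elements = root.getElementsByTagName('empty')

Here the root has three children: a text node, the ``empty`` element, and a
second text node.
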
When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects. :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation. This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC. Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice. This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`!with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.

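Where a :keyword:`with` block is not convenient, a ``try``/``finally`` pair
gives the same cleanup. The helper below is a hypothetical sketch, not part of
the module; it extracts what it needs before the tree is unlinked::

   from xml.dom.minidom import parseString

   def count_items(xml_text):
       """Return the number of <item> elements, then release the tree."""
       dom = parseString(xml_text)
       try:
           return len(dom.getElementsByTagName('item'))
       finally:
           # Same cleanup the with statement performs on exit.
           dom.unlink()

   total = count_items('<list><item/><item/></list>')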

.. method:: Node.writexml(writer, indent="", addindent="", newl="")

   Write XML to the writer object. The writer should have a :meth:`write` method
   which matches that of the file object interface. The *indent* parameter is the
   indentation of the current node. The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one. The *newl* parameter
   specifies the string to use to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.

   .. versionchanged:: 3.8
      The :meth:`writexml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toxml(encoding=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.

   .. versionchanged:: 3.8
      The :meth:`toxml` method now preserves the attribute order specified
      by the user.

.. method:: Node.toprettyxml(indent="\\t", newl="\\n", encoding=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tabulator; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.

   .. versionchanged:: 3.8
      The :meth:`toprettyxml` method now preserves the attribute order specified
      by the user.

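The three serialization methods can be compared side by side. This is a small
sketch with an invented two-element document; the two-space indent and the
variable names are arbitrary choices for the example::

   import io
   from xml.dom.minidom import parseString

   dom = parseString('<root><child/></root>')

   as_text = dom.toxml()                    # str, no encoding in the declaration
   as_bytes = dom.toxml(encoding='utf-8')   # bytes, declaration names utf-8
   pretty = dom.toprettyxml(indent='  ')    # str, one element per line

   # writexml() sends the serialization to any object with a write() method.
   writer = io.StringIO()
   dom.writexml(writer, addindent='  ', newl='\n', encoding='utf-8')
   written = writer.getvalue()

Note that passing *encoding* to :meth:`writexml` only sets the encoding named
in the XML declaration; the text handed to the writer is still a string, so the
writer is responsible for encoding it accordingly.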

.. _dom-example:

DOM Example
-----------

This example is a fairly realistic, if simple, program. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`. ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API. They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.

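A few of the mapping rules above can be seen in a short sketch. The document
and variable names are invented for illustration, and the behavior shown is
that of current Python versions::

   from xml.dom.minidom import Node, parseString

   dom = parseString('<root><a/><b/></root>')
   root = dom.documentElement

   # NodeList supports both the DOM interface and ordinary list operations.
   children = root.childNodes
   dom_style = [children.item(i).tagName for i in range(children.length)]
   python_style = [child.tagName for child in children]

   # IDL constants are plain class attributes.
   is_element = root.nodeType == Node.ELEMENT_NODE

   # A DOMString that is null in IDL terms shows up as None.
   namespace = root.namespaceURI

Both list comprehensions visit the same nodes; the second form is usually the
more natural one in Python code.
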
The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`DocumentType`

* :class:`DOMImplementation`

* :class:`CharacterData`

* :class:`CDATASection`

* :class:`Notation`

* :class:`Entity`

* :class:`EntityReference`

* :class:`DocumentFragment`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [1] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.