:mod:`xml.dom.minidom` --- Minimal DOM implementation
=====================================================

.. module:: xml.dom.minidom
   :synopsis: Minimal Document Object Model (DOM) implementation.

.. moduleauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Paul Prescod <paul@prescod.net>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/dom/minidom.py`

--------------

:mod:`xml.dom.minidom` is a minimal implementation of the Document Object
Model interface, with an API similar to that in other languages.  It is intended
to be simpler than the full DOM and also significantly smaller.  Users who are
not already proficient with the DOM should consider using the
:mod:`xml.etree.ElementTree` module for their XML processing instead.


.. warning::

   The :mod:`xml.dom.minidom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.


DOM applications typically start by parsing some XML into a DOM.  With
:mod:`xml.dom.minidom`, this is done through the parse functions::

   from xml.dom.minidom import parse, parseString

   dom1 = parse('c:\\temp\\mydata.xml')  # parse an XML file by name

   datasource = open('c:\\temp\\mydata.xml')
   dom2 = parse(datasource)  # parse an open file

   dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')

The :func:`parse` function can take either a filename or an open file object.


.. function:: parse(filename_or_file, parser=None, bufsize=None)

   Return a :class:`Document` from the given input. *filename_or_file* may be
   either a file name, or a file-like object. *parser*, if given, must be a SAX2
   parser object. This function will change the document handler of the parser and
   activate namespace support; other parser configuration (like setting an entity
   resolver) must have been done in advance.
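
As a minimal sketch of the file-like case, the snippet below passes an in-memory binary stream to ``parse`` instead of a real file. The XML content and the variable names are illustrative assumptions, not part of the module.

```python
import io
from xml.dom.minidom import parse

# An in-memory binary stream stands in for an open file (illustrative data).
xml_bytes = b"<inventory><item>bolt</item><item>nut</item></inventory>"
dom = parse(io.BytesIO(xml_bytes))

# The returned Document exposes the root element and the usual DOM lookups.
root_tag = dom.documentElement.tagName
item_count = len(dom.getElementsByTagName("item"))
print(root_tag, item_count)
```

Any object with a suitable ``read`` method should work the same way, which is convenient for tests or for data that has already been loaded into memory.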

If you have XML in a string, you can use the :func:`parseString` function
instead:


.. function:: parseString(string, parser=None)

   Return a :class:`Document` that represents the *string*. This function creates an
   :class:`io.StringIO` object for the string and passes that on to :func:`parse`.

Both functions return a :class:`Document` object representing the content of the
document.

What the :func:`parse` and :func:`parseString` functions do is connect an XML
parser with a "DOM builder" that can accept parse events from any SAX parser and
convert them into a DOM tree.  The names of the functions are perhaps misleading,
but are easy to grasp when learning the interfaces.  The parsing of the document
will be completed before these functions return; it's simply that these
functions do not provide a parser implementation themselves.
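
To illustrate the separation between parser and DOM builder, the sketch below hands an explicitly created SAX2 parser to ``parseString`` through its *parser* argument. It assumes the default parser returned by ``xml.sax.make_parser()`` is acceptable; any configuration such as an entity resolver would have to be applied to ``sax_parser`` before the call.

```python
from xml.sax import make_parser
from xml.dom.minidom import parseString

# Create a SAX2 parser ourselves instead of letting minidom pick one.
sax_parser = make_parser()

# minidom installs its own document handler on the parser and enables
# namespace support, then builds the DOM tree from the SAX events.
dom = parseString(
    "<myxml>Some data<empty/> some more data</myxml>",
    parser=sax_parser,
)
root_tag = dom.documentElement.tagName
print(root_tag)
```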

You can also create a :class:`Document` by calling a method on a "DOM
Implementation" object.  You can get this object by calling the
:func:`getDOMImplementation` function in either the :mod:`xml.dom` package or
the :mod:`xml.dom.minidom` module.  Once you have a :class:`Document`, you
can add child nodes to it to populate the DOM::

   from xml.dom.minidom import getDOMImplementation

   impl = getDOMImplementation()

   newdoc = impl.createDocument(None, "some_tag", None)
   top_element = newdoc.documentElement
   text = newdoc.createTextNode('Some textual content.')
   top_element.appendChild(text)

Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods.  These properties are defined in
the DOM specification.  The main property of the document object is the
:attr:`documentElement` property.  It gives you the main element in the XML
document: the one that holds all others.  Here is an example program::

   from xml.dom.minidom import parseString

   dom3 = parseString("<myxml>Some data</myxml>")
   assert dom3.documentElement.tagName == "myxml"
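
Going one step further than the root element, the following sketch walks the children of ``documentElement`` using standard DOM attributes (``childNodes``, ``firstChild``, ``nodeType``) and ``getElementsByTagName``. The variable names are illustrative.

```python
from xml.dom.minidom import parseString

dom = parseString("<myxml>Some data<empty/> some more data</myxml>")
root = dom.documentElement

# childNodes holds text and element nodes in document order.
child_kinds = [child.nodeType for child in root.childNodes]

# The first child is a text node; its content is in the data attribute.
first_text = root.firstChild.data

# Look up descendant elements by tag name.
empty_elements = root.getElementsByTagName("empty")
print(child_kinds, first_text, len(empty_elements))
```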

When you are finished with a DOM tree, you may optionally call the
:meth:`unlink` method to encourage early cleanup of the now-unneeded
objects.  :meth:`unlink` is an :mod:`xml.dom.minidom`\ -specific
extension to the DOM API that renders the node and its descendants
essentially useless.  Otherwise, Python's garbage collector will
eventually take care of the objects in the tree.

.. seealso::

   `Document Object Model (DOM) Level 1 Specification <https://www.w3.org/TR/REC-DOM-Level-1/>`_
      The W3C recommendation for the DOM supported by :mod:`xml.dom.minidom`.


.. _minidom-objects:

DOM Objects
-----------

The definition of the DOM API for Python is given as part of the :mod:`xml.dom`
module documentation.  This section lists the differences between the API and
:mod:`xml.dom.minidom`.


.. method:: Node.unlink()

   Break internal references within the DOM so that it will be garbage collected on
   versions of Python without cyclic GC.  Even when cyclic GC is available, using
   this can make large amounts of memory available sooner, so calling this on DOM
   objects as soon as they are no longer needed is good practice.  This only needs
   to be called on the :class:`Document` object, but may be called on child nodes
   to discard children of that node.

   You can avoid calling this method explicitly by using the :keyword:`with`
   statement. The following code will automatically unlink *dom* when the
   :keyword:`with` block is exited::

      with xml.dom.minidom.parse(datasource) as dom:
          ... # Work with dom.
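
A self-contained variant of the same idea, using ``parseString`` so that no file is needed. It copies out the value it needs inside the block, since the tree is unlinked on exit; the check on ``childNodes`` afterwards reflects the current implementation and is shown only for illustration.

```python
from xml.dom.minidom import parseString

with parseString("<myxml>Some data</myxml>") as dom:
    # Extract everything you need while the tree is still intact.
    tag_inside = dom.documentElement.tagName

# After the block, unlink() has run and the document has no children left.
children_after = len(dom.childNodes)
print(tag_inside, children_after)
```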


.. method:: Node.writexml(writer, indent="", addindent="", newl="")

   Write XML to the writer object.  The writer should have a :meth:`write` method
   which matches that of the file object interface.  The *indent* parameter is the
   indentation of the current node.  The *addindent* parameter is the incremental
   indentation to use for subnodes of the current one.  The *newl* parameter
   specifies the string to use to terminate lines.

   For the :class:`Document` node, an additional keyword argument *encoding* can
   be used to specify the encoding field of the XML header.
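
As a rough sketch, the snippet below writes a small document into an ``io.StringIO`` buffer, which provides the required ``write`` method. The indentation values and the sample XML are arbitrary choices for illustration.

```python
import io
from xml.dom.minidom import parseString

dom = parseString("<root><child>text</child></root>")

# Any object with a file-like write() method can serve as the writer.
buffer = io.StringIO()

# On a Document, encoding only sets the encoding field of the XML header;
# the text written to the buffer is still a str.
dom.writexml(buffer, indent="", addindent="  ", newl="\n", encoding="utf-8")

output = buffer.getvalue()
print(output)
```

When writing to a real file, open it with an encoding that matches the one named in the header.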


.. method:: Node.toxml(encoding=None)

   Return a string or byte string containing the XML represented by
   the DOM node.

   With an explicit *encoding* [1]_ argument, the result is a byte
   string in the specified encoding.
   With no *encoding* argument, the result is a Unicode string, and the
   XML declaration in the resulting string does not specify an
   encoding. Encoding this string in an encoding other than UTF-8 is
   likely incorrect, since UTF-8 is the default encoding of XML.
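
The difference between the two return types can be seen in a short sketch; the sample document with a non-ASCII character is an illustrative assumption.

```python
from xml.dom.minidom import parseString

dom = parseString("<greeting>caf\u00e9</greeting>")

# Without an encoding the result is a str and the XML declaration
# carries no encoding field.
as_text = dom.toxml()

# With an explicit encoding the result is bytes in that encoding, and
# the declaration names it.
as_bytes = dom.toxml(encoding="UTF-8")

print(type(as_text).__name__, type(as_bytes).__name__)
```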

.. method:: Node.toprettyxml(indent="\t", newl="\n", encoding=None)

   Return a pretty-printed version of the document. *indent* specifies the
   indentation string and defaults to a tabulator; *newl* specifies the string
   emitted at the end of each line and defaults to ``\n``.

   The *encoding* argument behaves like the corresponding argument of
   :meth:`toxml`.
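
A brief sketch of pretty-printing with a two-space indent instead of the default tab; the sample document is illustrative, and the exact layout of the output may differ between Python versions.

```python
from xml.dom.minidom import parseString

dom = parseString("<root><child>text</child><empty/></root>")

# Each nested element is placed on its own line, indented by two spaces.
pretty = dom.toprettyxml(indent="  ", newl="\n")
print(pretty)
```

Note that pretty-printing a tree that already contains whitespace-only text nodes, for example one parsed from indented XML, can introduce extra blank lines.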


.. _dom-example:

DOM Example
-----------

This program is a fairly realistic example of a simple application. In this
particular case, we do not take much advantage of the flexibility of the DOM.

.. literalinclude:: ../includes/minidom-example.py


.. _minidom-and-dom:

minidom and the DOM standard
----------------------------

The :mod:`xml.dom.minidom` module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward.  The following mapping
rules apply:

* Interfaces are accessed through instance objects. Applications should not
  instantiate the classes themselves; they should use the creator functions
  available on the :class:`Document` object. Derived interfaces support all
  operations (and attributes) from the base interfaces, plus any new operations.

* Operations are used as methods. Since the DOM uses only :keyword:`in`
  parameters, the arguments are passed in normal order (from left to right).
  There are no optional arguments. ``void`` operations return ``None``.

* IDL attributes map to instance attributes. For compatibility with the OMG IDL
  language mapping for Python, an attribute ``foo`` can also be accessed through
  accessor methods :meth:`_get_foo` and :meth:`_set_foo`.  ``readonly``
  attributes must not be changed; this is not enforced at runtime.

* The types ``short int``, ``unsigned int``, ``unsigned long long``, and
  ``boolean`` all map to Python integer objects.

* The type ``DOMString`` maps to Python strings. :mod:`xml.dom.minidom` supports
  either bytes or strings, but will normally produce strings.
  Values of type ``DOMString`` may also be ``None`` where allowed to have the IDL
  ``null`` value by the DOM specification from the W3C.

* ``const`` declarations map to variables in their respective scope (e.g.
  ``xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE``); they must not be changed.

* ``DOMException`` is currently not supported in :mod:`xml.dom.minidom`.
  Instead, :mod:`xml.dom.minidom` uses standard Python exceptions such as
  :exc:`TypeError` and :exc:`AttributeError`.

* :class:`NodeList` objects are implemented using Python's built-in list type.
  These objects provide the interface defined in the DOM specification, but with
  earlier versions of Python they do not support the official API.  They are,
  however, much more "Pythonic" than the interface defined in the W3C
  recommendations.
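
Several of the mapping rules above can be seen together in a short sketch. It uses a small illustrative document and shows the list behaviour of ``NodeList`` next to the DOM-style ``item`` and ``length`` members, a ``const`` exposed as a class attribute, and an ``_get_`` accessor method.

```python
from xml.dom.minidom import Node, parseString

dom = parseString("<list><item>a</item><item>b</item></list>")
items = dom.getElementsByTagName("item")

# NodeList behaves like a Python list ...
count = len(items)
texts = [item.firstChild.data for item in items]

# ... and also offers the DOM-style length attribute and item() method.
same_count = items.length
first_via_item = items.item(0)

# const declarations are available as attributes of Node.
is_element = items[0].nodeType == Node.ELEMENT_NODE

# IDL attributes can also be read through _get_ accessor methods.
root_via_accessor = dom._get_documentElement()
print(count, texts, is_element)
```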

The following interfaces have no implementation in :mod:`xml.dom.minidom`:

* :class:`DOMTimeStamp`

* :class:`DocumentType`

* :class:`DOMImplementation`

* :class:`CharacterData`

* :class:`CDATASection`

* :class:`Notation`

* :class:`Entity`

* :class:`EntityReference`

* :class:`DocumentFragment`

Most of these reflect information in the XML document that is not of general
utility to most DOM users.

.. rubric:: Footnotes

.. [#] The encoding name included in the XML output should conform to
   the appropriate standards. For example, "UTF-8" is valid, but
   "UTF8" is not valid in an XML document's declaration, even though
   Python accepts it as an encoding name.
   See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
   and https://www.iana.org/assignments/character-sets/character-sets.xhtml.