\section{\module{xml.dom.minidom} ---
         Lightweight DOM implementation}

\declaremodule{standard}{xml.dom.minidom}
\modulesynopsis{Lightweight Document Object Model (DOM) implementation.}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}

\versionadded{2.0}

\module{xml.dom.minidom} is a lightweight implementation of the
Document Object Model interface. It is intended to be simpler than
the full DOM and also significantly smaller.

DOM applications typically start by parsing some XML into a DOM. With
\module{xml.dom.minidom}, this is done through the parse functions:

\begin{verbatim}
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)   # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
\end{verbatim}

The \function{parse()} function accepts either a file name or an open
file object.
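As a self-contained illustration of both forms, the following sketch
writes a small sample document to a temporary file (standing in for the
Windows path used in the example above) and parses it once by name and
once through an open file object. It assumes a Python 3 interpreter;
both parses are expected to yield the same document.

\begin{verbatim}
import os
import tempfile
from xml.dom.minidom import parse

# Write a small sample document to a temporary file (sample content only).
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w") as f:
    f.write("<myxml>Some data<empty/> some more data</myxml>")

try:
    by_name = parse(path)                # parse a file given by name
    with open(path) as datasource:
        by_file = parse(datasource)      # parse an already-open file object
    root_tag = by_name.documentElement.tagName
    same_content = by_name.toxml() == by_file.toxml()
    by_name.unlink()
    by_file.unlink()
finally:
    os.remove(path)                      # always delete the temporary file
\end{verbatim}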

\begin{funcdesc}{parse}{filename_or_file\optional{, parser}}
  Return a \class{Document} from the given input. \var{filename_or_file}
  may be either a file name or a file-like object. \var{parser}, if
  given, must be a SAX2 parser object. This function will replace the
  content handler of the parser and activate namespace support; any
  other parser configuration (such as setting an entity resolver) must
  be done in advance.
\end{funcdesc}
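The sketch below exercises the optional \var{parser} argument, assuming a
current Python 3 release. The parser is obtained from
\function{xml.sax.make_parser()} and configured before the call; turning
off external general entities is only an example of such advance
configuration. \function{parse()} then takes over the handler and
namespace settings itself. The sample document and its \code{lang}
attribute are illustrative only.

\begin{verbatim}
import io
import xml.sax
import xml.sax.handler
from xml.dom.minidom import parse

# Configure the SAX2 parser *before* handing it to parse().
parser = xml.sax.make_parser()
parser.setFeature(xml.sax.handler.feature_external_ges, False)

source = io.StringIO("<greeting lang='en'>Hello</greeting>")
doc = parse(source, parser)

root = doc.documentElement
tag = root.tagName
lang = root.getAttribute("lang")
doc.unlink()
\end{verbatim}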

If you have XML in a string, you can use the
\function{parseString()} function instead:

\begin{funcdesc}{parseString}{string\optional{, parser}}
  Return a \class{Document} that represents the \var{string}. This
  function creates a \class{StringIO} object for the string and passes
  that on to \function{parse()}.
\end{funcdesc}
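The equivalence described above can be checked with a small sketch. It
assumes Python 3, where \class{io.StringIO} plays the role of the
\class{StringIO} class mentioned here; current versions of
\function{parseString()} may build the document by a different internal
route, but the two documents should serialize identically.

\begin{verbatim}
import io
from xml.dom.minidom import parse, parseString

text = "<myxml>Some data<empty/> some more data</myxml>"

from_string = parseString(text)          # parse the string directly
from_file = parse(io.StringIO(text))     # wrap it in a file-like object first

equivalent = from_string.toxml() == from_file.toxml()
from_string.unlink()
from_file.unlink()
\end{verbatim}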

Both functions return a \class{Document} object representing the
content of the document.

You can also create a \class{Document} node directly by instantiating
the class, and then add child nodes to it to populate the DOM:

\begin{verbatim}
from xml.dom.minidom import Document

newdoc = Document()
newel = newdoc.createElement("some_tag")
newdoc.appendChild(newel)
\end{verbatim}
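Building on that fragment, the following sketch adds an attribute and a
text node through the factory methods of the same \class{Document}, then
serializes the result with \method{toxml()}. The attribute name
\code{id} and the text \code{hello} are arbitrary sample values.

\begin{verbatim}
from xml.dom.minidom import Document

newdoc = Document()
newel = newdoc.createElement("some_tag")
newdoc.appendChild(newel)

# Attributes and text content are created through the owning Document.
newel.setAttribute("id", "1")
newel.appendChild(newdoc.createTextNode("hello"))

xml_text = newdoc.toxml()  # serialized document, including the XML declaration
newdoc.unlink()
\end{verbatim}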

Once you have a DOM document object, you can access the parts of your
XML document through its properties and methods, which are defined in
the DOM specification. The main property of the document object is the
\member{documentElement} property. It gives you the main element in
the XML document: the one that holds all others. For example:

\begin{verbatim}
from xml.dom.minidom import parseString

dom3 = parseString("<myxml>Some data</myxml>")
assert dom3.documentElement.tagName == "myxml"
\end{verbatim}
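Below the document element, the tree is navigated through
\member{childNodes} and \member{nodeType}. The sketch below lists the
element children of the document element and gathers its direct text;
\function{element_text()} is a hypothetical helper written for this
example, not part of \module{xml.dom.minidom}.

\begin{verbatim}
from xml.dom.minidom import Node, parseString

def element_text(element):
    """Hypothetical helper: join the data of direct text-node children."""
    parts = []
    for child in element.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            parts.append(child.data)
    return "".join(parts)

dom = parseString("<myxml>Some data<empty/> some more data</myxml>")
root = dom.documentElement

# Element children only; text nodes are skipped by the nodeType test.
child_tags = [c.tagName for c in root.childNodes
              if c.nodeType == Node.ELEMENT_NODE]
text = element_text(root)
dom.unlink()
\end{verbatim}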

When you are finished with a DOM, you should clean it up. This is
necessary because some versions of Python do not support garbage
collection of objects that refer to each other in a cycle. Until this
restriction is removed from all versions of Python, it is safest to
write your code as if cycles would not be cleaned up.

To clean up a DOM, call its \method{unlink()} method:

\begin{verbatim}
dom1.unlink()
dom2.unlink()
dom3.unlink()
\end{verbatim}

\method{unlink()} is an \module{xml.dom.minidom}-specific extension to
the DOM API. After \method{unlink()} has been called on a node, the node
and its descendants should not be used any further.
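Because \method{unlink()} should run even when processing fails, it fits
naturally in a \keyword{try}/\keyword{finally} block. The sketch below
uses a hypothetical helper, \function{count_items()}, that counts
\code{item} elements and always unlinks the parsed document before
returning.

\begin{verbatim}
from xml.dom.minidom import parseString

def count_items(xml_text):
    """Hypothetical helper: count <item> elements, then unlink the DOM."""
    dom = parseString(xml_text)
    try:
        return len(dom.getElementsByTagName("item"))
    finally:
        # Runs whether or not the code above raised, so the DOM's
        # internal reference cycles are always broken.
        dom.unlink()

n_items = count_items("<list><item/><item/><other/></list>")
\end{verbatim}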

\begin{seealso}
  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
            Model (DOM) Level 1 Specification}
           {The W3C recommendation for the
            DOM supported by \module{xml.dom.minidom}.}
\end{seealso}


\subsection{DOM objects \label{dom-objects}}

The definition of the DOM API for Python is given as part of the
\refmodule{xml.dom} module documentation. This section lists the
differences between the API and \refmodule{xml.dom.minidom}.


\begin{methoddesc}{unlink}{}
Break internal references within the DOM so that it will be garbage
collected on versions of Python without cyclic GC. Even when cyclic
GC is available, using this can make large amounts of memory available
sooner, so calling this on DOM objects as soon as they are no longer
needed is good practice. This only needs to be called on the
\class{Document} object, but may be called on child nodes to discard
children of that node.
\end{methoddesc}
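To discard only part of a tree, one approach (an assumption of this
sketch, not a requirement stated above) is to detach the subtree with the
standard \method{removeChild()} method and then call \method{unlink()}
on the detached node, so that the remaining document no longer refers to
it. The element names are sample values.

\begin{verbatim}
from xml.dom.minidom import parseString

dom = parseString("<root><keep/><drop><child/></drop></root>")
root = dom.documentElement

# Detach the unwanted subtree first, then break its internal references.
drop = dom.getElementsByTagName("drop")[0]
removed = root.removeChild(drop)
removed.unlink()

remaining = root.toxml()
dom.unlink()  # finally release the rest of the document
\end{verbatim}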

\begin{methoddesc}{writexml}{writer}
Write XML to the writer object. The writer should have a
\method{write()} method which matches that of the file object
interface.
\end{methoddesc}
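Any object with a suitable \method{write()} method can serve as the
writer. The hypothetical \class{ChunkCollector} class below only records
the strings passed to it; as a sanity check, the collected output is
compared with \method{toxml()}, which in current releases is built on the
same serialization code.

\begin{verbatim}
from xml.dom.minidom import parseString

class ChunkCollector:
    """Hypothetical writer: it only needs a file-like write() method."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)

dom = parseString("<note><to>Tove</to></note>")
collector = ChunkCollector()
dom.writexml(collector)

written = "".join(collector.chunks)
serialized = dom.toxml()
dom.unlink()
\end{verbatim}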

\begin{methoddesc}{toxml}{}
Return the XML that the DOM represents as a string.
\end{methoddesc}
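In current releases, \method{toxml()} is available on individual nodes
as well as on the \class{Document}. As the sketch below illustrates, only
the document-level call is expected to include the XML declaration;
calling it on an element yields just that element's markup.

\begin{verbatim}
from xml.dom.minidom import parseString

dom = parseString("<a><b/></a>")
whole = dom.toxml()                     # document: declaration plus content
fragment = dom.documentElement.toxml()  # element: its own markup only
dom.unlink()
\end{verbatim}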

The following standard DOM methods have special considerations with
\refmodule{xml.dom.minidom}:

\begin{methoddesc}{cloneNode}{deep}
Although this method was present in the version of
\refmodule{xml.dom.minidom} packaged with Python 2.0, it was seriously
broken. This has been corrected in subsequent releases.
\end{methoddesc}
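Assuming a release newer than Python 2.0, the \var{deep} flag behaves as
in the DOM specification: a deep clone copies the node together with all
of its descendants, while a shallow clone copies only the node itself
(for an element, including its attributes). The sample document below is
illustrative.

\begin{verbatim}
from xml.dom.minidom import parseString

dom = parseString('<list kind="todo"><item>milk</item><item>eggs</item></list>')
original = dom.documentElement

deep = original.cloneNode(True)      # element plus all descendants
shallow = original.cloneNode(False)  # element and attributes, no children

original_xml = original.toxml()
deep_xml = deep.toxml()
shallow_xml = shallow.toxml()
dom.unlink()
\end{verbatim}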


\subsection{DOM Example \label{dom-example}}

The following program is a fairly realistic example of a simple
application. In this particular case, it does not take much advantage
of the flexibility of the DOM.

\verbatiminput{minidom-example.py}


\subsection{minidom and the DOM standard \label{minidom-and-dom}}

The \refmodule{xml.dom.minidom} module is essentially a DOM
1.0-compatible implementation with some DOM 2 features (primarily
namespace support).
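The namespace features are visible on parsed elements through the DOM
Level 2 attributes \member{namespaceURI}, \member{prefix} and
\member{localName}, and through methods such as
\method{getElementsByTagNameNS()}. The sketch below assumes a current
Python 3 release; the namespace URI \code{urn:example:inventory} is made
up for the example.

\begin{verbatim}
from xml.dom.minidom import parseString

# "urn:example:inventory" is a made-up namespace URI for illustration only.
dom = parseString(
    '<inv:stock xmlns:inv="urn:example:inventory">'
    '<inv:item/><inv:item/>'
    '</inv:stock>'
)
root = dom.documentElement
info = (root.tagName, root.prefix, root.localName, root.namespaceURI)
item_count = len(dom.getElementsByTagNameNS("urn:example:inventory", "item"))
dom.unlink()
\end{verbatim}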

Usage of the DOM interface in Python is straightforward. The
following mapping rules apply:

\begin{itemize}
\item Interfaces are accessed through instance objects. Applications
      should not instantiate the classes themselves; they should use
      the creator functions available on the \class{Document} object.
      Derived interfaces support all operations (and attributes) from
      the base interfaces, plus any new operations.

\item Operations are used as methods. Since the DOM uses only
      \keyword{in} parameters, the arguments are passed in normal
      order (from left to right). There are no optional
      arguments. \keyword{void} operations return \code{None}.

\item IDL attributes map to instance attributes. For compatibility
      with the OMG IDL language mapping for Python, an attribute
      \code{foo} can also be accessed through accessor methods
      \method{_get_foo()} and \method{_set_foo()}. \keyword{readonly}
      attributes must not be changed; this is not enforced at
      runtime.

\item The types \code{short int}, \code{unsigned int}, \code{unsigned
      long long}, and \code{boolean} all map to Python integer
      objects.

\item The type \code{DOMString} maps to Python strings.
      \refmodule{xml.dom.minidom} supports either byte or Unicode
      strings, but will normally produce Unicode strings. Values
      of type \code{DOMString} may also be \code{None} wherever the
      W3C DOM specification allows the IDL \code{null} value.

\item \keyword{const} declarations map to variables in their
      respective scope (for example,
      \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE});
      they must not be changed.

\item \code{DOMException} is currently not supported in
      \refmodule{xml.dom.minidom}. Instead,
      \refmodule{xml.dom.minidom} uses standard Python exceptions such
      as \exception{TypeError} and \exception{AttributeError}.

\item \class{NodeList} objects are implemented using Python's built-in
      list type. Starting with Python 2.2, these objects provide the
      interface defined in the DOM specification, but with earlier
      versions of Python they do not support the official API. They
      are, however, much more ``Pythonic'' than the interface defined
      in the W3C recommendations.
\end{itemize}
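Several of these rules can be seen together in one short sketch,
assuming Python 2.2 or later so that \class{NodeList} objects offer the
DOM interface. The element and attribute names are sample values, and
the \method{_get_tagName()} call relies on minidom providing the
accessor-method aliases described above.

\begin{verbatim}
from xml.dom.minidom import Node, parseString

dom = parseString('<config><entry key="mode">fast</entry></config>')

# NodeList: both the DOM interface and ordinary list behaviour work.
entries = dom.getElementsByTagName("entry")
first = entries.item(0)
same_node = first is entries[0]
length_ok = entries.length == len(entries) == 1

# IDL attributes are instance attributes; _get_ accessors are aliases.
tag = first.tagName
tag_via_accessor = first._get_tagName()

# Operations are plain methods; const declarations are class attributes.
key = first.getAttribute("key")
is_element = first.nodeType == Node.ELEMENT_NODE

dom.unlink()
\end{verbatim}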


The following interfaces have no implementation in
\refmodule{xml.dom.minidom}:

\begin{itemize}
\item \class{DOMTimeStamp}

\item \class{DocumentType} (added in Python 2.1)

\item \class{DOMImplementation} (added in Python 2.1)

\item \class{CharacterData}

\item \class{CDATASection}

\item \class{Notation}

\item \class{Entity}

\item \class{EntityReference}

\item \class{DocumentFragment}
\end{itemize}

Most of these reflect information in the XML document that is not
useful to most DOM users.