\section{\module{xml.dom.minidom} ---
         Lightweight DOM implementation}

\declaremodule{standard}{xml.dom.minidom}
\modulesynopsis{Lightweight Document Object Model (DOM) implementation.}
\moduleauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Paul Prescod}{paul@prescod.net}
\sectionauthor{Martin v. L\"owis}{loewis@informatik.hu-berlin.de}

\versionadded{2.0}

\module{xml.dom.minidom} is a lightweight implementation of the
Document Object Model interface.  It is intended to be
simpler than the full DOM and also significantly smaller.

DOM applications typically start by parsing some XML into a DOM.  With
\module{xml.dom.minidom}, this is done through the parse functions:

\begin{verbatim}
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)   # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
\end{verbatim}

The \function{parse()} function can take either a filename or an open
file object.

\begin{funcdesc}{parse}{filename_or_file\optional{, parser}}
  Return a \class{Document} from the given input.  \var{filename_or_file}
  may be either a file name or a file-like object.  \var{parser}, if
  given, must be a SAX2 parser object.  This function will change the
  document handler of the parser and activate namespace support; other
  parser configuration (like setting an entity resolver) must have been
  done in advance.
\end{funcdesc}

If you have XML in a string, you can use the
\function{parseString()} function instead:

\begin{funcdesc}{parseString}{string\optional{, parser}}
  Return a \class{Document} that represents the \var{string}.  This
  function creates a \class{StringIO} object for the string and passes
  it on to \function{parse()}.
\end{funcdesc}

Both functions return a \class{Document} object representing the
content of the document.

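As a minimal sketch of these entry points, assuming a current Python 3
interpreter (where a file-like object is expected to yield bytes), the
following parses the same markup from an in-memory file, from a string,
and through an explicitly created SAX2 parser.  The sample markup and
all variable names are illustrative only.

```python
import io
import xml.sax
from xml.dom.minidom import parse, parseString

xml_text = "<myxml>Some data<empty/> some more data</myxml>"

# A file-like object: here an in-memory byte stream stands in for a file.
from_file = parse(io.BytesIO(xml_text.encode("utf-8")))

# A string that holds the XML itself.
from_string = parseString(xml_text)

# An explicitly supplied SAX2 parser.  Any extra configuration (such as
# an entity resolver) would have to be applied before calling parse().
sax_parser = xml.sax.make_parser()
from_parser = parse(io.BytesIO(xml_text.encode("utf-8")), sax_parser)

documents = (from_file, from_string, from_parser)
root_tags = [dom.documentElement.tagName for dom in documents]
print(root_tags)

# Release the trees once they are no longer needed.
for dom in documents:
    dom.unlink()
```

All three calls should yield a \class{Document} whose root element is
the same, whichever way the input was supplied.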
You can also create a \class{Document} node simply by instantiating a
document object, and then add child nodes to it to populate
the DOM:

\begin{verbatim}
from xml.dom.minidom import Document

newdoc = Document()
newel = newdoc.createElement("some_tag")
newdoc.appendChild(newel)
\end{verbatim}

Once you have a DOM document object, you can access the parts of your
XML document through its properties and methods.  These properties are
defined in the DOM specification.  The main property of the document
object is the \member{documentElement} property.  It gives you the
main element in the XML document: the one that holds all others.  Here
is an example:

\begin{verbatim}
dom3 = parseString("<myxml>Some data</myxml>")
assert dom3.documentElement.tagName == "myxml"
\end{verbatim}

When you are finished with a DOM, you should clean it up.  This is
necessary because some versions of Python do not support garbage
collection of objects that refer to each other in a cycle.  Until this
restriction is removed from all versions of Python, it is safest to
write your code as if cycles would not be cleaned up.

The way to clean up a DOM is to call its \method{unlink()} method:

\begin{verbatim}
dom1.unlink()
dom2.unlink()
dom3.unlink()
\end{verbatim}

\method{unlink()} is a \module{xml.dom.minidom}-specific extension to
the DOM API.  After calling \method{unlink()} on a node, the node and
its descendants are essentially useless.

\begin{seealso}
  \seetitle[http://www.w3.org/TR/REC-DOM-Level-1/]{Document Object
            Model (DOM) Level 1 Specification}
           {The W3C recommendation for the
            DOM supported by \module{xml.dom.minidom}.}
\end{seealso}


\subsection{DOM objects \label{dom-objects}}

The definition of the DOM API for Python is given as part of the
\refmodule{xml.dom} module documentation.  This section lists the
differences between that API and \refmodule{xml.dom.minidom}.


\begin{methoddesc}{unlink}{}
Break internal references within the DOM so that it will be garbage
collected on versions of Python without cyclic GC.  Even when cyclic
GC is available, using this can make large amounts of memory available
sooner, so calling this on DOM objects as soon as they are no longer
needed is good practice.  This only needs to be called on the
\class{Document} object, but may be called on child nodes to discard
children of that node.
\end{methoddesc}

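One way to make sure \method{unlink()} is always called is to pair it
with a \keyword{try}/\keyword{finally} block.  The sketch below assumes
a current Python 3 interpreter; the helper name \code{root\_tag} is
hypothetical and is not part of the module.

```python
from xml.dom.minidom import parseString


def root_tag(xml_text):
    """Return the tag name of the root element of xml_text."""
    dom = parseString(xml_text)
    try:
        # Copy out the plain data that is needed before the tree goes away.
        return dom.documentElement.tagName
    finally:
        # Runs on success and on error, so the tree is always released.
        dom.unlink()


tag = root_tag("<slideshow><slide/></slideshow>")
print(tag)
```

Only plain values such as strings should leave the function; a node
returned from it would already have been unlinked.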
\begin{methoddesc}{writexml}{writer}
Write XML to the writer object.  The writer should have a
\method{write()} method which matches that of the file object
interface.
\end{methoddesc}

\begin{methoddesc}{toxml}{}
Return the XML that the DOM represents as a string.
\end{methoddesc}

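The two serialization methods can be compared with a short sketch,
again assuming a current Python 3 interpreter.  An \class{io.StringIO}
buffer is used here as the writer purely for illustration; any object
with a suitable \method{write()} method should work.

```python
import io
from xml.dom.minidom import parseString

dom = parseString("<myxml>Some data<empty/> some more data</myxml>")
root = dom.documentElement

# toxml() returns the serialized markup as a string.
as_string = root.toxml()

# writexml() sends the same markup to an object with a write() method.
buffer = io.StringIO()
root.writexml(buffer)
written = buffer.getvalue()

# Serializing the Document itself also emits an XML declaration.
document_xml = dom.toxml()

print(as_string)
dom.unlink()
```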
The following standard DOM methods have special considerations with
\refmodule{xml.dom.minidom}:

\begin{methoddesc}{cloneNode}{deep}
Although this method was present in the version of
\refmodule{xml.dom.minidom} packaged with Python 2.0, it was seriously
broken.  This has been corrected for subsequent releases.
\end{methoddesc}


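For releases in which \method{cloneNode()} works, the effect of the
\var{deep} argument can be sketched as follows.  This assumes a current
Python 3 interpreter; the sample markup and variable names are
illustrative only.

```python
from xml.dom.minidom import parseString

dom = parseString('<slide id="1"><title>Slide title</title></slide>')
slide = dom.documentElement
original_xml = slide.toxml()

# A deep clone copies the element, its attributes and all descendants.
deep_copy = slide.cloneNode(True)

# A shallow clone copies the element and its attributes only.
shallow_copy = slide.cloneNode(False)

deep_xml = deep_copy.toxml()
shallow_children = len(shallow_copy.childNodes)
print(deep_xml == original_xml, shallow_children)
```

Neither clone is attached to the tree, so it has to be inserted
explicitly (for example with \method{appendChild()}) to become part of
a document.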
\subsection{DOM Example \label{dom-example}}

The following program is a fairly realistic example of a simple
DOM application.  In this particular case, it does not take much
advantage of the flexibility of the DOM.

\begin{verbatim}
import xml.dom.minidom

document = """\
<slideshow>
<title>Demo slideshow</title>
<slide><title>Slide title</title>
<point>This is a demo</point>
<point>Of a program for processing slides</point>
</slide>

<slide><title>Another demo slide</title>
<point>It is important</point>
<point>To have more than</point>
<point>one slide</point>
</slide>
</slideshow>
"""

dom = xml.dom.minidom.parseString(document)

space = " "
def getText(nodelist):
    # Concatenate the data of all text nodes found directly in nodelist.
    rc = ""
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc = rc + node.data
    return rc

def handleSlideshow(slideshow):
    print("<html>")
    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
    slides = slideshow.getElementsByTagName("slide")
    handleToc(slides)
    handleSlides(slides)
    print("</html>")

def handleSlides(slides):
    for slide in slides:
        handleSlide(slide)

def handleSlide(slide):
    handleSlideTitle(slide.getElementsByTagName("title")[0])
    handlePoints(slide.getElementsByTagName("point"))

def handleSlideshowTitle(title):
    print("<title>%s</title>" % getText(title.childNodes))

def handleSlideTitle(title):
    print("<h2>%s</h2>" % getText(title.childNodes))

def handlePoints(points):
    print("<ul>")
    for point in points:
        handlePoint(point)
    print("</ul>")

def handlePoint(point):
    print("<li>%s</li>" % getText(point.childNodes))

def handleToc(slides):
    # Table of contents: one paragraph per slide title.
    for slide in slides:
        title = slide.getElementsByTagName("title")[0]
        print("<p>%s</p>" % getText(title.childNodes))

handleSlideshow(dom)
\end{verbatim}


\subsection{minidom and the DOM standard \label{minidom-and-dom}}

\refmodule{xml.dom.minidom} is basically a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).

Usage of the DOM interface in Python is straightforward.  The
following mapping rules apply:

\begin{itemize}
\item Interfaces are accessed through instance objects.  Applications
      should not instantiate the classes themselves; they should use
      the creator functions available on the \class{Document} object.
      Derived interfaces support all operations (and attributes) from
      the base interfaces, plus any new operations.

\item Operations are used as methods.  Since the DOM uses only
      \keyword{in} parameters, the arguments are passed in normal
      order (from left to right).  There are no optional
      arguments.  \keyword{void} operations return \code{None}.

\item IDL attributes map to instance attributes.  For compatibility
      with the OMG IDL language mapping for Python, an attribute
      \code{foo} can also be accessed through accessor methods
      \method{_get_foo()} and \method{_set_foo()}.  \keyword{readonly}
      attributes must not be changed; this is not enforced at
      runtime.

\item The types \code{short int}, \code{unsigned int}, \code{unsigned
      long long}, and \code{boolean} all map to Python integer
      objects.

\item The type \code{DOMString} maps to Python strings.
      \refmodule{xml.dom.minidom} supports either byte or Unicode
      strings, but will normally produce Unicode strings.  Attributes
      of type \code{DOMString} may also be \code{None}.

\item \keyword{const} declarations map to variables in their
      respective scope
      (e.g. \code{xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE});
      they must not be changed.

\item \code{DOMException} is currently not supported in
      \refmodule{xml.dom.minidom}.  Instead,
      \refmodule{xml.dom.minidom} uses standard Python exceptions such
      as \exception{TypeError} and \exception{AttributeError}.

\item \class{NodeList} objects are implemented as Python's built-in
      list type, so they do not support the official API, but are
      much more ``Pythonic.''
\end{itemize}


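Several of the mapping rules above can be seen together in a short
sketch, assuming a current Python 3 interpreter.  The element names and
text values are illustrative only.

```python
from xml.dom.minidom import Document, Node

doc = Document()

# Nodes are created through the creator functions on the Document.
root = doc.createElement("slideshow")
doc.appendChild(root)
for text in ("first", "second"):
    point = doc.createElement("point")
    point.appendChild(doc.createTextNode(text))
    root.appendChild(point)

# Operations are ordinary methods; IDL attributes are instance attributes.
points = root.getElementsByTagName("point")
root_name = root.tagName

# const declarations are available as class-level names on Node.
first_is_element = points[0].nodeType == Node.ELEMENT_NODE

# The node list supports len(), indexing and iteration like a list.
point_texts = [p.firstChild.data for p in points]
print(root_name, len(points), point_texts)
```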
The following interfaces have no implementation in
\refmodule{xml.dom.minidom}:

\begin{itemize}
\item DOMTimeStamp

\item DocumentType (added in Python 2.1)

\item DOMImplementation (added in Python 2.1)

\item CharacterData

\item CDATASection

\item Notation

\item Entity

\item EntityReference

\item DocumentFragment
\end{itemize}

Most of these reflect information in the XML document that is of
little use to the majority of DOM users.