:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.

.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data. If you need to parse untrusted or
   unauthenticated data, see :ref:`xml-vulnerabilities`.

.. versionchanged:: 3.7.1

   To increase security, the SAX parser no longer processes general external
   entities by default. To enable processing of external entities, pass in a
   custom parser instance::

      from xml.dom.pulldom import parse
      from xml.sax import make_parser
      from xml.sax.handler import feature_external_ges

      parser = make_parser()
      parser.setFeature(feature_external_ges, True)
      parse(filename, parser=parser)


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.
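
As a rough illustration of how events and node types pair up, the following
sketch records each event together with the class name of its node. The
sample document and the ``seen`` list are made up for this example, and
printing the constants directly assumes they are plain strings, as they are
in CPython's implementation::

   from xml.dom import pulldom

   # Hypothetical sample input, not taken from the module itself.
   xml = '<items><item price="10">pen</item></items>'

   # Pair every event constant with the class name of the node it carries.
   seen = [(event, type(node).__name__)
           for event, node in pulldom.parseString(xml)]

   for event, type_name in seen:
       print(event, type_name)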

Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one needs either to maintain some
context-related state (i.e., remembering where one is in the document at any
given point) or to make use of the :meth:`DOMEventStream.expandNode` method
and switch to DOM-related processing.
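
One possible way to maintain such context-related state is to keep a stack of
the currently open element names. The sketch below is only an illustration:
the catalog document, the ``path`` list and the ``book_titles`` list are
hypothetical names chosen for this example::

   from xml.dom import pulldom

   # Hypothetical input: only the <title> nested inside <book> is wanted.
   xml = ('<catalog><book><title>Dune</title></book>'
          '<title>Ignored</title></catalog>')

   path = []         # tag names of the elements that are currently open
   book_titles = []  # text chunks found inside book/title

   for event, node in pulldom.parseString(xml):
       if event == pulldom.START_ELEMENT:
           path.append(node.tagName)
       elif event == pulldom.END_ELEMENT:
           path.pop()
       elif event == pulldom.CHARACTERS and path[-2:] == ['book', 'title']:
           # Text may arrive in several chunks, so collect and join later.
           book_titles.append(node.data)

   print(''.join(book_titles))

Tracking the path by hand keeps memory use flat, whereas expanding a node
trades that for the convenience of a DOM subtree.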


.. class:: PullDom(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function changes the
   document handler of the parser and activates namespace support; any other
   parser configuration (like setting an entity resolver) must be done in
   advance.
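
   A short sketch of passing a file-like object rather than a file name. The
   in-memory :class:`io.BytesIO` buffer stands in for an open binary file,
   and the ``log``/``entry`` element names are invented for illustration::

      import io
      from xml.dom import pulldom

      # Hypothetical data; any object with a read() method should do.
      data = io.BytesIO(
          b'<log><entry level="info"/><entry level="error"/></log>')

      levels = [node.getAttribute('level')
                for event, node in pulldom.parse(data)
                if event == pulldom.START_ELEMENT
                and node.tagName == 'entry']

      print(levels)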

If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)

   .. deprecated:: 3.8
      Support for :meth:`sequence protocol <__getitem__>` is deprecated.

   .. method:: getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT`, or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children unless
      :meth:`expandNode` is called.
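
      A sketch of pulling events by hand instead of using a ``for`` loop. It
      assumes, based on the CPython implementation rather than on anything
      stated here, that :meth:`getEvent` returns ``None`` once the input is
      exhausted; the ``events`` list is only for illustration::

         from xml.dom import pulldom

         stream = pulldom.parseString('<a><b/></a>')
         events = []
         while True:
             pair = stream.getEvent()
             if pair is None:
                 # Assumed end-of-input signal (implementation detail).
                 break
             event, node = pair
             events.append(event)

         print(events)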

   .. method:: expandNode(node)

      Expand all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()
