:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.
.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model but delivers events through callbacks, the user of a pull
parser is responsible for explicitly pulling events from the stream, looping
over those events until either processing is finished or an error condition
occurs.


.. warning::

   The :mod:`xml.dom.pulldom` module is not secure against
   maliciously constructed data.  If you need to parse untrusted or
   unauthenticated data see :ref:`xml-vulnerabilities`.


Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               doc.expandNode(node)
               print(node.toxml())

``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

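As a rough illustration of the event stream described above, the following
sketch records the element events produced for a small document. The sample
XML and the variable names are made up for this example; only the element
events are collected, since those carry a ``tagName``.

```python
from xml.dom import pulldom

# Hypothetical sample input, used only for illustration.
xml_text = "<order><item price='60'>Widget</item></order>"

element_events = []
for event, node in pulldom.parseString(xml_text):
    # Other event types (for example CHARACTERS) are skipped here.
    if event in (pulldom.START_ELEMENT, pulldom.END_ELEMENT):
        element_events.append((event, node.tagName))

for event, name in element_events:
    print(event, name)
```

Each start event is eventually matched by an end event for the same element,
which is what makes the stream "flat" while still reflecting the nesting.
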
Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one needs either to maintain some
context-related state (i.e. remembering where one is in the document at any
given point) or to make use of the :func:`DOMEventStream.expandNode` method
and switch to DOM-related processing.


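One possible way to maintain such context-related state is to keep a stack of
the currently open element names. This is only a sketch: the sample document
and the names ``path`` and ``title_paths`` are assumptions made for the
example, not part of the module.

```python
from xml.dom import pulldom

# Hypothetical sample input: <title> appears at two different depths.
xml_text = (
    "<catalog><book><title>A</title></book>"
    "<title>Top-level</title></catalog>"
)

path = []          # names of the elements that are currently open
title_paths = []   # where each <title> element was found

for event, node in pulldom.parseString(xml_text):
    if event == pulldom.START_ELEMENT:
        path.append(node.tagName)
        if node.tagName == "title":
            title_paths.append("/".join(path))
    elif event == pulldom.END_ELEMENT:
        path.pop()

print(title_paths)
```

The stack grows on every start event and shrinks on every end event, so at any
point it describes the position of the current node in the document.
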
.. class:: PullDOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name, or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   document handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in advance.

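As a small sketch of passing a file-like object to :func:`parse`, the example
below reads from an in-memory :class:`io.BytesIO` buffer instead of a file on
disk. The sample data and the deliberately small *bufsize* are assumptions
chosen so that the input is fed to the parser in several chunks.

```python
import io
from xml.dom import pulldom

# Hypothetical in-memory document standing in for a real file.
data = io.BytesIO(b"<items><item id='1'/><item id='2'/></items>")

# A small buffer size forces the input to be read in several steps.
stream = pulldom.parse(data, bufsize=16)

ids = [
    node.getAttribute("id")
    for event, node in stream
    if event == pulldom.START_ELEMENT and node.tagName == "item"
]
print(ids)
```
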
If you have XML in a string, you can use the :func:`parseString` function instead:

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)


   .. method:: getEvent()

      Return a tuple containing *event* and the current *node*. The node is a
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      a :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT`, or a :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`.
      The current node does not contain information about its children, unless
      :func:`expandNode` is called.

   .. method:: expandNode(node)

      Expands all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # Following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # Following statement prints node with all its children '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()

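The methods above can also be driven by hand instead of iterating over the
stream. The following sketch pulls events with ``getEvent()`` and expands only
the elements of interest. It assumes that ``getEvent()`` returns ``None`` once
the input is exhausted, which matches the current CPython implementation but
is not stated in the text above; the sample XML is likewise made up.

```python
from xml.dom import pulldom

# Hypothetical sample input, used only for illustration.
xml_text = (
    "<log><entry level='info'>started</entry>"
    "<entry level='error'>disk <b>full</b></entry></log>"
)
stream = pulldom.parseString(xml_text)

errors = []
while True:
    pulled = stream.getEvent()
    if pulled is None:
        # Assumption: None signals that no more events are available.
        break
    event, node = pulled
    if (event == pulldom.START_ELEMENT
            and node.tagName == "entry"
            and node.getAttribute("level") == "error"):
        # Build the full subtree for this element only.
        stream.expandNode(node)
        errors.append(node.toxml())

print(errors)
```

Only the matching ``entry`` element is turned into a DOM subtree; every other
part of the document is consumed as plain events and then discarded.
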