:mod:`xml.dom.pulldom` --- Support for building partial DOM trees
=================================================================

.. module:: xml.dom.pulldom
   :synopsis: Support for building partial DOM trees from SAX events.
.. moduleauthor:: Paul Prescod <paul@prescod.net>

**Source code:** :source:`Lib/xml/dom/pulldom.py`

--------------

The :mod:`xml.dom.pulldom` module provides a "pull parser" which can also be
asked to produce DOM-accessible fragments of the document where necessary. The
basic concept involves pulling "events" from a stream of incoming XML and
processing them. In contrast to SAX, which also employs an event-driven
processing model together with callbacks, the user of a pull parser is
responsible for explicitly pulling events from the stream, looping over those
events until either processing is finished or an error condition occurs.

Example::

   from xml.dom import pulldom

   doc = pulldom.parse('sales_items.xml')
   for event, node in doc:
       # Only <item> start tags are of interest; all other events are skipped.
       if event == pulldom.START_ELEMENT and node.tagName == 'item':
           if int(node.getAttribute('price')) > 50:
               # Read the rest of this <item> and attach it as its children.
               doc.expandNode(node)
               print(node.toxml())

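The example above assumes a file named ``sales_items.xml``. Because :func:`parse`
also accepts a file-like object, the same filtering can be sketched as a
self-contained script over an in-memory byte stream. The sample data and the
helper name ``expensive_items`` are hypothetical, and the sketch assumes every
``item`` element carries a numeric ``price`` attribute.

```python
import io
from xml.dom import pulldom

# Hypothetical sample data standing in for sales_items.xml.
SALES = b"""<?xml version="1.0"?>
<sales>
  <item price="30"><name>Pen</name></item>
  <item price="75"><name>Lamp</name></item>
  <item price="120"><name>Chair</name></item>
</sales>"""


def expensive_items(data, threshold=50):
    """Return the serialized <item> elements whose price exceeds threshold."""
    doc = pulldom.parse(io.BytesIO(data))  # a file-like object instead of a file name
    result = []
    for event, node in doc:
        if event == pulldom.START_ELEMENT and node.tagName == 'item':
            if int(node.getAttribute('price')) > threshold:
                # Attach the rest of this <item> subtree to the node.
                doc.expandNode(node)
                result.append(node.toxml())
    return result


for item_xml in expensive_items(SALES):
    print(item_xml)
```

Items at or below the threshold are never expanded; their child events simply
pass through the loop without matching the ``item`` test.
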
``event`` is a constant and can be one of:

* :data:`START_ELEMENT`
* :data:`END_ELEMENT`
* :data:`COMMENT`
* :data:`START_DOCUMENT`
* :data:`END_DOCUMENT`
* :data:`CHARACTERS`
* :data:`PROCESSING_INSTRUCTION`
* :data:`IGNORABLE_WHITESPACE`

``node`` is an object of type :class:`xml.dom.minidom.Document`,
:class:`xml.dom.minidom.Element` or :class:`xml.dom.minidom.Text`.

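To see which events a small document actually produces, the sketch below drives
the stream by hand with :meth:`DOMEventStream.getEvent`, which returns ``None``
once the input is exhausted, and records each event with the node's
``nodeName``. The helper ``trace_events`` is hypothetical. With the default SAX
parser, CPython's implementation appears not to report comments and to end the
stream after the final ``END_ELEMENT`` without delivering ``END_DOCUMENT``, so
the sample document contains no comment and the code does not rely on the final
event.

```python
from xml.dom import pulldom


def trace_events(xml_text):
    """Return (event, nodeName) pairs in the order the pull parser reports them."""
    stream = pulldom.parseString(xml_text)
    trace = []
    while True:
        item = stream.getEvent()
        if item is None:  # no more events: the input is exhausted
            break
        event, node = item
        trace.append((event, node.nodeName))
    return trace


for event, name in trace_events('<root><a>hi</a></root>'):
    print(event, name)
```
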
Since the document is treated as a "flat" stream of events, the document "tree"
is implicitly traversed and the desired elements are found regardless of their
depth in the tree. In other words, one does not need to consider hierarchical
issues such as recursive searching of the document nodes. If the context of
elements is important, however, one must either maintain some context-related
state (i.e., remember where one is in the document at any given point) or use
the :meth:`DOMEventStream.expandNode` method and switch to DOM-related
processing.

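As a sketch of the first option, the generator below keeps its own stack of
open element names so that each piece of text can be reported together with
its path. ``text_with_paths`` is a hypothetical helper and the slash-separated
path format is an arbitrary choice.

```python
from xml.dom import pulldom


def text_with_paths(xml_text):
    """Yield (path, text) pairs, tracking the current element path by hand."""
    path = []  # stack of open element names: context the event stream does not keep
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            path.append(node.tagName)
        elif event == pulldom.END_ELEMENT:
            path.pop()
        elif event == pulldom.CHARACTERS and node.data.strip():
            yield '/'.join(path), node.data.strip()


doc = ('<library><book><title>Dune</title></book>'
       '<magazine><title>Wired</title></magazine></library>')
for where, text in text_with_paths(doc):
    print(where, text)
```

Both ``title`` elements look identical to a loop that only inspects the current
event; the stack is what tells them apart.
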

.. class:: PullDom(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.


.. class:: SAX2DOM(documentFactory=None)

   Subclass of :class:`xml.sax.handler.ContentHandler`.

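Judging from the module source, :class:`SAX2DOM` differs from the plain pull
handler in that it also attaches every new node to its parent as the SAX events
arrive, so it can be handed to an ordinary SAX parse to build a complete
document. The following is a minimal sketch, assuming the handler's
undocumented ``document`` attribute, which holds the resulting
:class:`xml.dom.minidom.Document`; ``build_dom`` is a hypothetical helper.

```python
import xml.sax
from xml.dom import pulldom


def build_dom(data):
    """Build a complete minidom Document from XML bytes using SAX2DOM."""
    handler = pulldom.SAX2DOM()
    # An ordinary (non-pull) SAX parse; SAX2DOM appends each node to its parent.
    xml.sax.parseString(data, handler)
    # Assumption: implementation detail, set when the root element starts.
    return handler.document


dom = build_dom(b'<root><a>hi</a><b/></root>')
print(dom.documentElement.toxml())
```
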

.. function:: parse(stream_or_string, parser=None, bufsize=None)

   Return a :class:`DOMEventStream` from the given input. *stream_or_string* may be
   either a file name or a file-like object. *parser*, if given, must be an
   :class:`~xml.sax.xmlreader.XMLReader` object. This function will change the
   content handler of the parser and activate namespace support; other parser
   configuration (like setting an entity resolver) must have been done in advance.

   If you have XML in a string, you can use the :func:`parseString` function instead.

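A sketch of passing a pre-configured reader, assuming the default expat-based
reader returned by :func:`xml.sax.make_parser`. Turning off external general
entities merely stands in for whatever configuration an application needs; it
has to happen before the reader is handed over, because :func:`parse` installs
its own content handler and switches namespace processing on.
``element_namespaces`` is a hypothetical helper.

```python
import io
import xml.sax
import xml.sax.handler
from xml.dom import pulldom


def element_namespaces(data):
    """Return (tagName, namespaceURI) for every element in the XML bytes."""
    reader = xml.sax.make_parser()
    # Example configuration, done *before* the reader is passed to parse():
    # do not fetch external general entities.
    reader.setFeature(xml.sax.handler.feature_external_ges, False)
    # parse() installs its own content handler and enables namespace support.
    stream = pulldom.parse(io.BytesIO(data), parser=reader)
    return [(node.tagName, node.namespaceURI)
            for event, node in stream
            if event == pulldom.START_ELEMENT]


print(element_namespaces(b'<doc xmlns="urn:example"><p>text</p></doc>'))
```

Because namespace support is enabled, each element node carries its
``namespaceURI``.
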

.. function:: parseString(string, parser=None)

   Return a :class:`DOMEventStream` that represents the (Unicode) *string*.

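Since :func:`parseString` takes a ``str``, non-ASCII text can be passed without
encoding it first. In the sketch below the hypothetical helper ``collect_text``
joins the data of all ``CHARACTERS`` events; joining matters because a parser
may deliver a single run of text as several events.

```python
from xml.dom import pulldom


def collect_text(xml_text):
    """Concatenate all character data in a document given as a str."""
    pieces = []
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.CHARACTERS:
            pieces.append(node.data)
    return ''.join(pieces)


print(collect_text('<greeting lang="de">Grüße aus München</greeting>'))
```
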

.. data:: default_bufsize

   Default value for the *bufsize* parameter to :func:`parse`.

   The value of this variable can be changed before calling :func:`parse` and
   the new value will take effect.

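The effect of :data:`default_bufsize` can be observed by handing :func:`parse` a
stream that records how much it is asked to read. This sketch assumes, as in
CPython's implementation, that the event stream requests exactly *bufsize*
bytes per read. ``RecordingStream`` and ``read_sizes_with`` are hypothetical
helpers; the previous value is restored afterwards because the variable is
module-wide.

```python
import io
from xml.dom import pulldom


class RecordingStream(io.BytesIO):
    """Hypothetical helper: a BytesIO that remembers every size passed to read()."""

    def __init__(self, data):
        super().__init__(data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return super().read(size)


def read_sizes_with(bufsize, data):
    """Parse data with pulldom.default_bufsize temporarily set to bufsize."""
    old = pulldom.default_bufsize
    pulldom.default_bufsize = bufsize  # must be set *before* parse() is called
    try:
        stream = RecordingStream(data)
        for _ in pulldom.parse(stream):
            pass
        return stream.read_sizes
    finally:
        pulldom.default_bufsize = old


data = b'<root>' + b'<x/>' * 20 + b'</root>'
print(read_sizes_with(16, data))
```
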

.. _domeventstream-objects:

DOMEventStream Objects
----------------------

.. class:: DOMEventStream(stream, parser, bufsize)


   .. method:: DOMEventStream.getEvent()

      Return a tuple containing *event* and the current *node* as
      :class:`xml.dom.minidom.Document` if event equals :data:`START_DOCUMENT`,
      :class:`xml.dom.minidom.Element` if event equals :data:`START_ELEMENT` or
      :data:`END_ELEMENT` or :class:`xml.dom.minidom.Text` if event equals
      :data:`CHARACTERS`. When no more events are available, ``None`` is
      returned. The current node does not contain information about its
      children, unless :meth:`expandNode` is called.

   .. method:: DOMEventStream.expandNode(node)

      Expand all children of *node* into *node*. Example::

          from xml.dom import pulldom

          xml_text = '<html><title>Foo</title> <p>Some text <div>and more</div></p> </html>'
          doc = pulldom.parseString(xml_text)
          for event, node in doc:
              if event == pulldom.START_ELEMENT and node.tagName == 'p':
                  # The following statement only prints '<p/>'
                  print(node.toxml())
                  doc.expandNode(node)
                  # The following statement prints the node with all its children:
                  # '<p>Some text <div>and more</div></p>'
                  print(node.toxml())

   .. method:: DOMEventStream.reset()
