:mod:`xml.sax.xmlreader` --- Interface for XML parsers
======================================================

.. module:: xml.sax.xmlreader
   :synopsis: Interface which SAX-compliant XML parsers must implement.

.. moduleauthor:: Lars Marius Garshol <larsga@garshol.priv.no>
.. sectionauthor:: Martin v. Löwis <martin@v.loewis.de>

**Source code:** :source:`Lib/xml/sax/xmlreader.py`

--------------

SAX parsers implement the :class:`XMLReader` interface. They are implemented in
a Python module, which must provide a function :func:`create_parser`. This
function is invoked by :func:`xml.sax.make_parser` with no arguments to create
a new parser object.
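
As a minimal sketch of this flow, a parser obtained from
:func:`xml.sax.make_parser` can be driven through the :class:`XMLReader`
interface as shown below. The ``ElementCounter`` handler and the sample
document are illustrative assumptions, not part of the module::

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler
   from xml.sax.xmlreader import InputSource, XMLReader


   class ElementCounter(ContentHandler):
       """Hypothetical handler that counts start tags."""

       def __init__(self):
           super().__init__()
           self.count = 0

       def startElement(self, name, attrs):
           self.count += 1


   # make_parser() locates a parser module and calls its create_parser().
   parser = xml.sax.make_parser()
   handler = ElementCounter()
   parser.setContentHandler(handler)

   # Wrap an in-memory document in an InputSource and parse it.
   source = InputSource()
   source.setCharacterStream(io.StringIO("<root><a/><b/></root>"))
   parser.parse(source)

   print(isinstance(parser, XMLReader), handler.count)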


.. class:: XMLReader()

   Base class which can be inherited by SAX parsers.


.. class:: IncrementalParser()

   In some cases, it is desirable not to parse an input source all at once, but
   to feed chunks of the document as they become available. Note that the
   reader will normally not read the entire file at once, but read it in chunks
   as well; still, :meth:`parse` won't return until the entire document is
   processed. So these interfaces should be used if the blocking behaviour of
   :meth:`parse` is not desirable.

   When the parser is instantiated, it is ready to begin accepting data from
   the :meth:`feed` method immediately. After parsing has been finished with a
   call to :meth:`close`, the :meth:`reset` method must be called to make the
   parser ready to accept new data, either from :meth:`feed` or using the
   :meth:`parse` method.

   Note that these methods must *not* be called during parsing, that is, after
   :meth:`parse` has been called and before it returns.

   By default, the class also implements the :meth:`parse` method of the
   :class:`XMLReader` interface using the :meth:`feed`, :meth:`close` and
   :meth:`reset` methods of the :class:`IncrementalParser` interface, as a
   convenience to SAX 2.0 driver writers.


.. class:: Locator()

   Interface for associating a SAX event with a document location. A locator
   object will return valid results only during calls to
   :class:`~xml.sax.handler.ContentHandler` methods; at any other time, the
   results are unpredictable. If information is not available, methods may
   return ``None``.


.. class:: InputSource(system_id=None)

   Encapsulation of the information needed by the :class:`XMLReader` to read
   entities.

   This class may include information about the public identifier, system
   identifier, byte stream (possibly with character encoding information) and/or
   the character stream of an entity.

   Applications will create objects of this class for use in the
   :meth:`XMLReader.parse` method and for returning from
   :meth:`EntityResolver.resolveEntity
   <xml.sax.handler.EntityResolver.resolveEntity>`.

   An :class:`InputSource` belongs to the application; the :class:`XMLReader` is
   not allowed to modify :class:`InputSource` objects passed to it from the
   application, although it may make copies and modify those.


.. class:: AttributesImpl(attrs)

   This is an implementation of the :class:`Attributes` interface (see section
   :ref:`attributes-objects`). This is a dictionary-like object which
   represents the element attributes in a :meth:`startElement` call. In addition
   to the most useful dictionary operations, it supports a number of other
   methods as described by the interface. Objects of this class should be
   instantiated by readers; *attrs* must be a dictionary-like object containing
   a mapping from attribute names to attribute values.


.. class:: AttributesNSImpl(attrs, qnames)

   Namespace-aware variant of :class:`AttributesImpl`, which will be passed to
   :meth:`startElementNS`. It is derived from :class:`AttributesImpl`, but
   understands attribute names as two-tuples of *namespaceURI* and
   *localname*. In addition, it provides a number of methods expecting qualified
   names as they appear in the original document. This class implements the
   :class:`AttributesNS` interface (see section :ref:`attributes-ns-objects`).


.. _xmlreader-objects:

XMLReader Objects
-----------------

The :class:`XMLReader` interface supports the following methods:


.. method:: XMLReader.parse(source)

   Process an input source, producing SAX events. The *source* object can be a
   system identifier (a string identifying the input source -- typically a file
   name or a URL), a :class:`pathlib.Path` or :term:`path-like <path-like object>`
   object, or an :class:`InputSource` object. When :meth:`parse` returns, the
   input is completely processed, and the parser object can be discarded or
   reset.

   .. versionchanged:: 3.5
      Added support for character streams.

   .. versionchanged:: 3.8
      Added support for path-like objects.


.. method:: XMLReader.getContentHandler()

   Return the current :class:`~xml.sax.handler.ContentHandler`.


.. method:: XMLReader.setContentHandler(handler)

   Set the current :class:`~xml.sax.handler.ContentHandler`. If no
   :class:`~xml.sax.handler.ContentHandler` is set, content events will be
   discarded.


.. method:: XMLReader.getDTDHandler()

   Return the current :class:`~xml.sax.handler.DTDHandler`.


.. method:: XMLReader.setDTDHandler(handler)

   Set the current :class:`~xml.sax.handler.DTDHandler`. If no
   :class:`~xml.sax.handler.DTDHandler` is set, DTD
   events will be discarded.


.. method:: XMLReader.getEntityResolver()

   Return the current :class:`~xml.sax.handler.EntityResolver`.


.. method:: XMLReader.setEntityResolver(handler)

   Set the current :class:`~xml.sax.handler.EntityResolver`. If no
   :class:`~xml.sax.handler.EntityResolver` is set,
   attempts to resolve an external entity will result in opening the system
   identifier for the entity, and fail if it is not available.


.. method:: XMLReader.getErrorHandler()

   Return the current :class:`~xml.sax.handler.ErrorHandler`.


.. method:: XMLReader.setErrorHandler(handler)

   Set the current :class:`~xml.sax.handler.ErrorHandler`. If no
   :class:`~xml.sax.handler.ErrorHandler` is set, errors will be raised as
   exceptions, and warnings will be printed.
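
   The following sketch installs a custom error handler on the default parser.
   The ``RecordingErrorHandler`` class and the malformed sample document are
   illustrative assumptions; the handler records each fatal error and then
   re-raises it, so parsing still stops at the first error::

      import io
      import xml.sax
      from xml.sax import SAXParseException
      from xml.sax.handler import ErrorHandler
      from xml.sax.xmlreader import InputSource


      class RecordingErrorHandler(ErrorHandler):
          """Hypothetical handler that records fatal errors, then re-raises."""

          def __init__(self):
              self.fatal_errors = []

          def fatalError(self, exception):
              self.fatal_errors.append(exception)
              raise exception


      parser = xml.sax.make_parser()
      error_handler = RecordingErrorHandler()
      parser.setErrorHandler(error_handler)

      # The closing tag does not match, so the document is not well-formed.
      source = InputSource()
      source.setCharacterStream(io.StringIO("<root><unclosed></root>"))

      message = None
      try:
          parser.parse(source)
      except SAXParseException as exc:
          message = exc.getMessage()

      print(len(error_handler.fatal_errors), message)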


.. method:: XMLReader.setLocale(locale)

   Allow an application to set the locale for errors and warnings.

   SAX parsers are not required to provide localization for errors and warnings; if
   they cannot support the requested locale, however, they must raise a SAX
   exception. Applications may request a locale change in the middle of a parse.


.. method:: XMLReader.getFeature(featurename)

   Return the current setting for feature *featurename*. If the feature is not
   recognized, :exc:`SAXNotRecognizedException` is raised. The well-known
   feature names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setFeature(featurename, value)

   Set the *featurename* to *value*. If the feature is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the feature or its setting is not
   supported by the parser, :exc:`SAXNotSupportedException` is raised.
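
   As a short sketch, assuming the default parser returned by
   :func:`xml.sax.make_parser`, a feature can be queried and changed before
   parsing starts. The unrecognized feature name below is made up for
   illustration::

      import xml.sax
      from xml.sax import SAXNotRecognizedException
      from xml.sax.handler import feature_namespaces

      parser = xml.sax.make_parser()

      # Read the current setting, turn namespace processing on, read it again.
      namespaces_before = parser.getFeature(feature_namespaces)
      parser.setFeature(feature_namespaces, True)
      namespaces_after = parser.getFeature(feature_namespaces)

      # An unknown feature name (hypothetical) is rejected with an exception.
      try:
          parser.getFeature("http://example.invalid/no-such-feature")
      except SAXNotRecognizedException:
          unknown_feature_rejected = True
      else:
          unknown_feature_rejected = False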


.. method:: XMLReader.getProperty(propertyname)

   Return the current setting for property *propertyname*. If the property is not
   recognized, a :exc:`SAXNotRecognizedException` is raised. The well-known
   property names are listed in the module :mod:`xml.sax.handler`.


.. method:: XMLReader.setProperty(propertyname, value)

   Set the *propertyname* to *value*. If the property is not recognized,
   :exc:`SAXNotRecognizedException` is raised. If the property or its setting is
   not supported by the parser, :exc:`SAXNotSupportedException` is raised.


.. _incremental-parser-objects:

IncrementalParser Objects
-------------------------

Instances of :class:`IncrementalParser` offer the following additional methods:


.. method:: IncrementalParser.feed(data)

   Process a chunk of *data*.


.. method:: IncrementalParser.close()

   Assume the end of the document. This checks well-formedness conditions that
   can be checked only at the end, invokes handlers, and may clean up resources
   allocated during parsing.


.. method:: IncrementalParser.reset()

   This method is called after :meth:`close` has been called to reset the parser
   so that it is ready to parse new documents. The results of calling
   :meth:`parse` or :meth:`feed` after :meth:`close` without calling
   :meth:`reset` are undefined.
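
The feed/close/reset cycle can be sketched as follows, assuming the default
parser returned by :func:`xml.sax.make_parser` implements
:class:`IncrementalParser`. The ``NameCollector`` handler and the chunk
boundaries are illustrative assumptions::

   import xml.sax
   from xml.sax.handler import ContentHandler


   class NameCollector(ContentHandler):
       """Hypothetical handler that records element names in order."""

       def __init__(self):
           super().__init__()
           self.names = []

       def startElement(self, name, attrs):
           self.names.append(name)


   parser = xml.sax.make_parser()
   handler = NameCollector()
   parser.setContentHandler(handler)

   # Chunks may split the document anywhere, even in the middle of a tag.
   for chunk in ("<doc><fi", "rst/><sec", "ond/></doc>"):
       parser.feed(chunk)
   parser.close()

   # reset() makes the same parser object ready for another document.
   parser.reset()
   parser.feed("<other/>")
   parser.close()

   print(handler.names)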


.. _locator-objects:

Locator Objects
---------------

Instances of :class:`Locator` provide these methods:


.. method:: Locator.getColumnNumber()

   Return the column number where the current event begins.


.. method:: Locator.getLineNumber()

   Return the line number where the current event begins.


.. method:: Locator.getPublicId()

   Return the public identifier for the current event.


.. method:: Locator.getSystemId()

   Return the system identifier for the current event.
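
A handler typically receives its locator through
:meth:`~xml.sax.handler.ContentHandler.setDocumentLocator` and queries it from
inside event callbacks. The sketch below assumes the default parser supplies a
locator when :meth:`XMLReader.parse` is used; the ``PositionRecorder`` handler
and the sample document are illustrative::

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler
   from xml.sax.xmlreader import InputSource


   class PositionRecorder(ContentHandler):
       """Hypothetical handler that records where each element starts."""

       def __init__(self):
           super().__init__()
           self.locator = None
           self.positions = []

       def setDocumentLocator(self, locator):
           # Keep the locator; it is only meaningful during event callbacks.
           self.locator = locator

       def startElement(self, name, attrs):
           self.positions.append(
               (name,
                self.locator.getLineNumber(),
                self.locator.getColumnNumber())
           )


   source = InputSource()
   source.setCharacterStream(io.StringIO("<root>\n  <child/>\n</root>"))

   handler = PositionRecorder()
   parser = xml.sax.make_parser()
   parser.setContentHandler(handler)
   parser.parse(source)

   for name, line, column in handler.positions:
       print(name, line, column)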


.. _input-source-objects:

InputSource Objects
-------------------


.. method:: InputSource.setPublicId(id)

   Set the public identifier of this :class:`InputSource`.


.. method:: InputSource.getPublicId()

   Return the public identifier of this :class:`InputSource`.


.. method:: InputSource.setSystemId(id)

   Set the system identifier of this :class:`InputSource`.


.. method:: InputSource.getSystemId()

   Return the system identifier of this :class:`InputSource`.


.. method:: InputSource.setEncoding(encoding)

   Set the character encoding of this :class:`InputSource`.

   The encoding must be a string acceptable for an XML encoding declaration (see
   section 4.3.3 of the XML recommendation).

   The encoding attribute of the :class:`InputSource` is ignored if the
   :class:`InputSource` also contains a character stream.


.. method:: InputSource.getEncoding()

   Get the character encoding of this :class:`InputSource`.


.. method:: InputSource.setByteStream(bytefile)

   Set the byte stream (a :term:`binary file`) for this input source.

   The SAX parser will ignore this if there is also a character stream specified,
   but it will use a byte stream in preference to opening a URI connection itself.

   If the application knows the character encoding of the byte stream, it should
   set it with the :meth:`setEncoding` method.


.. method:: InputSource.getByteStream()

   Get the byte stream for this input source.

   The :meth:`getEncoding` method will return the character encoding for this
   byte stream, or ``None`` if unknown.


.. method:: InputSource.setCharacterStream(charfile)

   Set the character stream (a :term:`text file`) for this input source.

   If there is a character stream specified, the SAX parser will ignore any byte
   stream and will not attempt to open a URI connection to the system identifier.


.. method:: InputSource.getCharacterStream()

   Get the character stream for this input source.
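
The sketch below builds an :class:`InputSource` around a byte stream with an
explicitly declared encoding and hands it to the default parser. The system
identifier, public identifier, ``TextCollector`` handler and sample document
are all illustrative assumptions::

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler
   from xml.sax.xmlreader import InputSource


   class TextCollector(ContentHandler):
       """Hypothetical handler that gathers character data."""

       def __init__(self):
           super().__init__()
           self.chunks = []

       def characters(self, content):
           self.chunks.append(content)


   # The document has no XML declaration, so the encoding is given here.
   data = "<greeting>h\u00e9llo</greeting>".encode("iso-8859-1")

   source = InputSource("memory:example")  # made-up system identifier
   source.setPublicId("-//Example//DTD Sample//EN")  # made-up public identifier
   source.setByteStream(io.BytesIO(data))
   source.setEncoding("iso-8859-1")

   handler = TextCollector()
   parser = xml.sax.make_parser()
   parser.setContentHandler(handler)
   parser.parse(source)

   text = "".join(handler.chunks)
   print(source.getSystemId(), source.getEncoding(), text)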


.. _attributes-objects:

The :class:`Attributes` Interface
---------------------------------

:class:`Attributes` objects implement a portion of the :term:`mapping protocol
<mapping>`, including the methods :meth:`~collections.abc.Mapping.copy`,
:meth:`~collections.abc.Mapping.get`, :meth:`~object.__contains__`,
:meth:`~collections.abc.Mapping.items`, :meth:`~collections.abc.Mapping.keys`,
and :meth:`~collections.abc.Mapping.values`. The following methods
are also provided:


.. method:: Attributes.getLength()

   Return the number of attributes.


.. method:: Attributes.getNames()

   Return the names of the attributes.


.. method:: Attributes.getType(name)

   Return the type of the attribute *name*, which is normally ``'CDATA'``.


.. method:: Attributes.getValue(name)

   Return the value of attribute *name*.

.. getValueByQName, getNameByQName, getQNameByName, getQNames available
.. here already, but documented only for derived class.
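
As an illustration, the attributes object passed to
:meth:`~xml.sax.handler.ContentHandler.startElement` can be inspected with both
the mapping operations and the methods above. The ``AttributeDumper`` handler
and the sample element are assumptions made for this sketch::

   import xml.sax
   from xml.sax.handler import ContentHandler


   class AttributeDumper(ContentHandler):
       """Hypothetical handler that summarizes the attributes of each element."""

       def __init__(self):
           super().__init__()
           self.seen = {}

       def startElement(self, name, attrs):
           self.seen[name] = {
               "length": attrs.getLength(),
               "names": sorted(attrs.getNames()),
               "href": attrs.getValue("href") if "href" in attrs else None,
               "missing": attrs.get("missing", "fallback"),
               "type": attrs.getType("href"),
           }


   handler = AttributeDumper()
   xml.sax.parseString(
       b'<a href="https://example.invalid/" title="Example"/>', handler
   )
   print(handler.seen["a"])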


.. _attributes-ns-objects:

The :class:`AttributesNS` Interface
-----------------------------------

This interface is a subtype of the :class:`Attributes` interface (see section
:ref:`attributes-objects`). All methods supported by that interface are also
available on :class:`AttributesNS` objects.

The following methods are also available:


.. method:: AttributesNS.getValueByQName(name)

   Return the value for a qualified name.


.. method:: AttributesNS.getNameByQName(name)

   Return the ``(namespace, localname)`` pair for a qualified *name*.


.. method:: AttributesNS.getQNameByName(name)

   Return the qualified name for a ``(namespace, localname)`` pair.


.. method:: AttributesNS.getQNames()

   Return the qualified names of all attributes.
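
The qualified-name methods can be exercised by enabling namespace processing,
so that the parser calls
:meth:`~xml.sax.handler.ContentHandler.startElementNS`. This sketch assumes the
default parser; the ``NSAttributeRecorder`` handler, the ``urn:example``
namespace and the sample document are illustrative::

   import io
   import xml.sax
   from xml.sax.handler import ContentHandler, feature_namespaces
   from xml.sax.xmlreader import InputSource


   class NSAttributeRecorder(ContentHandler):
       """Hypothetical handler that records namespace-aware attribute data."""

       def __init__(self):
           super().__init__()
           self.records = []

       def startElementNS(self, name, qname, attrs):
           for attr_qname in attrs.getQNames():
               pair = attrs.getNameByQName(attr_qname)
               value = attrs.getValueByQName(attr_qname)
               # Round-trip: the pair maps back to the same qualified name.
               assert attrs.getQNameByName(pair) == attr_qname
               self.records.append((attr_qname, pair, value))


   source = InputSource()
   source.setCharacterStream(
       io.StringIO('<doc xmlns:x="urn:example" x:id="42" plain="yes"/>')
   )

   handler = NSAttributeRecorder()
   parser = xml.sax.make_parser()
   parser.setFeature(feature_namespaces, True)
   parser.setContentHandler(handler)
   parser.parse(source)

   records = sorted(handler.records, key=lambda record: record[0])
   print(records)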