:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.


Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`~email.message.Message.attach` and
:meth:`~email.message.Message.set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME multipart messages, the root object
will return ``True`` from its :meth:`~email.message.Message.is_multipart` method,
and the subparts can be accessed via the
:meth:`~email.message.Message.get_payload` and
:meth:`~email.message.Message.walk` methods.
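
As a minimal sketch of the two cases just described, the following parses one
plain message and one multipart message. The message texts, addresses, and the
``XXXX`` boundary are made up for illustration; the parsing is done with
:func:`email.message_from_string`, one of the convenience functions provided by
the package.

```python
import email

# Illustrative message texts; the addresses and the boundary are made up.
simple_text = "From: a@example.com\nSubject: Hello\n\nJust a plain body.\n"
multipart_text = (
    "From: a@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XXXX"\n'
    "\n"
    "--XXXX\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XXXX\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XXXX--\n"
)

simple = email.message_from_string(simple_text)
multi = email.message_from_string(multipart_text)

# A non-MIME message has a string payload.
print(simple.is_multipart(), repr(simple.get_payload()))

# A multipart message has a list payload; walk() visits the root and each subpart.
content_types = [part.get_content_type() for part in multi.walk()]
print(multi.is_multipart(), content_types)
```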

There are two parser interfaces available: the classic :class:`Parser` API and
the incremental :class:`FeedParser` API. The classic :class:`Parser` API is fine
if you have the entire text of the message in memory as a string, or if the
entire message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you're reading the message from a stream which might block
waiting for more input (e.g. reading an email message from a socket). The
:class:`FeedParser` can consume and parse the message incrementally, and only
returns the root object when you close the parser [#]_.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.


FeedParser API
^^^^^^^^^^^^^^

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
of text until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

Here is the API for the :class:`FeedParser`:


.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.default)

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.

   The *policy* keyword specifies a :mod:`~email.policy` object that controls a
   number of aspects of the parser's operation. The default policy maintains
   backward compatibility.

   .. versionchanged:: 3.3
      Added the *policy* keyword.

   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data. *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings,
      carriage return, newline, or carriage return and newline (they can even be
      mixed).

   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.
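
The feed-then-close cycle can be sketched as follows. The chunks are a made-up
stand-in for data arriving from a socket; they deliberately split header lines
in the middle and mix line endings to show that partial lines are stitched
together.

```python
from email.feedparser import FeedParser

# Made-up chunks standing in for reads from a blocking source.
chunks = [
    "From: sender@exam",          # header line split mid-token
    "ple.com\r\nSubject: Incre",  # CRLF ending, then another partial line
    "mental parsing\n",           # bare newline ending
    "\n",                         # blank line terminates the header block
    "Body line one\n",
    "Body line two\n",
]

parser = FeedParser()
for chunk in chunks:
    parser.feed(chunk)

# close() finishes parsing and returns the root message object.
msg = parser.close()

print(msg["From"])
print(msg["Subject"])
print(msg.defects)  # problems found while parsing; empty for this input
```

A fresh :class:`FeedParser` is needed for each message, since feeding a closed
parser is undefined.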


.. class:: BytesFeedParser(_factory=email.message.Message)

   Works exactly like :class:`FeedParser` except that the input to the
   :meth:`~FeedParser.feed` method must be bytes rather than a string.

   .. versionadded:: 3.2
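
A minimal sketch of the bytes variant, assuming the raw data arrives in
arbitrary pieces (the chunk contents here are made up). The class is imported
from :mod:`email.parser`, which exposes it alongside the other parsers.

```python
from email.parser import BytesFeedParser

# Made-up byte chunks, e.g. as returned by successive socket reads.
chunks = [
    b"From: sender@exa",
    b"mple.com\r\nSubject: Bytes ",
    b"input\r\n\r\n",
    b"Hello over the wire\r\n",
]

parser = BytesFeedParser()
for chunk in chunks:
    parser.feed(chunk)  # bytes in, not str
msg = parser.close()

print(msg["Subject"])
print(repr(msg.get_payload()))
```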


Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides header-only parsers, called :class:`HeaderParser` and
:class:`BytesHeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body as a string. They
have the same API as the :class:`Parser` and :class:`BytesParser` classes.

.. versionadded:: 3.3
   The BytesHeaderParser class.
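
To illustrate the difference, this sketch parses the same made-up multipart
message with :class:`HeaderParser` and with :class:`Parser`. The message text
and its ``XXXX`` boundary are assumptions chosen for the example.

```python
from email.parser import HeaderParser, Parser

# Illustrative multipart message; the boundary and addresses are made up.
text = (
    "From: sender@example.com\n"
    "Subject: Headers only\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XXXX"\n'
    "\n"
    "--XXXX\n"
    "Content-Type: text/plain\n"
    "\n"
    "part body\n"
    "--XXXX--\n"
)

headers_only = HeaderParser().parsestr(text)
full = Parser().parsestr(text)

# Headers are available either way.
print(headers_only["Subject"])

# HeaderParser leaves the body unparsed: the payload is the raw body string.
print(headers_only.is_multipart(), type(headers_only.get_payload()).__name__)

# Parser builds the subpart tree.
print(full.is_multipart(), len(full.get_payload()))
```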


.. class:: Parser(_class=email.message.Message, *, policy=policy.default)

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   The *policy* keyword specifies a :mod:`~email.policy` object that controls a
   number of aspects of the parser's operation. The default policy maintains
   backward compatibility.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.

   The other public :class:`Parser` methods are:


   .. method:: parse(fp, headersonly=False)

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`~io.TextIOBase.readline` and the :meth:`~io.TextIOBase.read`
      methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.
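
The relationship between :meth:`parse` and :meth:`parsestr` can be sketched as
below. The message text, including its envelope (``From ``) line, is made up
for the example.

```python
import io
from email.parser import Parser

# Illustrative text: an envelope header, a header block, a blank line, a body.
text = (
    "From sender@example.com Mon Jan  1 00:00:00 2024\n"
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: Parser demo\n"
    "\n"
    "Body text.\n"
)

parser = Parser()

# parse() reads from a file-like object; parsestr() takes the string directly.
from_file = parser.parse(io.StringIO(text))
from_str = parser.parsestr(text)

# headersonly=True stops after the header block and keeps the body unparsed.
headers = parser.parsestr(text, headersonly=True)

print(from_file.get_unixfrom())  # the envelope header, if one was present
print(from_file.items() == from_str.items())
print(headers["Subject"])
```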


.. class:: BytesParser(_class=email.message.Message, *, policy=policy.default)

   This class is exactly parallel to :class:`Parser`, but handles bytes input.
   The *_class* argument is interpreted in the same way as for the
   :class:`Parser` constructor.

   The *policy* keyword specifies a :mod:`~email.policy` object that
   controls a number of aspects of the parser's operation. The default
   policy maintains backward compatibility.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object. *fp* must support
      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
      methods on file-like objects.

      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a byte string object
      instead of a file-like object. Calling this method on a byte string is
      exactly equivalent to wrapping *bytes* in a :class:`~io.BytesIO` instance
      first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

      .. versionadded:: 3.2
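
As a sketch of bytes parsing with an ``8bit`` body, the following builds a
made-up UTF-8 message and recovers its text. Using
``get_payload(decode=True)`` to obtain the raw body bytes is one way to do
this, chosen here for illustration.

```python
from email.parser import BytesParser

# Illustrative 8bit message; the body contains non-ASCII UTF-8 text
# ("Gruesse aus Koeln" written with umlauts and sharp s via escapes).
raw = (
    b"From: sender@example.com\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "Gr\u00fc\u00dfe aus K\u00f6ln\n".encode("utf-8")
)

msg = BytesParser().parsebytes(raw)

# decode=True returns the body as bytes; decode it with the declared charset.
body_bytes = msg.get_payload(decode=True)
text = body_bytes.decode(msg.get_content_charset())

print(msg["Content-Transfer-Encoding"])
print(text)
```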


Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s, _class=email.message.Message, *, \
                                  policy=policy.default)

   Return a message object structure from a string. This is exactly equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_bytes(s, _class=email.message.Message, *, \
                                 policy=policy.default)

   Return a message object structure from a byte string. This is exactly
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_file(fp, _class=email.message.Message, *, \
                                policy=policy.default)

   Return a message object structure tree from an open :term:`file object`.
   This is exactly equivalent to ``Parser().parse(fp)``. *_class*
   and *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
                                       policy=policy.default)

   Return a message object structure tree from an open binary :term:`file
   object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
   *_class* and *policy* are interpreted as with the
   :class:`~email.parser.Parser` class constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

Here's an example of how you might use :func:`message_from_string` at an
interactive Python prompt, where ``myString`` holds the text of a message::

   >>> import email
   >>> msg = email.message_from_string(myString)  # doctest: +SKIP
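
The four convenience functions can be compared side by side in a short sketch.
The message bytes are made up, in-memory :mod:`io` objects stand in for real
files, and ``TaggedMessage`` is a hypothetical subclass used only to show the
*_class* argument.

```python
import email
import io
from email.message import Message


class TaggedMessage(Message):
    """Hypothetical Message subclass, used only to demonstrate *_class*."""


# Illustrative ASCII-only message, so it can be viewed as bytes or as text.
raw = b"From: sender@example.com\nSubject: Convenience\n\nHi there.\n"
text = raw.decode("ascii")

from_string = email.message_from_string(text)
from_bytes = email.message_from_bytes(raw)
from_file = email.message_from_file(io.StringIO(text))
from_binary_file = email.message_from_binary_file(io.BytesIO(raw))

# All four routes produce the same headers and payload for this input.
print(from_string.items() == from_bytes.items() == from_file.items())

# _class selects the factory used to create message objects.
custom = email.message_from_string(text, _class=TaggedMessage)
print(type(custom).__name__)
```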


Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.Message.is_multipart`. Their
  :meth:`~email.message.Message.get_payload` method will return a string object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.Message.is_multipart` and its
  :meth:`~email.message.Message.get_payload` method will return the list of
  :class:`~email.message.Message` subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.Message.is_multipart` method will return ``True``.
  The single element in the list payload will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.Message.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.
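
Two of the notes above can be checked with a small sketch. Both message texts
are made up: the first wraps a message in :mimetype:`message/rfc822`, and the
second claims to be :mimetype:`multipart` but supplies no boundary, which is
one assumed way of producing an inconsistent message.

```python
import email
from email import errors

# A message/rfc822 container holding one inner message (illustrative text).
wrapped_text = (
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
)
outer = email.message_from_string(wrapped_text)
inner = outer.get_payload(0)  # the single sub-message in the list payload
print(outer.is_multipart(), len(outer.get_payload()), inner["Subject"])

# Declared multipart, but with no boundary parameter and no real subparts.
broken_text = (
    "Content-Type: multipart/mixed\n"
    "\n"
    "not really multipart\n"
)
broken = email.message_from_string(broken_text)
has_invariant_defect = any(
    isinstance(defect, errors.MultipartInvariantViolationDefect)
    for defect in broken.defects
)
print(broken.is_multipart(), has_invariant_defect)
```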

.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`~email.parser.Parser` was re-implemented in terms of the
   :class:`~email.parser.FeedParser`, so the semantics and results are
   identical between the two parsers.