:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

**Source code:** :source:`Lib/email/parser.py`

--------------

Message object structures can be created in one of two ways: they can be
created from whole cloth by creating an :class:`~email.message.EmailMessage`
object, adding headers using the dictionary interface, and adding payload(s)
using :meth:`~email.message.EmailMessage.set_content` and related methods, or
they can be created by parsing a serialized representation of the email
message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a
bytes, string or file object, and the parser will return to you the root
:class:`~email.message.EmailMessage` instance of the object structure. For
simple, non-MIME messages the payload of this root object will likely be a
string containing the text of the message. For MIME messages, the root object
will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart`
method, and the subparts can be accessed via the payload manipulation methods,
such as :meth:`~email.message.EmailMessage.get_body`,
:meth:`~email.message.EmailMessage.iter_parts`, and
:meth:`~email.message.EmailMessage.walk`.

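
As a rough illustration of the paragraph above, the following sketch parses a
small hand-written MIME message and inspects the resulting tree. The message
text, the addresses and the ``RAW`` name are made up for the example, and
``policy.default`` is passed explicitly so that the root object is an
:class:`~email.message.EmailMessage`.

```python
from email import message_from_bytes, policy

# A tiny hand-written multipart message (illustrative data only).
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Greetings\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello, Bob.\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"\r\n"
    b"<p>Hello, Bob.</p>\r\n"
    b"--BOUNDARY--\r\n"
)

# Parse the bytes into a message object tree.
msg = message_from_bytes(RAW, policy=policy.default)

# The root of a MIME multipart message is a container.
is_multi = msg.is_multipart()

# walk() visits the root first, then every subpart in depth-first order.
content_types = [part.get_content_type() for part in msg.walk()]

# get_body() picks the best candidate for the "body" of the message.
body = msg.get_body(preferencelist=("plain",))
body_text = body.get_content()
```

Here ``get_body`` is asked for the plain text alternative only; passing a
longer *preferencelist* lets it fall back to other parts.
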
There are two parser interfaces available for use: the :class:`Parser`
API and the incremental :class:`FeedParser` API. The :class:`Parser` API is
most useful if you have the entire text of the message in memory, or if the
entire message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you are reading the message from a stream which might block
waiting for more input (such as reading an email message from a socket). The
:class:`FeedParser` can consume and parse the message incrementally, and only
returns the root object when you close the parser.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. All of the logic that
connects the :mod:`email` package's bundled parser and the
:class:`~email.message.EmailMessage` class is embodied in the
:class:`~email.policy.Policy` class, so a custom parser can create message
object trees any way it finds necessary by implementing custom versions of the
appropriate :class:`~email.policy.Policy` methods.


FeedParser API
^^^^^^^^^^^^^^

The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages,
such as would be necessary when reading the text of an email message from a
source that can block (such as a socket). The :class:`BytesFeedParser` can of
course be used to parse an email message fully contained in a :term:`bytes-like
object`, string, or file, but the :class:`BytesParser` API may be more
convenient for such use cases. The semantics and results of the two parser
APIs are identical.

The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a
bunch of bytes until there's no more to feed it, then close the parser to
retrieve the root message object. The :class:`BytesFeedParser` is extremely
accurate when parsing standards-compliant messages, and it does a very good job
of parsing non-compliant messages, providing information about how a message
was deemed broken. It will populate a message object's
:attr:`~email.message.EmailMessage.defects` attribute with a list of any
problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

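
The defect reporting described above can be sketched as follows. The input is
a deliberately broken message, invented for this example, that declares a
:mimetype:`multipart` content type but omits the boundary parameter. The
variable names are illustrative only.

```python
from email import errors, policy
from email.feedparser import BytesFeedParser

parser = BytesFeedParser(policy=policy.default)

# This message claims to be multipart but never names a boundary,
# so the parser cannot split the body into subparts.
parser.feed(b"Subject: broken\r\n")
parser.feed(b"Content-Type: multipart/mixed\r\n")
parser.feed(b"\r\n")
parser.feed(b"This body has no MIME boundaries.\r\n")
broken = parser.close()

# Parsing still succeeds; the problems are recorded on the message.
defect_names = [type(d).__name__ for d in broken.defects]
has_no_boundary = any(
    isinstance(d, errors.NoBoundaryInMultipartDefect) for d in broken.defects
)
```

With the policies shipped in the standard library, defects are collected
rather than raised unless the policy's ``raise_on_defect`` attribute is set to
``True``.
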
Here is the API for the :class:`BytesFeedParser`:


.. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32)

   Create a :class:`BytesFeedParser` instance. Optional *_factory* is a
   no-argument callable; if not specified use the
   :attr:`~email.policy.Policy.message_factory` from the *policy*. Call
   *_factory* whenever a new message object is needed.

   If *policy* is specified use the rules it specifies to update the
   representation of the message. If *policy* is not set, use the
   :class:`compat32 <email.policy.Compat32>` policy, which maintains backward
   compatibility with the Python 3.2 version of the email package and provides
   :class:`~email.message.Message` as the default factory. All other policies
   provide :class:`~email.message.EmailMessage` as the default *_factory*. For
   more information on what else *policy* controls, see the
   :mod:`~email.policy` documentation.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionadded:: 3.2

   .. versionchanged:: 3.3 Added the *policy* keyword.
   .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``.


   .. method:: feed(data)

      Feed the parser some more data. *data* should be a :term:`bytes-like
      object` containing one or more lines. The lines can be partial and the
      parser will stitch such partial lines together properly. The lines can
      have any of the three common line endings: carriage return, newline, or
      carriage return and newline (they can even be mixed).


   .. method:: close()

      Complete the parsing of all previously fed data and return the root
      message object. It is undefined what happens if :meth:`feed` is called
      after this method has been called.


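
A minimal sketch of the feed-then-close cycle, assuming the data arrives in
arbitrary fixed-size chunks as it might from a socket. The helper
``parse_in_chunks`` and the sample message are hypothetical and exist only for
this illustration.

```python
from email import policy
from email.feedparser import BytesFeedParser

RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: Chunked delivery\r\n"
    b"\r\n"
    b"First line of the body.\r\n"
    b"Second line of the body.\r\n"
)


def parse_in_chunks(data, chunk_size):
    """Feed *data* to a BytesFeedParser *chunk_size* bytes at a time."""
    parser = BytesFeedParser(policy=policy.default)
    for start in range(0, len(data), chunk_size):
        # Chunks need not end on line boundaries; the parser buffers
        # partial lines until the rest of the line arrives.
        parser.feed(data[start:start + chunk_size])
    # close() finishes parsing and hands back the root message object.
    return parser.close()


chunked = parse_in_chunks(RAW, 7)
whole = parse_in_chunks(RAW, len(RAW))
```

Both calls should produce the same message, since the chunk size only affects
how the data is delivered, not how it is parsed.
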
.. class:: FeedParser(_factory=None, *, policy=policy.compat32)

   Works like :class:`BytesFeedParser` except that the input to the
   :meth:`~BytesFeedParser.feed` method must be a string. This is of limited
   utility, since the only way for such a message to be valid is for it to
   contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is
   ``True``, no binary attachments.

   .. versionchanged:: 3.3 Added the *policy* keyword.


Parser API
^^^^^^^^^^

The :class:`BytesParser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a :term:`bytes-like object` or file. The
:mod:`email.parser` module also provides :class:`Parser` for parsing strings,
and header-only parsers, :class:`BytesHeaderParser` and
:class:`HeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`BytesHeaderParser` and :class:`HeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body.


.. class:: BytesParser(_class=None, *, policy=policy.compat32)

   Create a :class:`BytesParser` instance. The *_class* and *policy*
   arguments have the same meaning and semantics as the *_factory*
   and *policy* arguments of :class:`BytesFeedParser`.

   Note: **The policy keyword should always be specified**; the default will
   change to :data:`email.policy.default` in a future version of Python.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object. *fp* must support
      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
      methods.

      The bytes contained in *fp* must be formatted as a block of :rfc:`5322`
      (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`)
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.


   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a :term:`bytes-like
      object` instead of a file-like object. Calling this method on a
      :term:`bytes-like object` is equivalent to wrapping *bytes* in a
      :class:`~io.BytesIO` instance first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

      .. versionadded:: 3.2


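
The two :class:`BytesParser` entry points might be used as in the sketch
below. The sample message, which carries an ``8bit`` UTF-8 body, is invented
for the example, and an in-memory :class:`io.BytesIO` stands in for a file
opened in binary mode.

```python
import io
from email import policy
from email.parser import BytesParser

RAW = (
    b"From: alice@example.com\r\n"
    b"Subject: Status report\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Caf\xc3\xa9 is open.\r\n"
)

parser = BytesParser(policy=policy.default)

# Parse a bytes object that is already fully in memory.
from_bytes = parser.parsebytes(RAW)

# Parse from a binary file-like object; any object opened in
# binary mode with read() and readline() would do here.
from_file = parser.parse(io.BytesIO(RAW))

# Stop after the header block and keep the body as a raw payload.
headers_only = parser.parsebytes(RAW, headersonly=True)

text_from_bytes = from_bytes.get_content()
text_from_file = from_file.get_content()
```

The decoded text is compared after stripping whitespace in any check of this
sketch, because line-ending handling can differ between the file and the
in-memory code paths.
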
.. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`BytesParser`, except that *headersonly*
   defaults to ``True``.

   .. versionadded:: 3.3


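
To make the difference concrete, this sketch parses the same made-up multipart
message once with :class:`BytesParser` and once with
:class:`BytesHeaderParser`. The names ``full`` and ``quick`` are illustrative.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

RAW = (
    b"Subject: Two parts\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part one\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part two\r\n"
    b"--BOUNDARY--\r\n"
)

# Full parse: the body is split into sub-message objects.
full = BytesParser(policy=policy.default).parsebytes(RAW)

# Header-only parse: the body is left untouched as one raw string.
quick = BytesHeaderParser(policy=policy.default).parsebytes(RAW)
raw_body = quick.get_payload()
```

The header-only result still exposes every header, but its payload is the
unparsed body text, boundary lines included.
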
.. class:: Parser(_class=None, *, policy=policy.compat32)

   This class is parallel to :class:`BytesParser`, but handles string input.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


   .. method:: parse(fp, headersonly=False)

      Read all the data from the text-mode file-like object *fp*, parse the
      resulting text, and return the root message object. *fp* must support
      both the :meth:`~io.TextIOBase.readline` and the
      :meth:`~io.TextIOBase.read` methods on file-like objects.

      Other than the text mode requirement, this method operates like
      :meth:`BytesParser.parse`.


   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first
      and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.


.. class:: HeaderParser(_class=None, *, policy=policy.compat32)

   Exactly like :class:`Parser`, except that *headersonly*
   defaults to ``True``.


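
The string-based classes can be exercised the same way. In this sketch the
sample text is ASCII only and purely illustrative, and :class:`io.StringIO`
stands in for a file opened in text mode.

```python
import io
from email import policy
from email.parser import HeaderParser, Parser

TEXT = (
    "From: alice@example.com\n"
    "Subject: Plain text\n"
    "\n"
    "Only ASCII here.\n"
)

parser = Parser(policy=policy.default)

# parsestr() takes the message as a str ...
from_str = parser.parsestr(TEXT)

# ... while parse() reads it from a text-mode file-like object.
from_file = parser.parse(io.StringIO(TEXT))

# HeaderParser stops after the headers and stores the raw body.
headers = HeaderParser(policy=policy.default).parsestr(TEXT)
```
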
Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email


.. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a :term:`bytes-like object`. This is
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_binary_file(fp, _class=None, *, policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`. This is equivalent to ``BytesParser().parse(fp)``. *_class* and
   *policy* are interpreted as with the :class:`~email.parser.BytesParser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_string(s, _class=None, *, policy=policy.compat32)

   Return a message object structure from a string. This is equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.


.. function:: message_from_file(fp, _class=None, *, policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is equivalent to ``Parser().parse(fp)``. *_class* and *policy* are
   interpreted as with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.
   .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``.


Here's an example of how you might use :func:`message_from_bytes` at an
interactive Python prompt::

   >>> import email
   >>> msg = email.message_from_bytes(myBytes)  # doctest: +SKIP


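
A slightly fuller sketch of the convenience functions follows, using a made-up
message and passing *policy* explicitly in every call. It also shows how the
choice of policy decides which message class is returned.

```python
import email
import io
from email import policy
from email.message import EmailMessage, Message

RAW = b"From: alice@example.com\r\nSubject: Hi\r\n\r\nHello.\r\n"

# policy.default produces the modern EmailMessage class.
modern = email.message_from_bytes(RAW, policy=policy.default)

# policy.compat32 produces the legacy Message class.
legacy = email.message_from_bytes(RAW, policy=policy.compat32)

# The string and binary-file variants accept the same keyword.
from_text = email.message_from_string(
    RAW.decode("ascii"), policy=policy.default
)
from_binary_file = email.message_from_binary_file(
    io.BytesIO(RAW), policy=policy.default
)
```
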
Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield no parts.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.EmailMessage.is_multipart`, and
  :meth:`~email.message.EmailMessage.iter_parts` will yield each subpart in
  turn.

* Most messages with a content type of :mimetype:`message/\*` (such as
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also
  be parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``.
  The single element yielded by :meth:`~email.message.EmailMessage.iter_parts`
  will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.
320 *defects* attribute list. See :mod:`email.errors` for details.