:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.

**Source code:** :source:`Lib/email/parser.py`

--------------

Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`~email.message.Message.attach` and
:meth:`~email.message.Message.set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return ``True`` from its :meth:`~email.message.Message.is_multipart` method, and
the subparts can be accessed via the :meth:`~email.message.Message.get_payload`
and :meth:`~email.message.Message.walk` methods.

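As a minimal sketch of the two cases just described, the following parses one
plain message and one two-part MIME message and inspects the resulting objects.
The addresses, subject, and ``XYZ`` boundary are invented for illustration.

```python
import email

# A plain, non-MIME message: the payload is simply the body text.
simple_text = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a short note.\n"
)
simple = email.message_from_string(simple_text)

# A two-part MIME message (hypothetical content and boundary).
mime_text = (
    "From: alice@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XYZ--\n"
)
root = email.message_from_string(mime_text)

simple_is_multipart = simple.is_multipart()    # plain message: not multipart
simple_payload = simple.get_payload()          # the body as a string
subpart_count = len(root.get_payload())        # list of sub-messages
walked_types = [part.get_content_type() for part in root.walk()]
```

Here ``walk()`` visits the root container first and then each subpart, so it is
a convenient way to iterate over every part without recursing by hand.
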
There are two parser interfaces available: the classic
:class:`Parser` API and the incremental :class:`FeedParser` API. The classic
:class:`Parser` API is fine if you have the entire text of the message in memory
as a string, or if the entire message lives in a file on the file system.
:class:`FeedParser` is more appropriate when you're reading the message from
a stream which might block waiting for more input (e.g. reading an email message
from a socket). The :class:`FeedParser` can consume and parse the message
incrementally, and only returns the root object when you close the parser [#]_.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.


FeedParser API
^^^^^^^^^^^^^^

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
of text until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

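The create, feed, close cycle and the *defects* attribute can be sketched as
follows. The message is a deliberately broken, made-up example: it claims to be
:mimetype:`multipart` but supplies no boundary parameter, so the parser is
expected to record defects rather than raise.

```python
from email.feedparser import FeedParser

# Hypothetical broken input: multipart content type without a boundary.
broken = (
    "Subject: broken\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "no boundary anywhere\n"
)

parser = FeedParser()
for line in broken.splitlines(keepends=True):
    parser.feed(line)              # feed the text one line at a time
msg = parser.close()               # parsing completes here

# Each defect is an instance of a class from email.errors.
defect_names = [type(defect).__name__ for defect in msg.defects]
```

Because the default policy registers defects instead of raising them, the
headers of the damaged message remain available through ``msg``.
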
Here is the API for the :class:`FeedParser`:


.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.compat32)

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3 Added the *policy* keyword.

   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data. *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings:
      carriage return, newline, or carriage return and newline (they can even be
      mixed).

   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.


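A short sketch of ``feed()`` receiving partial lines and mixed line endings,
followed by ``close()``. The header values are invented, and the chunk
boundaries are chosen arbitrarily to split a header line in two.

```python
from email.feedparser import FeedParser

parser = FeedParser()
parser.feed("Subject: mixed ")                    # partial line, no ending yet
parser.feed("endings\r\nTo: bob@example.com\n")   # CRLF ending, then LF ending
parser.feed("\r\nbody text\n")                    # blank line ends the headers
msg = parser.close()                              # returns the root message

subject = msg["Subject"]
body = msg.get_payload()
```

The first two calls together supply a single ``Subject`` header line, which the
parser reassembles before interpreting it.
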
.. class:: BytesFeedParser(_factory=email.message.Message)

   Works exactly like :class:`FeedParser` except that the input to the
   :meth:`~FeedParser.feed` method must be bytes and not a string.

   .. versionadded:: 3.2


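For illustration, the same incremental pattern with bytes input might look like
this. The message content and the 16-byte chunk size are arbitrary choices
standing in for data read from a socket.

```python
from email.feedparser import BytesFeedParser

raw = b"Subject: bytes in\nTo: bob@example.com\n\nraw body\n"

parser = BytesFeedParser()
for start in range(0, len(raw), 16):
    parser.feed(raw[start:start + 16])   # each chunk is bytes, not str
msg = parser.close()

subject = msg["Subject"]
body = msg.get_payload()
```
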
Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides header-only parsers, called :class:`HeaderParser` and
:class:`BytesHeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body as a string. They
have the same API as the :class:`Parser` and :class:`BytesParser` classes.

.. versionadded:: 3.3
   The BytesHeaderParser class.


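A rough sketch of the header-only behaviour: the input below is a made-up
multipart message, and :class:`HeaderParser` is expected to leave its body
unparsed as one raw string instead of building subparts.

```python
from email.parser import HeaderParser

# Hypothetical multipart message; the boundary "XYZ" is invented.
text = (
    "Subject: headers only\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "never parsed as a subpart\n"
    "--XYZ--\n"
)

msg = HeaderParser().parsestr(text)

subject = msg["Subject"]         # headers are fully available
raw_body = msg.get_payload()     # the whole body, as an unparsed string
```
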
.. class:: Parser(_class=email.message.Message, *, policy=policy.compat32)

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.

   The other public :class:`Parser` methods are:


   .. method:: parse(fp, headersonly=False)

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`~io.TextIOBase.readline` and the :meth:`~io.TextIOBase.read`
      methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.


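The following sketch exercises both :class:`Parser` methods on the same
invented message, including the *headersonly* flag and the *policy* keyword.
Using ``email.policy.default`` here is only one possible choice of policy.

```python
import io
from email import policy
from email.parser import Parser

text = "From: alice@example.com\nSubject: parse me\n\nbody text\n"

parser = Parser()
from_str = parser.parsestr(text)                  # parse a string
from_file = parser.parse(io.StringIO(text))       # parse a text file object
headers_only = parser.parsestr(text, headersonly=True)

# Same input, parsed under a policy other than the compat32 default.
with_policy = Parser(policy=policy.default).parsestr(text)
```

Wrapping the string in :class:`io.StringIO` and calling ``parse()`` mirrors what
``parsestr()`` is documented to do, so the two results should carry the same
headers and payload.
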
.. class:: BytesParser(_class=email.message.Message, *, policy=policy.compat32)

   This class is exactly parallel to :class:`Parser`, but handles bytes input.
   The *_class* and *policy* arguments are interpreted in the same way as for
   the :class:`Parser` constructor.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object. *fp* must support
      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
      methods on file-like objects.

      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a byte string object
      instead of a file-like object. Calling this method on a byte string is
      exactly equivalent to wrapping *bytes* in a :class:`~io.BytesIO` instance
      first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

   .. versionadded:: 3.2


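As a hedged illustration of bytes parsing with an ``8bit`` body, the sketch
below parses a made-up UTF-8 message and recovers the body text. Decoding with
``get_payload(decode=True)`` followed by an explicit ``decode("utf-8")`` is one
way to do this, and it assumes the charset is known in advance.

```python
from email.parser import BytesParser

# Hypothetical message whose body contains a non-ASCII character (U+00E9).
raw = (
    b"Subject: 8bit body\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)

msg = BytesParser().parsebytes(raw)

payload_bytes = msg.get_payload(decode=True)   # the original body bytes
body = payload_bytes.decode("utf-8")           # charset assumed to be UTF-8
```
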
Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s, _class=email.message.Message, *, \
                                  policy=policy.compat32)

   Return a message object structure from a string. This is exactly equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_bytes(s, _class=email.message.Message, *, \
                                 policy=policy.compat32)

   Return a message object structure from a byte string. This is exactly
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_file(fp, _class=email.message.Message, *, \
                                policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is exactly equivalent to ``Parser().parse(fp)``. *_class*
   and *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
                                       policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
   *_class* and *policy* are interpreted as with the
   :class:`~email.parser.Parser` class constructor.

   .. versionadded:: 3.2
   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

Here's an example of how you might use this at an interactive Python prompt::

   >>> import email
   >>> msg = email.message_from_string(myString)  # doctest: +SKIP


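For comparison, a self-contained sketch that runs all four convenience
functions on the same invented message. The in-memory :class:`io.StringIO` and
:class:`io.BytesIO` objects stand in for real open files.

```python
import email
import io

text = "Subject: convenience\n\nsame message four ways\n"
data = text.encode("ascii")

from_string = email.message_from_string(text)
from_bytes = email.message_from_bytes(data)
from_file = email.message_from_file(io.StringIO(text))
from_binary_file = email.message_from_binary_file(io.BytesIO(data))

messages = [from_string, from_bytes, from_file, from_binary_file]
subjects = [m["Subject"] for m in messages]
payloads = [m.get_payload() for m in messages]
```
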
Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.Message.is_multipart`. Their
  :meth:`~email.message.Message.get_payload` method will return a string object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.Message.is_multipart` and its
  :meth:`~email.message.Message.get_payload` method will return the list of
  :class:`~email.message.Message` subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.Message.is_multipart` method will return ``True``.
  The single element in the list payload will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.Message.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.

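The notes above can be checked with a small sketch. All three inputs are
invented: a plain message, a :mimetype:`message/rfc822` wrapper, and a message
that declares a :mimetype:`multipart` type but contains no boundary lines.

```python
import email
from email.errors import MultipartInvariantViolationDefect

# Plain message: string payload, not multipart.
plain = email.message_from_string("Subject: hi\n\nbody\n")

# message/rfc822 wrapper: list payload holding exactly one sub-message.
wrapper = email.message_from_string(
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "\n"
    "inner body\n"
)
inner = wrapper.get_payload()[0]

# Inconsistent message: multipart header, but no boundary in the body.
inconsistent = email.message_from_string(
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "there are no boundary lines here\n"
)
has_invariant_defect = any(
    isinstance(defect, MultipartInvariantViolationDefect)
    for defect in inconsistent.defects
)
```
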
.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`~email.parser.Parser` was re-implemented in terms of the
   :class:`~email.parser.FeedParser`, so the semantics and results are
   identical between the two parsers.