:mod:`email.parser`: Parsing email messages
-------------------------------------------

.. module:: email.parser
   :synopsis: Parse flat text email messages to produce a message object structure.


Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating :class:`~email.message.Message` objects and
stringing them together via :meth:`~email.message.Message.attach` and
:meth:`~email.message.Message.set_payload` calls, or they
can be created by parsing a flat text representation of the email message.

The :mod:`email` package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return the root
:class:`~email.message.Message` instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For multipart MIME messages, the root object
will return ``True`` from its :meth:`~email.message.Message.is_multipart` method,
and the subparts can be accessed via the
:meth:`~email.message.Message.get_payload` and
:meth:`~email.message.Message.walk` methods.

Two parser interfaces are available: the classic :class:`Parser` API and the
incremental :class:`FeedParser` API. The classic :class:`Parser` API is fine if
you have the entire text of the message in memory as a string, or if the entire
message lives in a file on the file system. :class:`FeedParser` is more
appropriate when you're reading the message from a stream which might block
waiting for more input (e.g. reading an email message from a socket). The
:class:`FeedParser` can consume and parse the message incrementally, and only
returns the root object when you close the parser [#]_.

Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the :mod:`email` package's bundled parser and the
:class:`~email.message.Message` class, so your custom parser can create message
object trees any way it finds necessary.


FeedParser API
^^^^^^^^^^^^^^

The :class:`FeedParser`, imported from the :mod:`email.feedparser` module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The :class:`FeedParser` can of course be used
to parse an email message fully contained in a string or a file, but the classic
:class:`Parser` API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.

The :class:`FeedParser`'s API is simple; you create an instance, feed it a bunch
of text until there's no more to feed it, then close the parser to retrieve the
root message object. The :class:`FeedParser` is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object's *defects* attribute with a list of
any problems it found in a message. See the :mod:`email.errors` module for the
list of defects that it can find.

Here is the API for the :class:`FeedParser`:


.. class:: FeedParser(_factory=email.message.Message, *, policy=policy.compat32)

   Create a :class:`FeedParser` instance. Optional *_factory* is a no-argument
   callable that will be called whenever a new message object is needed. It
   defaults to the :class:`email.message.Message` class.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3 Added the *policy* keyword.

   .. method:: feed(data)

      Feed the :class:`FeedParser` some more data. *data* should be a string
      containing one or more lines. The lines can be partial and the
      :class:`FeedParser` will stitch such partial lines together properly. The
      lines in the string can have any of the three common line endings:
      carriage return, newline, or carriage return and newline (they can even be
      mixed).

   .. method:: close()

      Closing a :class:`FeedParser` completes the parsing of all previously fed
      data, and returns the root message object. It is undefined what happens
      if you feed more data to a closed :class:`FeedParser`.

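As a minimal sketch of the feed-then-close cycle described above, the following
feeds a message in three pieces whose boundaries fall in the middle of lines.
The ``chunks`` list and the addresses in it are made up for illustration; in
real code the pieces would typically come from a socket or similar stream::

   from email.parser import FeedParser

   # Hypothetical chunks, split mid-line the way network reads often are.
   chunks = [
       "From: alice@example.com\r\nSubj",
       "ect: Hello\r\n\r\nFirst line of ",
       "the body.\r\n",
   ]

   parser = FeedParser()
   for chunk in chunks:
       # Partial lines are buffered and stitched together by the parser.
       parser.feed(chunk)

   # close() finishes parsing and returns the root message object.
   msg = parser.close()
   print(msg["Subject"])
   print(msg.get_payload())
   print(msg.defects)
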

.. class:: BytesFeedParser(_factory=email.message.Message)

   Works exactly like :class:`FeedParser` except that the input to the
   :meth:`~FeedParser.feed` method must be bytes and not string.

   .. versionadded:: 3.2

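The bytes variant can be exercised the same way. This is only an illustrative
sketch; the message content is invented::

   from email.parser import BytesFeedParser

   parser = BytesFeedParser()
   # Each call to feed() takes bytes rather than str.
   parser.feed(b"From: alice@example.com\r\n")
   parser.feed(b"Subject: Bytes input\r\n\r\n")
   parser.feed(b"Body text.\r\n")

   msg = parser.close()
   print(msg["Subject"])
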

Parser class API
^^^^^^^^^^^^^^^^

The :class:`Parser` class, imported from the :mod:`email.parser` module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The :mod:`email.parser`
module also provides header-only parsers, called :class:`HeaderParser` and
:class:`BytesHeaderParser`, which can be used if you're only interested in the
headers of the message. :class:`HeaderParser` and :class:`BytesHeaderParser`
can be much faster in these situations, since they do not attempt to parse the
message body, instead setting the payload to the raw body as a string. They
have the same API as the :class:`Parser` and :class:`BytesParser` classes.

.. versionadded:: 3.3
   The BytesHeaderParser class.

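To illustrate the difference between header-only and full parsing, the sketch
below parses the same (invented) multipart message with both
:class:`HeaderParser` and :class:`Parser`. With the header-only parser the body
is left as one raw string, so the message does not report itself as multipart::

   from email.parser import HeaderParser, Parser

   # A small made-up multipart message with a single text part.
   text = (
       "Subject: Report\n"
       "MIME-Version: 1.0\n"
       'Content-Type: multipart/mixed; boundary="XYZ"\n'
       "\n"
       "--XYZ\n"
       "Content-Type: text/plain\n"
       "\n"
       "part one\n"
       "--XYZ--\n"
   )

   headers_only = HeaderParser().parsestr(text)
   full = Parser().parsestr(text)

   print(headers_only["Subject"])
   # The body was not parsed, so the payload is the raw body string.
   print(headers_only.is_multipart(), type(headers_only.get_payload()).__name__)
   # The full parser builds the tree of subparts.
   print(full.is_multipart(), len(full.get_payload()))
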

.. class:: Parser(_class=email.message.Message, *, policy=policy.compat32)

   The constructor for the :class:`Parser` class takes an optional argument
   *_class*. This must be a callable factory (such as a function or a class), and
   it is used whenever a sub-message object needs to be created. It defaults to
   :class:`~email.message.Message` (see :mod:`email.message`). The factory will
   be called without arguments.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3
      Removed the *strict* argument that was deprecated in 2.4. Added the
      *policy* keyword.

   The other public :class:`Parser` methods are:


   .. method:: parse(fp, headersonly=False)

      Read all the data from the file-like object *fp*, parse the resulting
      text, and return the root message object. *fp* must support both the
      :meth:`~io.TextIOBase.readline` and the :meth:`~io.TextIOBase.read`
      methods on file-like objects.

      The text contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsestr(text, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a string object
      instead of a file-like object. Calling this method on a string is exactly
      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
      calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

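The relationship between :meth:`~Parser.parse` and :meth:`~Parser.parsestr` can
be sketched as follows. The message text is invented for the example, and an
:class:`io.StringIO` object stands in for a real file::

   import io
   from email.parser import Parser

   text = "From: bob@example.com\nSubject: Lunch\n\nNoon at the usual place?\n"

   parser = Parser()
   from_string = parser.parsestr(text)
   # parse() accepts any text file-like object; StringIO is used here.
   from_file = parser.parse(io.StringIO(text))

   print(from_string["Subject"])
   print(from_file.get_payload() == from_string.get_payload())

   # With headersonly=True the body is kept as an unparsed string.
   headers = parser.parsestr(text, headersonly=True)
   print(headers["From"])
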

.. class:: BytesParser(_class=email.message.Message, *, policy=policy.compat32)

   This class is exactly parallel to :class:`Parser`, but handles bytes input.
   The *_class* argument is interpreted in the same way as for the
   :class:`Parser` constructor.

   If *policy* is specified (it must be an instance of a :mod:`~email.policy`
   class) use the rules it specifies to update the representation of the
   message. If *policy* is not set, use the :class:`compat32
   <email.policy.Compat32>` policy, which maintains backward compatibility with
   the Python 3.2 version of the email package. For more information see the
   :mod:`~email.policy` documentation.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

   .. method:: parse(fp, headersonly=False)

      Read all the data from the binary file-like object *fp*, parse the
      resulting bytes, and return the message object. *fp* must support
      both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read`
      methods on file-like objects.

      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
      style headers and header continuation lines, optionally preceded by an
      envelope header. The header block is terminated either by the end of the
      data or by a blank line. Following the header block is the body of the
      message (which may contain MIME-encoded subparts, including subparts
      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``).

      Optional *headersonly* is a flag specifying whether to stop parsing after
      reading the headers or not. The default is ``False``, meaning it parses
      the entire contents of the file.

   .. method:: parsebytes(bytes, headersonly=False)

      Similar to the :meth:`parse` method, except it takes a byte string object
      instead of a file-like object. Calling this method on a byte string is
      exactly equivalent to wrapping *bytes* in a :class:`~io.BytesIO` instance
      first and calling :meth:`parse`.

      Optional *headersonly* is as with the :meth:`parse` method.

   .. versionadded:: 3.2

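A short sketch of :class:`BytesParser` handling a body with a
:mailheader:`Content-Transfer-Encoding` of ``8bit`` might look like this. The
message is invented, and :class:`io.BytesIO` stands in for a file opened in
binary mode::

   import io
   from email.parser import BytesParser

   raw = (
       b"Subject: Menu\n"
       b"MIME-Version: 1.0\n"
       b'Content-Type: text/plain; charset="utf-8"\n'
       b"Content-Transfer-Encoding: 8bit\n"
       b"\n"
       + "Café au lait\n".encode("utf-8")
   )

   parser = BytesParser()
   msg = parser.parsebytes(raw)
   same = parser.parse(io.BytesIO(raw))

   print(msg["Subject"])
   # decode=True returns the body as bytes, which can then be decoded
   # using the charset declared in the Content-Type header.
   body = msg.get_payload(decode=True)
   print(body.decode("utf-8"))
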

Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level :mod:`email` package namespace.

.. currentmodule:: email

.. function:: message_from_string(s, _class=email.message.Message, *, \
                                  policy=policy.compat32)

   Return a message object structure from a string. This is exactly equivalent to
   ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as
   with the :class:`~email.parser.Parser` class constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_bytes(s, _class=email.message.Message, *, \
                                 policy=policy.compat32)

   Return a message object structure from a byte string. This is exactly
   equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and
   *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionadded:: 3.2

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_file(fp, _class=email.message.Message, *, \
                                policy=policy.compat32)

   Return a message object structure tree from an open :term:`file object`.
   This is exactly equivalent to ``Parser().parse(fp)``. *_class*
   and *policy* are interpreted as with the :class:`~email.parser.Parser` class
   constructor.

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

.. function:: message_from_binary_file(fp, _class=email.message.Message, *, \
                                       policy=policy.compat32)

   Return a message object structure tree from an open binary :term:`file
   object`. This is exactly equivalent to ``BytesParser().parse(fp)``.
   *_class* and *policy* are interpreted as with the
   :class:`~email.parser.Parser` class constructor.

   .. versionadded:: 3.2

   .. versionchanged:: 3.3
      Removed the *strict* argument. Added the *policy* keyword.

Here's an example of how you might use :func:`message_from_string` at an
interactive Python prompt, where ``myString`` holds the text of a message::

   >>> import email
   >>> msg = email.message_from_string(myString)  # doctest: +SKIP

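A slightly fuller sketch of the convenience functions follows. The message text
and the ``TaggedMessage`` subclass are hypothetical and exist only to show how
the *_class* argument is passed through::

   import email
   from email.message import Message

   class TaggedMessage(Message):
       """Hypothetical Message subclass used only for this illustration."""

   text = "From: carol@example.com\nSubject: Status\n\nAll systems nominal.\n"

   msg = email.message_from_string(text)
   same = email.message_from_bytes(text.encode("ascii"))
   # _class is called without arguments to create each message object.
   tagged = email.message_from_string(text, _class=TaggedMessage)

   print(msg["Subject"])
   print(same.get_payload() == msg.get_payload())
   print(type(tagged).__name__)
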

Additional notes
^^^^^^^^^^^^^^^^

Here are some notes on the parsing semantics:

* Most non-\ :mimetype:`multipart` type messages are parsed as a single message
  object with a string payload. These objects will return ``False`` for
  :meth:`~email.message.Message.is_multipart`. Their
  :meth:`~email.message.Message.get_payload` method will return a string object.

* All :mimetype:`multipart` type messages will be parsed as a container message
  object with a list of sub-message objects for their payload. The outer
  container message will return ``True`` for
  :meth:`~email.message.Message.is_multipart` and its
  :meth:`~email.message.Message.get_payload` method will return the list of
  :class:`~email.message.Message` subparts.

* Most messages with a content type of :mimetype:`message/\*` (e.g.
  :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also be
  parsed as a container object containing a list payload of length 1. Their
  :meth:`~email.message.Message.is_multipart` method will return ``True``.
  The single element in the list payload will be a sub-message object.

* Some non-standards-compliant messages may not be internally consistent about
  their :mimetype:`multipart`\ -edness. Such messages may have a
  :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their
  :meth:`~email.message.Message.is_multipart` method may return ``False``.
  If such messages were parsed with the :class:`~email.parser.FeedParser`,
  they will have an instance of the
  :class:`~email.errors.MultipartInvariantViolationDefect` class in their
  *defects* attribute list. See :mod:`email.errors` for details.

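The four cases above can be observed with a few small, invented messages. This
is only a sketch; in particular, the ``broken`` message is one assumed way of
producing an inconsistent multipart message (it declares
:mimetype:`multipart/mixed` but supplies no boundary), and other malformed
inputs may record different defects::

   import email
   from email import errors

   # 1. A plain message: string payload, not multipart.
   plain = email.message_from_string("Subject: a\n\nhello\n")
   print(plain.is_multipart(), type(plain.get_payload()).__name__)

   # 2. A multipart message: list payload with one entry per subpart.
   multi = email.message_from_string(
       "MIME-Version: 1.0\n"
       'Content-Type: multipart/mixed; boundary="B"\n'
       "\n"
       "--B\n"
       "Content-Type: text/plain\n"
       "\n"
       "one\n"
       "--B\n"
       "Content-Type: text/plain\n"
       "\n"
       "two\n"
       "--B--\n"
   )
   print(multi.is_multipart(), len(multi.get_payload()))

   # 3. A message/rfc822 container: list payload of length 1.
   wrapped = email.message_from_string(
       "Content-Type: message/rfc822\n\nSubject: inner\n\nbody\n"
   )
   print(wrapped.is_multipart(), wrapped.get_payload(0)["Subject"])

   # 4. Declared multipart, but with no boundary parameter.
   broken = email.message_from_string(
       "MIME-Version: 1.0\n"
       "Content-Type: multipart/mixed\n"
       "\n"
       "just text\n"
   )
   print(broken.is_multipart())
   print([type(d).__name__ for d in broken.defects])
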

.. rubric:: Footnotes

.. [#] As of email package version 3.0, introduced in Python 2.4, the classic
   :class:`~email.parser.Parser` was re-implemented in terms of the
   :class:`~email.parser.FeedParser`, so the semantics and results are
   identical between the two parsers.
