| :mod:`email.parser`: Parsing email messages |
| ------------------------------------------- |
| |
| .. module:: email.parser |
| :synopsis: Parse flat text email messages to produce a message object structure. |
| |
| **Source code:** :source:`Lib/email/parser.py` |
| |
| -------------- |
| |
| Message object structures can be created in one of two ways: they can be |
| created from whole cloth by creating an :class:`~email.message.EmailMessage` |
| object, adding headers using the dictionary interface, and adding payload(s) |
| using :meth:`~email.message.EmailMessage.set_content` and related methods, or |
| they can be created by parsing a serialized representation of the email |
| message. |
| |
| The :mod:`email` package provides a standard parser that understands most email |
| document structures, including MIME documents. You can pass the parser a |
| bytes, string or file object, and the parser will return to you the root |
| :class:`~email.message.EmailMessage` instance of the object structure. For |
| simple, non-MIME messages the payload of this root object will likely be a |
| string containing the text of the message. For MIME messages, the root object |
| will return ``True`` from its :meth:`~email.message.EmailMessage.is_multipart` |
| method, and the subparts can be accessed via the payload manipulation methods, |
| such as :meth:`~email.message.EmailMessage.get_body`, |
| :meth:`~email.message.EmailMessage.iter_parts`, and |
| :meth:`~email.message.EmailMessage.walk`. |
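
As a minimal sketch of the behaviour described above, the following parses one plain message and one MIME message and inspects the root objects. The sample messages, addresses, and the boundary string ``XX`` are made up for illustration, and ``policy.default`` is passed explicitly so that the parser returns ``EmailMessage`` objects.

```python
from email import message_from_bytes, policy

# Hypothetical sample messages, written inline for illustration only.
plain = (
    b"From: alice@example.com\r\n"
    b"Subject: Plain\r\n"
    b"\r\n"
    b"Just text.\r\n"
)
multipart = (
    b"From: alice@example.com\r\n"
    b"Subject: Two parts\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"plain body\r\n"
    b"--XX\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>html body</p>\r\n"
    b"--XX--\r\n"
)

simple_msg = message_from_bytes(plain, policy=policy.default)
mime_msg = message_from_bytes(multipart, policy=policy.default)

# The non-MIME message has a string payload; the MIME message has subparts.
simple_is_multipart = simple_msg.is_multipart()
mime_is_multipart = mime_msg.is_multipart()

# Walk the immediate subparts and pick the preferred body part.
part_types = [part.get_content_type() for part in mime_msg.iter_parts()]
body = mime_msg.get_body(preferencelist=("plain",))
```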
| |
| Two parser interfaces are available: the :class:`Parser` API and the |
| incremental :class:`FeedParser` API. The :class:`Parser` API is |
| most useful if you have the entire text of the message in memory, or if the |
| entire message lives in a file on the file system. :class:`FeedParser` is more |
| appropriate when you are reading the message from a stream which might block |
| waiting for more input (such as reading an email message from a socket). The |
| :class:`FeedParser` can consume and parse the message incrementally, and only |
| returns the root object when you close the parser. |
| |
| Note that the parser can be extended in limited ways, and of course you can |
| implement your own parser completely from scratch. All of the logic that |
| connects the :mod:`email` package's bundled parser and the |
| :class:`~email.message.EmailMessage` class is embodied in the |
| :class:`~email.policy.Policy` class, so a custom parser can create message |
| object trees any way it finds necessary by implementing custom versions of |
| the appropriate :class:`~email.policy.Policy` methods. |
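
One small way to see the policy steering what the parser builds is to swap the policy's ``message_factory``. The sketch below does this with ``clone()`` rather than by overriding policy methods; ``TaggedMessage`` and its ``source`` attribute are hypothetical names introduced only for this example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


class TaggedMessage(EmailMessage):
    """Hypothetical subclass used only to show which class the parser creates."""

    source = "parsed"


# clone() returns a new policy with the given attributes replaced.
tagging_policy = policy.default.clone(message_factory=TaggedMessage)

msg = message_from_bytes(b"Subject: hi\r\n\r\nbody\r\n", policy=tagging_policy)
created_class = type(msg).__name__
```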
| |
| |
| FeedParser API |
| ^^^^^^^^^^^^^^ |
| |
| The :class:`BytesFeedParser`, imported from the :mod:`email.feedparser` module, |
| provides an API that is conducive to incremental parsing of email messages, |
| such as would be necessary when reading the text of an email message from a |
| source that can block (such as a socket). The :class:`BytesFeedParser` can of |
| course be used to parse an email message fully contained in a :term:`bytes-like |
| object`, string, or file, but the :class:`BytesParser` API may be more |
| convenient for such use cases. The semantics and results of the two parser |
| APIs are identical. |
| |
| The :class:`BytesFeedParser`'s API is simple; you create an instance, feed it a |
| bunch of bytes until there's no more to feed it, then close the parser to |
| retrieve the root message object. The :class:`BytesFeedParser` is extremely |
| accurate when parsing standards-compliant messages, and it does a very good job |
| of parsing non-compliant messages, providing information about how a message |
| was deemed broken. It will populate a message object's |
| :attr:`~email.message.EmailMessage.defects` attribute with a list of any |
| problems it found in a message. See the :mod:`email.errors` module for the |
| list of defects that it can find. |
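
The create, feed, close cycle might look like the following sketch. The message text and the 7-byte chunk size are arbitrary choices that stand in for data arriving from a socket; the small chunks deliberately split lines in the middle.

```python
from email import policy
from email.parser import BytesFeedParser

# Hypothetical message; in practice the bytes would come from a stream.
raw = (
    b"From: bob@example.com\r\n"
    b"Subject: Chunked\r\n"
    b"\r\n"
    b"First line\r\n"
    b"Second line\r\n"
)

parser = BytesFeedParser(policy=policy.default)

# Feed small chunks; the parser stitches partial lines back together.
for start in range(0, len(raw), 7):
    parser.feed(raw[start:start + 7])

# close() finishes parsing and returns the root message object.
msg = parser.close()

subject = msg["Subject"]
defects = list(msg.defects)  # empty for this well-formed message
```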
| |
| Here is the API for the :class:`BytesFeedParser`: |
| |
| |
| .. class:: BytesFeedParser(_factory=None, *, policy=policy.compat32) |
| |
| Create a :class:`BytesFeedParser` instance. Optional *_factory* is a |
| no-argument callable; if not specified use the |
| :attr:`~email.policy.Policy.message_factory` from the *policy*. Call |
| *_factory* whenever a new message object is needed. |
| |
| If *policy* is specified use the rules it specifies to update the |
| representation of the message. If *policy* is not set, use the |
| :class:`compat32 <email.policy.Compat32>` policy, which maintains backward |
| compatibility with the Python 3.2 version of the email package and provides |
| :class:`~email.message.Message` as the default factory. All other policies |
| provide :class:`~email.message.EmailMessage` as the default *_factory*. For |
| more information on what else *policy* controls, see the |
| :mod:`~email.policy` documentation. |
| |
| Note: **The policy keyword should always be specified**; the default will |
| change to :data:`email.policy.default` in a future version of Python. |
| |
| .. versionadded:: 3.2 |
| |
| .. versionchanged:: 3.3 Added the *policy* keyword. |
| .. versionchanged:: 3.6 *_factory* defaults to the policy ``message_factory``. |
| |
| |
| .. method:: feed(data) |
| |
| Feed the parser some more data. *data* should be a :term:`bytes-like |
| object` containing one or more lines. The lines can be partial and the |
| parser will stitch such partial lines together properly. The lines can |
| have any of the three common line endings: carriage return, newline, or |
| carriage return and newline (they can even be mixed). |
| |
| |
| .. method:: close() |
| |
| Complete the parsing of all previously fed data and return the root |
| message object. It is undefined what happens if :meth:`feed` is called |
| after this method has been called. |
| |
| |
| .. class:: FeedParser(_factory=None, *, policy=policy.compat32) |
| |
| Works like :class:`BytesFeedParser` except that the input to the |
| :meth:`~BytesFeedParser.feed` method must be a string. This is of limited |
| utility, since the only way for such a message to be valid is for it to |
| contain only ASCII text or, if :attr:`~email.policy.Policy.utf8` is |
| ``True``, no binary attachments. |
| |
| .. versionchanged:: 3.3 Added the *policy* keyword. |
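
For comparison, a string-based sketch of the same cycle is shown here. The message is a made-up, ASCII-only example, which is the case where feeding text is safe.

```python
from email import policy
from email.parser import FeedParser

parser = FeedParser(policy=policy.default)

# Each feed() call receives str, not bytes.
for line in ["Subject: Text input\n", "\n", "ASCII-only body\n"]:
    parser.feed(line)

msg = parser.close()
subject = msg["Subject"]
```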
| |
| |
| Parser API |
| ^^^^^^^^^^ |
| |
| The :class:`BytesParser` class, imported from the :mod:`email.parser` module, |
| provides an API that can be used to parse a message when the complete contents |
| of the message are available in a :term:`bytes-like object` or file. The |
| :mod:`email.parser` module also provides :class:`Parser` for parsing strings, |
| and header-only parsers, :class:`BytesHeaderParser` and |
| :class:`HeaderParser`, which can be used if you're only interested in the |
| headers of the message. :class:`BytesHeaderParser` and :class:`HeaderParser` |
| can be much faster in these situations, since they do not attempt to parse the |
| message body, instead setting the payload to the raw body. |
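
The difference between the full and header-only parsers can be seen on a multipart message. In this sketch the message and its boundary ``XX`` are hypothetical; the full parser builds subparts, while the header-only parser leaves the body as one raw string.

```python
from email import policy
from email.parser import BytesHeaderParser, BytesParser

# Hypothetical multipart message used only for illustration.
raw = (
    b"Subject: Headers only\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"part one\r\n"
    b"--XX--\r\n"
)

full = BytesParser(policy=policy.default).parsebytes(raw)
headers_only = BytesHeaderParser(policy=policy.default).parsebytes(raw)

# The full parse produces a container with subparts.
full_is_multipart = full.is_multipart()

# The header-only parse keeps the unparsed body text as the payload.
raw_body = headers_only.get_payload()
```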
| |
| |
| .. class:: BytesParser(_class=None, *, policy=policy.compat32) |
| |
| Create a :class:`BytesParser` instance. The *_class* and *policy* |
| arguments have the same meaning and semantics as the *_factory* |
| and *policy* arguments of :class:`BytesFeedParser`. |
| |
| Note: **The policy keyword should always be specified**; the default will |
| change to :data:`email.policy.default` in a future version of Python. |
| |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument that was deprecated in 2.4. Added the |
| *policy* keyword. |
| .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``. |
| |
| |
| .. method:: parse(fp, headersonly=False) |
| |
| Read all the data from the binary file-like object *fp*, parse the |
| resulting bytes, and return the message object. *fp* must support |
| both the :meth:`~io.IOBase.readline` and the :meth:`~io.IOBase.read` |
| methods. |
| |
| The bytes contained in *fp* must be formatted as a block of :rfc:`5322` |
| (or, if :attr:`~email.policy.Policy.utf8` is ``True``, :rfc:`6532`) |
| style headers and header continuation lines, optionally preceded by an |
| envelope header. The header block is terminated either by the end of the |
| data or by a blank line. Following the header block is the body of the |
| message (which may contain MIME-encoded subparts, including subparts |
| with a :mailheader:`Content-Transfer-Encoding` of ``8bit``). |
| |
| Optional *headersonly* is a flag specifying whether to stop parsing after |
| reading the headers or not. The default is ``False``, meaning it parses |
| the entire contents of the file. |
| |
| |
| .. method:: parsebytes(bytes, headersonly=False) |
| |
| Similar to the :meth:`parse` method, except it takes a :term:`bytes-like |
| object` instead of a file-like object. Calling this method on a |
| :term:`bytes-like object` is equivalent to wrapping *bytes* in a |
| :class:`~io.BytesIO` instance first and calling :meth:`parse`. |
| |
| Optional *headersonly* is as with the :meth:`parse` method. |
| |
| .. versionadded:: 3.2 |
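
A short sketch of the two :class:`BytesParser` entry points follows. An ``io.BytesIO`` object stands in for a real binary file, and the message content is made up for the example.

```python
import io
from email import policy
from email.parser import BytesParser

raw = b"Subject: From a file\r\n\r\nFile body\r\n"
parser = BytesParser(policy=policy.default)

# parse() reads from a binary file-like object.
from_file = parser.parse(io.BytesIO(raw))

# parsebytes() takes the bytes directly.
from_bytes = parser.parsebytes(raw)

# headersonly=True stops after the header block.
headers = parser.parsebytes(raw, headersonly=True)
```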
| |
| |
| .. class:: BytesHeaderParser(_class=None, *, policy=policy.compat32) |
| |
| Exactly like :class:`BytesParser`, except that *headersonly* |
| defaults to ``True``. |
| |
| .. versionadded:: 3.3 |
| |
| |
| .. class:: Parser(_class=None, *, policy=policy.compat32) |
| |
| This class is parallel to :class:`BytesParser`, but handles string input. |
| |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument. Added the *policy* keyword. |
| .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``. |
| |
| |
| .. method:: parse(fp, headersonly=False) |
| |
| Read all the data from the text-mode file-like object *fp*, parse the |
| resulting text, and return the root message object. *fp* must support |
| both the :meth:`~io.TextIOBase.readline` and the |
| :meth:`~io.TextIOBase.read` methods. |
| |
| Other than the text mode requirement, this method operates like |
| :meth:`BytesParser.parse`. |
| |
| |
| .. method:: parsestr(text, headersonly=False) |
| |
| Similar to the :meth:`parse` method, except it takes a string object |
| instead of a file-like object. Calling this method on a string is |
| equivalent to wrapping *text* in a :class:`~io.StringIO` instance first |
| and calling :meth:`parse`. |
| |
| Optional *headersonly* is as with the :meth:`parse` method. |
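
The string-based :class:`Parser` can be exercised the same way. This sketch parses a made-up message once from a string and once from an ``io.StringIO`` wrapper, then compares the results.

```python
import io
from email import policy
from email.parser import Parser

text = "Subject: Text message\n\nHello from a string.\n"
parser = Parser(policy=policy.default)

# parsestr() takes the text directly.
from_str = parser.parsestr(text)

# parse() reads from a text-mode file-like object.
from_file = parser.parse(io.StringIO(text))

same_body = from_str.get_content() == from_file.get_content()
```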
| |
| |
| .. class:: HeaderParser(_class=None, *, policy=policy.compat32) |
| |
| Exactly like :class:`Parser`, except that *headersonly* |
| defaults to ``True``. |
| |
| |
| Since creating a message object structure from a string or a file object is such |
| a common task, four functions are provided as a convenience. They are available |
| in the top-level :mod:`email` package namespace. |
| |
| .. currentmodule:: email |
| |
| |
| .. function:: message_from_bytes(s, _class=None, *, policy=policy.compat32) |
| |
| Return a message object structure from a :term:`bytes-like object`. This is |
| equivalent to ``BytesParser().parsebytes(s)``. Optional *_class* and |
| *policy* are interpreted as with the :class:`~email.parser.BytesParser` class |
| constructor. |
| |
| .. versionadded:: 3.2 |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument. Added the *policy* keyword. |
| |
| |
| .. function:: message_from_binary_file(fp, _class=None, *, \ |
| policy=policy.compat32) |
| |
| Return a message object structure tree from an open binary :term:`file |
| object`. This is equivalent to ``BytesParser().parse(fp)``. *_class* and |
| *policy* are interpreted as with the :class:`~email.parser.BytesParser` class |
| constructor. |
| |
| .. versionadded:: 3.2 |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument. Added the *policy* keyword. |
| |
| |
| .. function:: message_from_string(s, _class=None, *, policy=policy.compat32) |
| |
| Return a message object structure from a string. This is equivalent to |
| ``Parser().parsestr(s)``. *_class* and *policy* are interpreted as |
| with the :class:`~email.parser.Parser` class constructor. |
| |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument. Added the *policy* keyword. |
| |
| |
| .. function:: message_from_file(fp, _class=None, *, policy=policy.compat32) |
| |
| Return a message object structure tree from an open :term:`file object`. |
| This is equivalent to ``Parser().parse(fp)``. *_class* and *policy* are |
| interpreted as with the :class:`~email.parser.Parser` class constructor. |
| |
| .. versionchanged:: 3.3 |
| Removed the *strict* argument. Added the *policy* keyword. |
| .. versionchanged:: 3.6 *_class* defaults to the policy ``message_factory``. |
| |
| |
| Here's an example of how you might use :func:`message_from_bytes` at an |
| interactive Python prompt:: |
| |
| >>> import email |
| >>> my_bytes = b"Subject: Hello\r\n\r\nBody text\r\n" |
| >>> msg = email.message_from_bytes(my_bytes) |
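
As a slightly fuller sketch, the four convenience functions can be called side by side on the same made-up message. In-memory ``io`` objects stand in for real files, and ``policy.default`` is passed explicitly in each call.

```python
import email
import io
from email import policy

text = "Subject: Convenience\n\nSame message, four entry points.\n"
data = text.encode("ascii")

msgs = [
    email.message_from_string(text, policy=policy.default),
    email.message_from_bytes(data, policy=policy.default),
    email.message_from_file(io.StringIO(text), policy=policy.default),
    email.message_from_binary_file(io.BytesIO(data), policy=policy.default),
]

# Every entry point yields a message with the same Subject header.
subjects = [str(m["Subject"]) for m in msgs]
```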
| |
| |
| Additional notes |
| ^^^^^^^^^^^^^^^^ |
| |
| Here are some notes on the parsing semantics: |
| |
| * Most non-\ :mimetype:`multipart` type messages are parsed as a single message |
| object with a string payload. These objects will return ``False`` for |
| :meth:`~email.message.EmailMessage.is_multipart`, and |
| :meth:`~email.message.EmailMessage.iter_parts` will yield no subparts. |
| |
| * All :mimetype:`multipart` type messages will be parsed as a container message |
| object with a list of sub-message objects for their payload. The outer |
| container message will return ``True`` for |
| :meth:`~email.message.EmailMessage.is_multipart`, and |
| :meth:`~email.message.EmailMessage.iter_parts` will yield each subpart. |
| |
| * Most messages with a content type of :mimetype:`message/\*` (such as |
| :mimetype:`message/delivery-status` and :mimetype:`message/rfc822`) will also |
| be parsed as a container object containing a list payload of length 1. Their |
| :meth:`~email.message.EmailMessage.is_multipart` method will return ``True``. |
| The single element yielded by :meth:`~email.message.EmailMessage.iter_parts` |
| will be a sub-message object. |
| |
| * Some non-standards-compliant messages may not be internally consistent about |
| their :mimetype:`multipart`\ -edness. Such messages may have a |
| :mailheader:`Content-Type` header of type :mimetype:`multipart`, but their |
| :meth:`~email.message.EmailMessage.is_multipart` method may return ``False``. |
| If such messages were parsed with the :class:`~email.parser.FeedParser`, |
| they will have an instance of the |
| :class:`~email.errors.MultipartInvariantViolationDefect` class in their |
| *defects* attribute list. See :mod:`email.errors` for details. |
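
Two of the notes above can be checked with a short sketch. Both messages are hypothetical: the first wraps a message inside a :mimetype:`message/rfc822` container, and the second declares a :mimetype:`multipart` content type but contains no boundary lines. The sketch reads the wrapped message with ``get_payload(0)``, which is one of several ways to reach it.

```python
from email import message_from_bytes, policy
from email.errors import MultipartInvariantViolationDefect

# A message/rfc822 container holding one inner message.
wrapped = (
    b"Subject: Outer\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: message/rfc822\r\n"
    b"\r\n"
    b"Subject: Inner\r\n"
    b"\r\n"
    b"inner body\r\n"
)
outer = message_from_bytes(wrapped, policy=policy.default)
outer_is_multipart = outer.is_multipart()
inner = outer.get_payload(0)  # the single sub-message

# Claims to be multipart, but the body has no boundary lines.
broken = (
    b"Subject: Broken\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"no boundary lines here\r\n"
)
bad = message_from_bytes(broken, policy=policy.default)
bad_is_multipart = bad.is_multipart()
has_invariant_defect = any(
    isinstance(defect, MultipartInvariantViolationDefect)
    for defect in bad.defects
)
```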