| |
| :mod:`rfc822` --- Parse RFC 2822 mail headers |
| ============================================= |
| |
| .. module:: rfc822 |
| :synopsis: Parse 2822 style mail messages. |
| :deprecated: |
| |
| |
| .. deprecated:: 2.3 |
| The :mod:`email` package should be used in preference to the :mod:`rfc822` |
| module. This module is present only to maintain backward compatibility. |
| |
| This module defines a class, :class:`Message`, which represents an "email |
| message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages |
| consist of a collection of message headers, and a message body. This module |
| also defines a helper class :class:`AddressList` for parsing :rfc:`2822` |
| addresses. Please refer to the RFC for information on the specific syntax of |
| :rfc:`2822` messages. |
| |
| .. index:: module: mailbox |
| |
| The :mod:`mailbox` module provides classes to read mailboxes produced by |
| various end-user mail programs. |
| |
| |
| .. class:: Message(file[, seekable]) |
| |
| A :class:`Message` instance is instantiated with an input object as parameter. |
| Message relies only on the input object having a :meth:`readline` method; in |
| particular, ordinary file objects qualify. Instantiation reads headers from the |
| input object up to a delimiter line (normally a blank line) and stores them in |
| the instance. The message body, following the headers, is not consumed. |
| |
| This class can work with any input object that supports a :meth:`readline` |
| method. If the input object has seek and tell capability, the |
| :meth:`rewindbody` method will work; also, illegal lines will be pushed back |
| onto the input stream. If the input object lacks seek but has an :meth:`unread` |
| method that can push back a line of input, :class:`Message` will use that to |
| push back illegal lines. Thus this class can be used to parse messages coming |
| from a buffered stream. |
| |
| The optional *seekable* argument is provided as a workaround for certain stdio |
| libraries in which :cfunc:`tell` discards buffered data before discovering that |
| the :cfunc:`lseek` system call doesn't work. For maximum portability, you |
| should set the seekable argument to zero to prevent that initial :meth:`tell` |
| when passing in an unseekable object such as a file object created from a socket |
| object. |
| |
| Input lines as read from the file may either be terminated by CR-LF or by a |
| single linefeed; a terminating CR-LF is replaced by a single linefeed before the |
| line is stored. |
| |
| All header matching is done independent of upper or lower case; e.g. |
| ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result. |
| |
| |
| .. class:: AddressList(field) |
| |
| You may instantiate the :class:`AddressList` helper class using a single string |
| parameter, a comma-separated list of :rfc:`2822` addresses to be parsed. (The |
| parameter ``None`` yields an empty list.) |
| |
| |
| .. function:: quote(str) |
| |
| Return a new string with backslashes in *str* replaced by two backslashes and |
| double quotes replaced by backslash-double quote. |
| |
| |
| .. function:: unquote(str) |
| |
| Return a new string which is an *unquoted* version of *str*. If *str* ends and |
| begins with double quotes, they are stripped off. Likewise if *str* ends and |
| begins with angle brackets, they are stripped off. |
| |
| |
| .. function:: parseaddr(address) |
| |
| Parse *address*, which should be the value of some address-containing field such |
| as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and |
| "email address" parts. Returns a tuple of that information, unless the parse |
| fails, in which case a 2-tuple ``(None, None)`` is returned. |
| |
| |
| .. function:: dump_address_pair(pair) |
| |
| The inverse of :meth:`parseaddr`, this takes a 2-tuple of the form ``(realname, |
| email_address)`` and returns the string value suitable for a :mailheader:`To` or |
| :mailheader:`Cc` header. If the first element of *pair* is false, then the |
| second element is returned unmodified. |
| |
| |
| .. function:: parsedate(date) |
| |
| Attempts to parse a date according to the rules in :rfc:`2822`. however, some |
| mailers don't follow that format as specified, so :func:`parsedate` tries to |
| guess correctly in such cases. *date* is a string containing an :rfc:`2822` |
| date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``. If it succeeds in parsing |
| the date, :func:`parsedate` returns a 9-tuple that can be passed directly to |
| :func:`time.mktime`; otherwise ``None`` will be returned. Note that indexes 6, |
| 7, and 8 of the result tuple are not usable. |
| |
| |
| .. function:: parsedate_tz(date) |
| |
| Performs the same function as :func:`parsedate`, but returns either ``None`` or |
| a 10-tuple; the first 9 elements make up a tuple that can be passed directly to |
| :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC |
| (which is the official term for Greenwich Mean Time). (Note that the sign of |
| the timezone offset is the opposite of the sign of the ``time.timezone`` |
| variable for the same timezone; the latter variable follows the POSIX standard |
| while this module follows :rfc:`2822`.) If the input string has no timezone, |
| the last element of the tuple returned is ``None``. Note that indexes 6, 7, and |
| 8 of the result tuple are not usable. |
| |
| |
| .. function:: mktime_tz(tuple) |
| |
| Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp. If |
| the timezone item in the tuple is ``None``, assume local time. Minor |
| deficiency: this first interprets the first 8 elements as a local time and then |
| compensates for the timezone difference; this may yield a slight error around |
| daylight savings time switch dates. Not enough to worry about for common use. |
| |
| |
| .. seealso:: |
| |
| Module :mod:`email` |
| Comprehensive email handling package; supersedes the :mod:`rfc822` module. |
| |
| Module :mod:`mailbox` |
| Classes to read various mailbox formats produced by end-user mail programs. |
| |
| Module :mod:`mimetools` |
| Subclass of :class:`rfc822.Message` that handles MIME encoded messages. |
| |
| |
| .. _message-objects: |
| |
| Message Objects |
| --------------- |
| |
| A :class:`Message` instance has the following methods: |
| |
| |
| .. method:: Message.rewindbody() |
| |
| Seek to the start of the message body. This only works if the file object is |
| seekable. |
| |
| |
| .. method:: Message.isheader(line) |
| |
| Returns a line's canonicalized fieldname (the dictionary key that will be used |
| to index it) if the line is a legal :rfc:`2822` header; otherwise returns |
| ``None`` (implying that parsing should stop here and the line be pushed back on |
| the input stream). It is sometimes useful to override this method in a |
| subclass. |
| |
| |
| .. method:: Message.islast(line) |
| |
| Return true if the given line is a delimiter on which Message should stop. The |
| delimiter line is consumed, and the file object's read location positioned |
| immediately after it. By default this method just checks that the line is |
| blank, but you can override it in a subclass. |
| |
| |
| .. method:: Message.iscomment(line) |
| |
| Return ``True`` if the given line should be ignored entirely, just skipped. By |
| default this is a stub that always returns ``False``, but you can override it in |
| a subclass. |
| |
| |
| .. method:: Message.getallmatchingheaders(name) |
| |
| Return a list of lines consisting of all headers matching *name*, if any. Each |
| physical line, whether it is a continuation line or not, is a separate list |
| item. Return the empty list if no header matches *name*. |
| |
| |
| .. method:: Message.getfirstmatchingheader(name) |
| |
| Return a list of lines comprising the first header matching *name*, and its |
| continuation line(s), if any. Return ``None`` if there is no header matching |
| *name*. |
| |
| |
| .. method:: Message.getrawheader(name) |
| |
| Return a single string consisting of the text after the colon in the first |
| header matching *name*. This includes leading whitespace, the trailing |
| linefeed, and internal linefeeds and whitespace if there any continuation |
| line(s) were present. Return ``None`` if there is no header matching *name*. |
| |
| |
| .. method:: Message.getheader(name[, default]) |
| |
| Return a single string consisting of the last header matching *name*, |
| but strip leading and trailing whitespace. |
| Internal whitespace is not stripped. The optional *default* argument can be |
| used to specify a different default to be returned when there is no header |
| matching *name*; it defaults to ``None``. |
| This is the preferred way to get parsed headers. |
| |
| |
| .. method:: Message.get(name[, default]) |
| |
| An alias for :meth:`getheader`, to make the interface more compatible with |
| regular dictionaries. |
| |
| |
| .. method:: Message.getaddr(name) |
| |
| Return a pair ``(full name, email address)`` parsed from the string returned by |
| ``getheader(name)``. If no header matching *name* exists, return ``(None, |
| None)``; otherwise both the full name and the address are (possibly empty) |
| strings. |
| |
| Example: If *m*'s first :mailheader:`From` header contains the string |
| ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair |
| ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen |
| <jack@cwi.nl>'`` instead, it would yield the exact same result. |
| |
| |
| .. method:: Message.getaddrlist(name) |
| |
| This is similar to ``getaddr(list)``, but parses a header containing a list of |
| email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full |
| name, email address)`` pairs (even if there was only one address in the header). |
| If there is no header matching *name*, return an empty list. |
| |
| If multiple headers exist that match the named header (e.g. if there are several |
| :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines |
| the named headers contain are also parsed. |
| |
| |
| .. method:: Message.getdate(name) |
| |
| Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible |
| with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable. If |
| there is no header matching *name*, or it is unparsable, return ``None``. |
| |
| Date parsing appears to be a black art, and not all mailers adhere to the |
| standard. While it has been tested and found correct on a large collection of |
| email from many sources, it is still possible that this function may |
| occasionally yield an incorrect result. |
| |
| |
| .. method:: Message.getdate_tz(name) |
| |
| Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the |
| first 9 elements will make a tuple compatible with :func:`time.mktime`, and the |
| 10th is a number giving the offset of the date's timezone from UTC. Note that |
| fields 6, 7, and 8 are not usable. Similarly to :meth:`getdate`, if there is |
| no header matching *name*, or it is unparsable, return ``None``. |
| |
| :class:`Message` instances also support a limited mapping interface. In |
| particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError` |
| if there is no matching header; and ``len(m)``, ``m.get(name[, default])``, |
| ``m.has_key(name)``, ``m.keys()``, ``m.values()`` ``m.items()``, and |
| ``m.setdefault(name[, default])`` act as expected, with the one difference |
| that :meth:`setdefault` uses an empty string as the default value. |
| :class:`Message` instances also support the mapping writable interface ``m[name] |
| = value`` and ``del m[name]``. :class:`Message` objects do not support the |
| :meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the |
| mapping interface. (Support for :meth:`get` and :meth:`setdefault` was only |
| added in Python 2.2.) |
| |
| Finally, :class:`Message` instances have some public instance variables: |
| |
| |
| .. attribute:: Message.headers |
| |
| A list containing the entire set of header lines, in the order in which they |
| were read (except that setitem calls may disturb this order). Each line contains |
| a trailing newline. The blank line terminating the headers is not contained in |
| the list. |
| |
| |
| .. attribute:: Message.fp |
| |
| The file or file-like object passed at instantiation time. This can be used to |
| read the message content. |
| |
| |
| .. attribute:: Message.unixfrom |
| |
| The Unix ``From`` line, if the message had one, or an empty string. This is |
| needed to regenerate the message in some contexts, such as an ``mbox``\ -style |
| mailbox file. |
| |
| |
| .. _addresslist-objects: |
| |
| AddressList Objects |
| ------------------- |
| |
| An :class:`AddressList` instance has the following methods: |
| |
| |
| .. method:: AddressList.__len__() |
| |
| Return the number of addresses in the address list. |
| |
| |
| .. method:: AddressList.__str__() |
| |
| Return a canonicalized string representation of the address list. Addresses are |
| rendered in "name" <host@domain> form, comma-separated. |
| |
| |
| .. method:: AddressList.__add__(alist) |
| |
| Return a new :class:`AddressList` instance that contains all addresses in both |
| :class:`AddressList` operands, with duplicates removed (set union). |
| |
| |
| .. method:: AddressList.__iadd__(alist) |
| |
| In-place version of :meth:`__add__`; turns this :class:`AddressList` instance |
| into the union of itself and the right-hand instance, *alist*. |
| |
| |
| .. method:: AddressList.__sub__(alist) |
| |
| Return a new :class:`AddressList` instance that contains every address in the |
| left-hand :class:`AddressList` operand that is not present in the right-hand |
| address operand (set difference). |
| |
| |
| .. method:: AddressList.__isub__(alist) |
| |
| In-place version of :meth:`__sub__`, removing addresses in this list which are |
| also in *alist*. |
| |
| Finally, :class:`AddressList` instances have one public instance variable: |
| |
| |
| .. attribute:: AddressList.addresslist |
| |
| A list of tuple string pairs, one per address. In each member, the first is the |
| canonicalized name part, the second is the actual route-address (``'@'``\ |
| -separated username-host.domain pair). |
| |
| .. rubric:: Footnotes |
| |
| .. [#] This module originally conformed to :rfc:`822`, hence the name. Since then, |
| :rfc:`2822` has been released as an update to :rfc:`822`. This module should be |
| considered :rfc:`2822`\ -conformant, especially in cases where the syntax or |
| semantics have changed since :rfc:`822`. |
| |