| \section{Standard Module \sectcode{rfc822}} |
| \label{module-rfc822} |
| \stmodindex{rfc822} |
| |
| \setindexsubitem{(in module rfc822)} |
| |
| This module defines a class, \code{Message}, which represents a |
| collection of ``email headers'' as defined by the Internet standard |
| \rfc{822}. It is used in various contexts, usually to read such |
| headers from a file. |
| |
| Note that there's a separate module to read \UNIX{}, MH, and MMDF |
| style mailbox files: \code{mailbox}. |
| \refstmodindex{mailbox} |
| |
| A \code{Message} instance is instantiated with an open file object as |
| parameter. The optional \code{seekable} parameter indicates if the |
| file object is seekable; the default value is 1 for true. |
| Instantiation reads headers from the file up to a blank line and |
| stores them in the instance; after instantiation, the file is |
| positioned directly after the blank line that terminates the headers. |
| |
| Input lines as read from the file may either be terminated by CR-LF or |
| by a single linefeed; a terminating CR-LF is replaced by a single |
| linefeed before the line is stored. |
| |
| All header matching is done independently of upper or lower case; |
| e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield |
| the same result. |
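
The reading rules described above can be illustrated with a minimal sketch. This is not the module's own code: the helper \code{read_headers()} is a hypothetical name introduced here, and it keeps only the first occurrence of each header, in a dictionary keyed by the lower-cased header name.

```python
import io

def read_headers(fp):
    """Hypothetical helper, not part of the module.

    Reads lines from fp up to a blank line and returns (lines, index):
    the header lines with CR-LF turned into a linefeed, and a dict that
    maps each lower-cased header name to the value of its first occurrence.
    """
    lines, index = [], {}
    name, first = None, False   # header being continued; is it a first occurrence?
    while True:
        line = fp.readline()
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"      # CR-LF is stored as a single linefeed
        if line in ("", "\n"):
            break                         # end of file, or the blank line after the headers
        lines.append(line)
        if line[0] in " \t":
            if first:
                index[name] += line       # continuation of the current header
        elif ":" in line:
            name, value = line.split(":", 1)
            name = name.lower()           # match names regardless of case
            first = name not in index
            if first:
                index[name] = value
        else:
            name, first = None, False     # not a header line; nothing to continue
    return lines, {key: value.strip() for key, value in index.items()}

fp = io.StringIO("From: jack@cwi.nl (Jack Jansen)\r\nSubject: test\n\nbody text\n")
lines, index = read_headers(fp)
rest = fp.read()              # the file is now positioned at the message body
print(lines)
print(index["FROM".lower()])  # any spelling of the name finds the same entry
print(repr(rest))
```

Lower-casing the name once, when the header is stored and again when it is looked up, is one simple way to obtain case-independent matching.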
| |
| \begin{funcdesc}{parsedate}{date} |
| Attempts to parse a date according to the rules in \rfc{822}. However, |
| some mailers don't follow that format as specified, so |
| \code{parsedate()} tries to guess correctly in such cases. |
| \var{date} is a string containing an \rfc{822} date, such as |
| \code{"Mon, 20 Nov 1995 19:12:08 -0500"}. If it succeeds in parsing |
| the date, \code{parsedate()} returns a 9-tuple that can be passed |
| directly to \code{time.mktime()}; otherwise \code{None} will be |
| returned. |
| \end{funcdesc} |
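
A short usage sketch follows. It assumes an interpreter in which this module is not available and therefore imports the same-named function from \code{email.utils} as a stand-in; with this module the import would be \code{from rfc822 import parsedate} instead.

```python
import time
from email.utils import parsedate  # assumption: stand-in for this module's parsedate()

fields = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")
print(fields[:6])                  # year, month, day, hour, minute, second

# The 9-tuple can be handed to time.mktime(), which reads it as local time.
if fields is not None:
    print(time.mktime(fields))

unparsed = parsedate("not a date") # a string that cannot be parsed gives None
print(unparsed)
```

Note that \code{parsedate()} ignores the timezone in the string, so the value from \code{time.mktime()} depends on the local timezone of the machine running the code.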
| |
| \begin{funcdesc}{parsedate_tz}{date} |
| Performs the same function as \code{parsedate()}, but returns either |
| \code{None} or a 10-tuple; the first 9 elements make up a tuple that |
| can be passed directly to \code{time.mktime()}, and the tenth is the |
| offset of the date's timezone from UTC (which is the official term |
| for Greenwich Mean Time). (Note that the sign of the timezone offset |
| is the opposite of the sign of the \code{time.timezone} variable for |
| the same timezone; the latter variable follows the \POSIX{} standard |
| while this module follows \rfc{822}.) If the input string has no |
| timezone, the last element of the tuple returned is \code{None}. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{mktime_tz}{tuple} |
| Turn a 10-tuple as returned by \code{parsedate_tz()} into a UTC timestamp. |
| If the timezone item in the tuple is \code{None}, assume local time. |
| Minor deficiency: this first interprets the first 8 elements as a |
| local time and then compensates for the timezone difference; |
| this may yield a slight error around daylight saving time |
| switch dates. The error is not enough to worry about for common use. |
| \end{funcdesc} |
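
A minimal sketch of the conversion, again assuming the same-named functions from \code{email.utils} as stand-ins; their handling of daylight saving time may differ from the behaviour described above.

```python
import time
from email.utils import parsedate_tz, mktime_tz  # assumption: stand-ins for this module

stamp = mktime_tz(parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500"))

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the next day.
utc_fields = time.gmtime(stamp)[:6]
print(utc_fields)   # (1995, 11, 21, 0, 12, 8)
```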
| |
| \subsection{Message Objects} |
| |
| A \code{Message} instance has the following methods: |
| |
| \begin{funcdesc}{rewindbody}{} |
| Seek to the start of the message body. This only works if the file |
| object is seekable. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{getallmatchingheaders}{name} |
| Return a list of lines consisting of all headers matching |
| \var{name}, if any. Each physical line, whether it is a continuation |
| line or not, is a separate list item. Return the empty list if no |
| header matches \var{name}. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{getfirstmatchingheader}{name} |
| Return a list of lines comprising the first header matching |
| \var{name}, and its continuation line(s), if any. Return \code{None} |
| if there is no header matching \var{name}. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{getrawheader}{name} |
| Return a single string consisting of the text after the colon in the |
| first header matching \var{name}. This includes leading whitespace, |
| the trailing linefeed, and internal linefeeds and whitespace if |
| any continuation line(s) were present. Return \code{None} if there is |
| no header matching \var{name}. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{getheader}{name} |
| Like \code{getrawheader(\var{name})}, but strip leading and trailing |
| whitespace (but not internal whitespace). |
| \end{funcdesc} |
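
The difference between the raw and the stripped value can be sketched as follows. The helper \code{raw_value()} is hypothetical and only mimics the description above; it takes the physical lines of one header, such as those returned by \code{getfirstmatchingheader()}.

```python
def raw_value(header_lines):
    """Hypothetical helper: text after the colon, continuation lines included."""
    first = header_lines[0]
    return first[first.index(":") + 1:] + "".join(header_lines[1:])

# One header spread over two physical lines.
header_lines = ["Subject: a long\n", "\tsubject line\n"]

raw = raw_value(header_lines)
stripped = raw.strip()
print(repr(raw))        # ' a long\n\tsubject line\n'
print(repr(stripped))   # 'a long\n\tsubject line'
```

Only the whitespace at the two ends is removed; the linefeed and tab that join the continuation line remain in the stripped value.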
| |
| \begin{funcdesc}{getaddr}{name} |
| Return a pair (full name, email address) parsed from the string |
| returned by \code{getheader(\var{name})}. If no header matching |
| \var{name} exists, return \code{(None, None)}; otherwise both the full |
| name and the address are (possibly empty) strings. |
| |
| Example: If \code{m}'s first \code{From} header contains the string\\ |
| \code{'jack@cwi.nl (Jack Jansen)'}, then |
| \code{m.getaddr('From')} will yield the pair |
| \code{('Jack Jansen', 'jack@cwi.nl')}. |
| If the header contained |
| \code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the |
| exact same result. |
| \end{funcdesc} |
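
The two forms can be tried on the header text itself. The sketch below assumes \code{email.utils.parseaddr()} as a stand-in; unlike \code{getaddr()}, it takes the header value rather than the header name, and it returns a pair of empty strings rather than \code{(None, None)} when parsing fails.

```python
from email.utils import parseaddr  # assumption: stand-in that parses a header value

comment_form = parseaddr("jack@cwi.nl (Jack Jansen)")
angle_form = parseaddr("Jack Jansen <jack@cwi.nl>")

print(comment_form)   # ('Jack Jansen', 'jack@cwi.nl')
print(angle_form)     # ('Jack Jansen', 'jack@cwi.nl')
```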
| |
| \begin{funcdesc}{getaddrlist}{name} |
| This is similar to \code{getaddr(\var{name})}, but parses a header |
| containing a list of email addresses (e.g. a \code{To} header) and |
| returns a list of (full name, email address) pairs (even if there was |
| only one address in the header). If there is no header matching |
| \var{name}, return an empty list. |
| |
| XXX The current version of this function is not really correct. It |
| yields bogus results if a full name contains a comma. |
| \end{funcdesc} |
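
A sketch of the list form, assuming \code{email.utils.getaddresses()} as a stand-in that works on header values rather than on a header name. The second address is made up for the example.

```python
from email.utils import getaddresses  # assumption: stand-in that parses header values

# Value of a To header with two recipients; the second one is hypothetical.
to_value = "jack@cwi.nl (Jack Jansen), Ann Example <ann@example.org>"

pairs = getaddresses([to_value])
for full_name, address in pairs:
    print(full_name, address)
```

Each recipient yields one (full name, email address) pair, in the order in which the addresses appear in the header.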
| |
| \begin{funcdesc}{getdate}{name} |
| Retrieve a header using \code{getheader} and parse it into a 9-tuple |
| compatible with \code{time.mktime()}. If there is no header matching |
| \var{name}, or it is unparsable, return \code{None}. |
| |
| Date parsing appears to be a black art, and not all mailers adhere to |
| the standard. While this function has been tested and found correct |
| on a large collection of email from many sources, it may still |
| occasionally yield an incorrect result. |
| \end{funcdesc} |
| |
| \begin{funcdesc}{getdate_tz}{name} |
| Retrieve a header using \code{getheader} and parse it into a 10-tuple; |
| the first 9 elements will make a tuple compatible with |
| \code{time.mktime()}, and the 10th is a number giving the offset of |
| the date's timezone from UTC. Similarly to \code{getdate()}, if |
| there is no header matching \var{name}, or it is unparsable, return |
| \code{None}. |
| \end{funcdesc} |
| |
| \code{Message} instances also support a read-only mapping interface. |
| In particular: \code{m[name]} is the same as \code{m.getheader(name)}; |
| and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()}, |
| \code{m.values()} and \code{m.items()} act as expected (and |
| consistently). |
| |
| Finally, \code{Message} instances have two public instance variables: |
| |
| \begin{datadesc}{headers} |
| A list containing the entire set of header lines, in the order in |
| which they were read. Each line contains a trailing newline. The |
| blank line terminating the headers is not contained in the list. |
| \end{datadesc} |
| |
| \begin{datadesc}{fp} |
| The file object passed at instantiation time. |
| \end{datadesc} |