\section{Standard Module \module{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}


This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\code{readline} method; in particular, ordinary file objects qualify.
Instantiation reads headers from the input object up to a delimiter
line (normally a blank line) and stores them in the instance.

If the input object has \code{seek} and \code{tell} methods, the
last action of the class initialization is to try to seek the object
to just before the blank line that terminates the headers.
Otherwise, if the input object has an \code{unread} method, that
method is used to push back the delimiter line.

The optional \code{seekable} argument is provided as a workaround for
certain stdio libraries in which \code{tell()} discards buffered data before
discovering that the \code{lseek()} system call doesn't work. For
maximum portability, you should set the \code{seekable} argument to zero to
prevent that initial \code{tell} when passing in an unseekable object
such as a file object created from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
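
The reading rules above can be sketched in a few lines. This is only
an illustration under stated assumptions, not the module's actual
implementation: the function name \code{read_headers} and the
dictionary it returns are hypothetical, and continuation lines are
ignored for brevity.

```python
import io

def read_headers(fp):
    """Read header lines from fp up to the first blank line.

    Hypothetical helper: fp only needs a readline() method.
    Returns a dict keyed by lower-cased header name.
    """
    headers = {}
    while True:
        line = fp.readline()
        if not line:
            break                      # end of input
        if line.endswith('\r\n'):
            line = line[:-2] + '\n'    # CR-LF becomes a single linefeed
        if line == '\n':
            break                      # blank delimiter line ends the headers
        name, sep, value = line.partition(':')
        if sep:
            # Lower-casing the name makes lookups case-insensitive;
            # the first occurrence of a header wins.
            headers.setdefault(name.strip().lower(), value.strip())
    return headers

fp = io.StringIO('From: jack@cwi.nl\r\nSubject: Hello\r\n\r\nBody text\n')
headers = read_headers(fp)
print(headers['from'])     # jack@cwi.nl
print(fp.readline())       # the body follows the delimiter line
```

Because the sketch stops right after the delimiter line, the next
\code{readline()} call on the same object returns the first line of
the message body.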

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
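
A short usage sketch follows. The \module{rfc822} module is not
available in Python 3, so the example assumes the function of the
same name in \module{email.utils} as a stand-in; it is expected, but
not guaranteed, to behave like \function{parsedate()} for this input.

```python
import time
from email.utils import parsedate  # assumption: stand-in for rfc822.parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(fields[:6])          # (1995, 11, 20, 19, 12, 8)

# The 9-tuple can go straight to time.mktime(), which interprets it as
# local time; the '-0500' offset is not part of the 9-tuple.
timestamp = time.mktime(fields)

print(parsedate('not a date'))   # None
```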

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
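
The sign convention can be illustrated as follows. As an assumption,
the example uses \function{parsedate_tz()} from \module{email.utils}
as a stand-in for the function described here. The stand-in may treat
a missing timezone differently, so only the case with an explicit
offset is shown.

```python
from email.utils import parsedate_tz  # assumption: stand-in for rfc822.parsedate_tz

fields = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = fields[9]
print(offset)              # -18000, i.e. five hours west of UTC

# time.timezone uses the opposite sign convention: it is positive west
# of UTC, so a zone at UTC-5 (ignoring DST) has time.timezone == 18000.
posix_style = -offset
print(posix_style)         # 18000
```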

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates. This is not enough to worry about for common use.
\end{funcdesc}
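
A hedged sketch of the conversion, again assuming the functions of
the same names in \module{email.utils} as stand-ins. The expected
value is computed with \function{calendar.timegm()} rather than
written out as a literal number.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz  # assumption: stand-ins for the rfc822 functions

fields = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
timestamp = mktime_tz(fields)

# 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8))
print(timestamp == expected)     # True
```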

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. By default this method just checks that the line is
blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}
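
The two hooks above might be used as in the following sketch. The
class \code{HeaderReader} is a hypothetical stand-in for the reading
loop of \class{Message}, written only to show where
\method{islast()} and \method{iscomment()} are consulted; the real
class does more work, and the order of the two checks here is an
assumption.

```python
import io

class HeaderReader:
    """Hypothetical stand-in for the header-reading loop of Message."""

    def __init__(self, fp):
        self.headers = []
        while True:
            line = fp.readline()
            if not line or self.islast(line):
                break               # end of input or delimiter line
            if self.iscomment(line):
                continue            # skipped entirely, not stored
            self.headers.append(line)

    def islast(self, line):
        return line in ('\n', '\r\n')   # default: stop at a blank line

    def iscomment(self, line):
        return False                    # default: nothing is a comment


class DashedReader(HeaderReader):
    """Stops at a line of two dashes and ignores lines starting with '#'."""

    def islast(self, line):
        return line.strip() == '--'

    def iscomment(self, line):
        return line.startswith('#')


text = 'From: jack@cwi.nl\n# internal note\nSubject: Hello\n--\nBody\n'
reader = DashedReader(io.StringIO(text))
print(reader.headers)   # ['From: jack@cwi.nl\n', 'Subject: Hello\n']
```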

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}
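
The difference between the raw and the stripped value can be shown
with plain strings. The helper \code{first_raw_value} is hypothetical
and only mimics the description of \method{getrawheader()} given
above; it is a sketch, not the module's code.

```python
def first_raw_value(lines, name):
    """Hypothetical helper: text after the colon of the first header
    matching name, continuation lines included, or None if absent."""
    prefix = name.lower() + ':'
    collected = None
    for line in lines:
        if collected is None:
            if line.lower().startswith(prefix):
                collected = [line[len(prefix):]]
        elif line[:1] in ' \t':
            collected.append(line)      # continuation line
        else:
            break                       # next header starts here
    return None if collected is None else ''.join(collected)


lines = ['Content-Type: multipart/mixed;\n',
         '\tboundary="xyz"\n',
         'Subject: Hello\n']
raw = first_raw_value(lines, 'content-type')
print(repr(raw))           # ' multipart/mixed;\n\tboundary="xyz"\n'
print(repr(raw.strip()))   # 'multipart/mixed;\n\tboundary="xyz"'
print(first_raw_value(lines, 'To'))   # None
```

Stripping removes only the leading space and the final linefeed; the
linefeed and tab inside the folded value are kept, which matches the
statement that internal whitespace is not stripped.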

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \code{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}
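
The two address forms from the example above can be tried directly on
strings. As an assumption, \function{parseaddr()} from
\module{email.utils} is used as a stand-in for the parsing that
\method{getaddr()} performs on the header value.

```python
from email.utils import parseaddr  # assumption: stand-in for the parsing done by getaddr()

# Both spellings of the same mailbox give the same (full name, address) pair.
comment_form = parseaddr('jack@cwi.nl (Jack Jansen)')
angle_form = parseaddr('Jack Jansen <jack@cwi.nl>')

print(comment_form)    # ('Jack Jansen', 'jack@cwi.nl')
print(angle_form)      # ('Jack Jansen', 'jack@cwi.nl')
```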

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct. It
yields bogus results if a full name contains a comma.
\end{methoddesc}
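
A sketch of the list form, assuming \function{getaddresses()} from
\module{email.utils} as a stand-in for the parsing done by
\method{getaddrlist()}. The second address is made up for the
example.

```python
from email.utils import getaddresses  # assumption: stand-in for the parsing done by getaddrlist()

# Value of a hypothetical To header with two recipients.
to_header = 'Jack Jansen <jack@cwi.nl>, someone@example.org (Some One)'
pairs = getaddresses([to_header])
print(pairs)
# [('Jack Jansen', 'jack@cwi.nl'), ('Some One', 'someone@example.org')]

# A single address still comes back as a one-element list.
single = getaddresses(['jack@cwi.nl'])
print(single)          # [('', 'jack@cwi.nl')]
```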

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read. Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}