\section{Standard Module \sectcode{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}


This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file.

Note that there is a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an open file object as
its parameter. The optional \var{seekable} parameter indicates whether
the file object is seekable; the default value is \code{1} for true.
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is case-insensitive;
e.g. \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
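
The reading behavior described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
the module's implementation: \function{read_header_lines()} and
\function{first_header()} are hypothetical helpers, continuation lines
are not joined, and the file object is assumed to be a text-mode file
or an \class{io.StringIO} instance.

```python
import io


def read_header_lines(fp):
    """Hypothetical helper: read header lines up to the first blank line.

    CR-LF endings are stored as a single linefeed, and fp is left
    positioned directly after the blank line that ends the headers.
    """
    headers = []
    while True:
        line = fp.readline()
        if not line:                  # EOF before any blank line
            break
        if line.endswith('\r\n'):
            line = line[:-2] + '\n'   # normalize CR-LF to LF
        if line == '\n':              # blank line terminates the headers
            break
        headers.append(line)
    return headers


def first_header(headers, name):
    """Hypothetical helper: case-insensitive lookup of the first match."""
    prefix = name.lower() + ':'
    for line in headers:
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return None


fp = io.StringIO('From: jack@cwi.nl\r\nSubject: test\r\n\r\nBody text\n')
headers = read_header_lines(fp)
print(headers)                        # ['From: jack@cwi.nl\n', 'Subject: test\n']
print(first_header(headers, 'FROM'))  # jack@cwi.nl
print(fp.read())                      # Body text
```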

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
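
The \module{rfc822} module no longer exists in Python 3. Assuming a
Python 3 interpreter, the closest standard-library equivalent is
\function{email.utils.parsedate()}, which the sketch below uses to show
the 9-tuple and the \code{None} result for unparsable input; its output
is not guaranteed to match the historical \module{rfc822} function in
every corner case.

```python
import time
from email.utils import parsedate

# Assumption: Python 3, where email.utils.parsedate() succeeds rfc822.parsedate().
t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(t[:6])                    # (1995, 11, 20, 19, 12, 8)

# The 9-tuple can go straight to time.mktime(); the timezone is dropped,
# so the fields are interpreted as local time.
local_ts = time.mktime(t)

print(parsedate('not a date'))  # None
```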

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (Coordinated Universal
Time, the successor to Greenwich Mean Time), in seconds. (Note that the
sign of the timezone offset is the opposite of the sign of the
\code{time.timezone} variable for the same timezone; the latter
variable follows the \POSIX{} standard while this module follows
\rfc{822}.) If the input string has no timezone, the last element of
the tuple returned is \code{None}.
\end{funcdesc}
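
A short Python 3 sketch of the sign convention, assuming
\function{email.utils.parsedate_tz()} as a stand-in for the function
documented here: a zone of \code{-0500} (five hours west of Greenwich)
yields an offset of \code{-18000} seconds, whereas \code{time.timezone}
on a host set to that zone would be \code{18000}.

```python
from email.utils import parsedate_tz

# Assumption: Python 3's email.utils.parsedate_tz() as a stand-in for rfc822's.
t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
fields, offset = t[:9], t[9]
print(fields[:6])    # (1995, 11, 20, 19, 12, 8)
print(offset)        # -18000: seconds east of UTC, so west is negative

# time.timezone counts seconds *west* of UTC, so a host in this zone
# would report the opposite sign.
posix_style = -offset
print(posix_style)   # 18000
```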
54
Guido van Rossum8cf94e61998-02-18 05:09:14 +000055\begin{funcdesc}{mktime_tz}{tuple}
Fred Drakecdea8a31998-03-14 06:17:43 +000056Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
57timestamp. It the timezone item in the tuple is \code{None}, assume
58local time. Minor deficiency: this first interprets the first 8
59elements as a local time and then compensates for the timezone
60difference; this may yield a slight error around daylight savings time
Guido van Rossum8cf94e61998-02-18 05:09:14 +000061switch dates. Not enough to worry about for common use.
62\end{funcdesc}
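
The deficiency mentioned above comes from round-tripping through local
time. The Python sketch below shows one way to avoid it, assuming the
standard \function{calendar.timegm()} function: the first six fields
are read directly as UTC and the zone offset is then subtracted.
\function{utc_timestamp()} is a hypothetical name, not part of
\module{rfc822}.

```python
import calendar
import time


def utc_timestamp(tup):
    """Hypothetical mktime_tz() variant: 10-tuple -> UTC timestamp."""
    offset = tup[9]
    if offset is None:
        # No zone given: fall back to local time, as documented above.
        return time.mktime(tup[:8] + (-1,))
    # timegm() interprets the fields as UTC, so no DST guesswork is involved.
    return calendar.timegm(tup[:6]) - offset


# 19:12:08 at UTC-05:00 is 00:12:08 UTC on the following day.
t = (1995, 11, 20, 19, 12, 8, 0, 1, -1, -18000)
print(utc_timestamp(t))   # 816912728
```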

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}
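
In Python terms, rewinding the body amounts to remembering the file
offset just after the header block and seeking back to it. A minimal
sketch, assuming a seekable text file object such as
\class{io.StringIO}; \function{read_headers()} and
\function{rewind_body()} are hypothetical helpers.

```python
import io


def read_headers(fp):
    """Hypothetical: consume header lines and return (lines, body_offset)."""
    lines = []
    while True:
        line = fp.readline()
        if line in ('', '\n', '\r\n'):   # EOF or the blank separator line
            break
        lines.append(line)
    return lines, fp.tell()


def rewind_body(fp, body_offset):
    """Seek back to the start of the body; needs a seekable file."""
    if not fp.seekable():
        raise io.UnsupportedOperation('file object is not seekable')
    fp.seek(body_offset)


fp = io.StringIO('Subject: hi\n\nfirst line\nsecond line\n')
lines, body_offset = read_headers(fp)
first_pass = fp.read()
rewind_body(fp, body_offset)
print(fp.read() == first_pass)   # True: the body can be read again
```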

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}
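
The matching rule, including how continuation lines travel with their
header, can be sketched over a list of raw header lines (such as the
\member{headers} attribute described below). This Python sketch is an
illustration only; \function{all_matching_headers()} is a hypothetical
name, and a continuation line is assumed to be any line starting with a
space or tab.

```python
def all_matching_headers(headers, name):
    """Hypothetical: every physical line of every header called `name`."""
    prefix = name.lower() + ':'
    result = []
    in_match = False
    for line in headers:
        if line[:1] in (' ', '\t'):
            # Continuation line: belongs to whichever header came before it.
            if in_match:
                result.append(line)
            continue
        in_match = line.lower().startswith(prefix)
        if in_match:
            result.append(line)
    return result


headers = [
    'Received: from a.example\n',
    '\tby b.example\n',
    'From: jack@cwi.nl\n',
    'Received: from c.example\n',
]
print(all_matching_headers(headers, 'received'))
# ['Received: from a.example\n', '\tby b.example\n', 'Received: from c.example\n']
print(all_matching_headers(headers, 'To'))   # []
```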

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if any
continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped.
\end{methoddesc}
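
The difference between the raw and the stripped value is easiest to see
on a folded header. The following Python sketch assumes the header
lines are already in a list; \function{raw_header()} is a hypothetical
helper that returns the text after the colon, continuation lines
included, and \method{str.strip()} then plays the role of
\method{getheader()}.

```python
def raw_header(headers, name):
    """Hypothetical: text after the colon of the first `name` header."""
    prefix = name.lower() + ':'
    parts = None
    for line in headers:
        if parts is None:
            if line.lower().startswith(prefix):
                parts = [line[len(prefix):]]
        elif line[:1] in (' ', '\t'):
            parts.append(line)      # keep continuation lines verbatim
        else:
            break                   # the next header ends the match
    return None if parts is None else ''.join(parts)


headers = ['Subject:  a long\n', '  subject line\n', 'To: guido@python.org\n']
raw = raw_header(headers, 'subject')
print(repr(raw))            # '  a long\n  subject line\n'
print(repr(raw.strip()))    # 'a long\n  subject line'
```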

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}
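
Assuming Python 3, where \module{rfc822} is gone,
\function{email.utils.parseaddr()} performs the same kind of split on a
single address; the sketch reproduces the example above with it. Unlike
\method{getaddr()}, it takes the header text rather than a header name,
and it returns \code{('', '')} rather than \code{(None, None)} when
nothing can be parsed.

```python
from email.utils import parseaddr

# Assumption: Python 3; parseaddr() works on header text, not a header name.
for value in ('jack@cwi.nl (Jack Jansen)', 'Jack Jansen <jack@cwi.nl>'):
    print(parseaddr(value))   # ('Jack Jansen', 'jack@cwi.nl') both times
```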

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not entirely correct: it
yields bogus results if a full name contains a comma.
\end{methoddesc}
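
For comparison, assuming Python 3, \function{email.utils.getaddresses()}
parses a list of header values and does handle a quoted full name that
contains a comma. The sketch passes it a single \code{To}-style value;
like \function{parseaddr()}, it works on header text, not on a
\class{Message} instance.

```python
from email.utils import getaddresses

# Assumption: Python 3; getaddresses() takes a list of header values.
to_value = '"Jansen, Jack" <jack@cwi.nl>, guido@python.org'
print(getaddresses([to_value]))
# [('Jansen, Jack', 'jack@cwi.nl'), ('', 'guido@python.org')]
```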

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While it has been tested and found correct on a large
collection of email from many sources, this function may still
occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC in seconds. Similarly to
\method{getdate()}, if there is no header matching \var{name}, or it is
unparsable, return \code{None}.
\end{methoddesc}
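
Assuming Python 3, the same two-step process (fetch the header, then
parse its value) can be written with \function{email.message_from_string()}
and the \module{email.utils} date functions. \function{get_date_tz()} is
a hypothetical helper that mirrors \method{getdate_tz()}, returning
\code{None} when the header is missing or unparsable.

```python
import email
from email.utils import parsedate_tz


def get_date_tz(msg, name):
    """Hypothetical analogue of getdate_tz(): header name -> 10-tuple or None."""
    value = msg[name]            # None if there is no such header
    if value is None:
        return None
    return parsedate_tz(value)   # None if the value cannot be parsed


msg = email.message_from_string(
    'Date: Mon, 20 Nov 1995 19:12:08 -0500\n'
    'Subject: test\n'
    '\n'
    'body\n')
print(get_date_tz(msg, 'date'))
print(get_date_tz(msg, 'Resent-Date'))   # None
```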

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[\var{name}]} is like
\code{\var{m}.getheader(\var{name})} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(\var{name})}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).
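
The mapping behavior described above can be sketched in Python 3 with
\class{collections.abc.Mapping}. \class{HeaderMap} is a hypothetical
class, not \class{Message} itself; Python 3 has no \method{has_key()},
so the \code{in} operator (which \class{Mapping} derives from
\method{__getitem__()}) takes its place. Continuation lines are not
handled, and only the first of several same-named headers is kept.

```python
from collections.abc import Mapping


class HeaderMap(Mapping):
    """Hypothetical read-only, case-insensitive view of 'Name: value' lines."""

    def __init__(self, header_lines):
        self._items = {}              # lower-cased name -> stripped value
        for line in header_lines:
            name, sep, value = line.partition(':')
            if sep and name.lower() not in self._items:
                self._items[name.lower()] = value.strip()   # first one wins

    def __getitem__(self, name):
        return self._items[name.lower()]   # raises KeyError if absent

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


m = HeaderMap(['From: jack@cwi.nl\n', 'Subject: test\n'])
print(m['FROM'])                  # jack@cwi.nl
print('subject' in m)             # True
print(len(m), sorted(m.keys()))   # 2 ['from', 'subject']
```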

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read. Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}