\section{Standard Module \module{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}


This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\code{readline} method; in particular, ordinary file objects qualify.
Instantiation reads headers from the input object up to a delimiter
line (normally a blank line) and stores them in the instance.

If the input object also has seek and tell capability, the
\code{rewindbody} method will work; also, illegal lines will be pushed back
onto the input stream. If the input object lacks seek but has an
\code{unread} method that can push back a line of input, \class{Message}
will use that to push back illegal lines. Thus this class can be used to
parse messages coming from a buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \code{tell()} discards buffered data
before discovering that the \code{lseek()} system call doesn't work. For
maximum portability, you should set the \var{seekable} argument to zero to
prevent that initial \code{tell()} when passing in an unseekable object
such as a file object created from a socket object.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
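The reading rules described above can be illustrated with a small
stand-alone sketch. It is not the module's implementation: the function
name \code{read_headers} and the plain dictionary it returns are
hypothetical, chosen only for illustration. The sketch uses nothing but
\code{readline()}, stops at the first blank line, replaces a terminating
CR-LF by a single linefeed, and stores field names in lower case so that a
lookup can be made independent of case.

```python
import io


def read_headers(fp):
    """Read header lines from fp up to the first blank line.

    Hypothetical helper for illustration only; it calls nothing but
    fp.readline() on the input object.
    """
    headers = {}
    last_key = None
    while True:
        line = fp.readline()
        if not line:
            break  # end of input
        # A terminating CR-LF is replaced by a single linefeed.
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"
        if line == "\n":
            break  # blank delimiter line: the headers are complete
        if line[0] in " \t" and last_key is not None:
            # Continuation line: append it to the previous header.
            headers[last_key] += line
            continue
        name, sep, value = line.partition(":")
        if not sep:
            break  # not a legal header line; stop parsing
        last_key = name.strip().lower()  # case-independent key
        headers[last_key] = value
    return headers


message = io.StringIO(
    "From: jack@cwi.nl (Jack Jansen)\r\n"
    "Subject: a long\r\n"
    "  subject line\r\n"
    "\r\n"
    "Body text.\n"
)
headers = read_headers(message)
print(headers["from"].strip())
print(repr(headers["subject"]))
```

After the call, the input object is positioned just past the blank
delimiter line, so the next read returns the start of the message body.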

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
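A brief sketch of a call, using the example date above. The import
fallback is an assumption: on interpreters that do not provide
\module{rfc822}, the sketch uses the function of the same name from
\module{email.utils} instead.

```python
import time

try:
    from rfc822 import parsedate
except ImportError:
    # Assumption: fall back to the same-named function in email.utils
    # on interpreters that do not provide rfc822.
    from email.utils import parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(fields[:6])   # (1995, 11, 20, 19, 12, 8)

# The 9-tuple can be handed straight to time.mktime(), which treats it
# as local time; the -0500 offset is not part of the 9-tuple.
stamp = time.mktime(fields)

# Input that cannot be parsed yields None instead of an exception.
unparsed = parsedate('not a date')
print(unparsed)
```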

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
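The sign convention can be checked with a short sketch. As an
assumption, it falls back to the same-named function in
\module{email.utils} where \module{rfc822} is not available; the unit of
the offset (seconds) is taken from the behaviour of that function.

```python
try:
    from rfc822 import parsedate_tz
except ImportError:
    # Assumption: same-named replacement on interpreters without rfc822.
    from email.utils import parsedate_tz

result = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = result[9]   # tenth element: offset from UTC, in seconds
print(offset)        # -18000, i.e. five hours behind UTC
```

For a zone five hours behind UTC, \code{time.timezone} would be
\code{18000} (seconds west of UTC), which is the opposite sign.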

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this function first interprets the first
8 elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates. The error is not enough to worry about for common use.
\end{funcdesc}
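A minimal sketch of the round trip from a date string to a UTC
timestamp. As an assumption, it falls back to the same-named functions in
\module{email.utils} where \module{rfc822} is not available. The date lies
well away from any daylight saving switch, so the deficiency noted above
does not come into play.

```python
import calendar

try:
    from rfc822 import parsedate_tz, mktime_tz
except ImportError:
    # Assumption: same-named replacements on interpreters without rfc822.
    from email.utils import parsedate_tz, mktime_tz

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at -0500 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)
```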

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the line
be pushed back on the input stream). It is sometimes useful to override
this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. The delimiter line is consumed, and the file object's read
location is positioned immediately after it. By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be skipped and otherwise ignored.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \code{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield
exactly the same result.
\end{methoddesc}
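The two address styles in the example can be tried without a message
object. The sketch below is a stand-in, not a call to
\method{getaddr()}: it assumes the \function{parseaddr()} function from
the \module{email.utils} module, which splits a single address string
into the same kind of pair.

```python
from email.utils import parseaddr

# Assumption: parseaddr() stands in for getaddr() on a header value.
comment_style = parseaddr('jack@cwi.nl (Jack Jansen)')
angle_style = parseaddr('Jack Jansen <jack@cwi.nl>')

print(comment_style)
print(angle_style)
```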

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct. It
yields bogus results if a full name contains a comma.
\end{methoddesc}
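The shape of the result can be sketched with a stand-in. This is not a
call to \method{getaddrlist()}: it assumes the \function{getaddresses()}
function from the \module{email.utils} module, and the second address in
the sample header is made up for illustration.

```python
from email.utils import getaddresses

# Hypothetical To header value; guido@example.org is a made-up address.
to_header = 'Jack Jansen <jack@cwi.nl>, guido@example.org (Guido van Rossum)'

# getaddresses() takes a list of header values and returns a list of
# (full name, email address) pairs.
pairs = getaddresses([to_header])
for full_name, address in pairs:
    print(full_name, address)

# A header holding a single address still produces a list.
single = getaddresses(['jack@cwi.nl'])
```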

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While this function has been tested and found correct on a
large collection of email from many sources, it is still possible that it
may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read. Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}