\section{Standard Module \module{rfc822}}
\declaremodule{standard}{rfc822}

\modulesynopsis{Parse \rfc{822} style mail headers.}

%\index{RFC!RFC 822}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file. This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects qualify.
Instantiation reads headers from the input object up to a delimiter
line (normally a blank line) and stores them in the instance.

This class can work with any input object that supports a
\method{readline()} method. If the input object has \method{seek()}
and \method{tell()} capability, the \method{rewindbody()} method will
work; also, illegal lines will be pushed back onto the input stream.
If the input object lacks \method{seek()} but has an \method{unread()}
method that can push back a line of input, \class{Message} will use
that to push back illegal lines. Thus this class can be used to parse
messages coming from a buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \code{tell()} discards buffered data
before discovering that the \code{lseek()} system call doesn't work.
For maximum portability, you should set the \var{seekable} argument
to zero to prevent that initial \method{tell()} when passing in an
unseekable object such as a file object created from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g. \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
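
The reading loop described above can be sketched in a few lines of
Python 3. This is a simplified, hypothetical illustration, not the
module's own code: the function name \code{read_headers()} is invented,
it uses only \method{readline()} plus, when \var{seekable} is true,
\method{tell()} and \method{seek()} to push back an illegal line, and
it ignores continuation lines to stay short.

```python
import io


def read_headers(fp, seekable=True):
    """Hypothetical sketch: read header lines up to a blank delimiter line.

    Returns a dict mapping lower-cased field names to stripped values.
    Continuation lines are not supported in this simplified version.
    """
    headers = {}
    while True:
        # Remember where the line starts so it can be pushed back later.
        pos = fp.tell() if seekable else None
        line = fp.readline()
        if not line:                      # end of input
            break
        if line.endswith('\r\n'):         # CR-LF is stored as a single linefeed
            line = line[:-2] + '\n'
        if not line.strip():              # blank delimiter line is consumed
            break
        name, sep, value = line.partition(':')
        if not sep or not name or ' ' in name:
            # Not a legal header line: push it back if possible, then stop.
            if seekable:
                fp.seek(pos)
            break
        headers[name.lower()] = value.strip()   # keys ignore case
    return headers


msg = io.StringIO('From: jack@cwi.nl\r\nSubject: Hello\n\nBody text\n')
hdrs = read_headers(msg)
body = msg.read()              # reading resumes right after the delimiter
print(hdrs['from'])            # jack@cwi.nl
```

Lower-casing the field name once, when the header is stored, is what
makes every later lookup case-insensitive in this sketch.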

\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed. (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
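
The \module{rfc822} module was removed in Python 3.0, but its
date-parsing code lives on in the \module{email.utils} module, whose
\function{parsedate()} is derived from this one. The sketch below uses
that function as a stand-in, so minor details may differ from the
behaviour documented here.

```python
import time
from email.utils import parsedate  # Python 3 descendant of rfc822.parsedate

t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(t)            # (1995, 11, 20, 19, 12, 8, 0, 1, -1)

# The 9-tuple can be handed straight to time.mktime(), which reads it as
# local time; the -0500 offset is not part of the 9-tuple.
seconds = time.mktime(t)

print(parsedate('not a date'))   # None
```

The last three elements (weekday, day of year, DST flag) are fixed
placeholders rather than computed values.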

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC, in seconds (UTC is the
official term for Greenwich Mean Time). (Note that the sign of the
timezone offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
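
The sign convention is easiest to see in a worked example. This sketch
again assumes the \module{email.utils} descendant of
\function{parsedate_tz()} as a stand-in for \module{rfc822}, and converts
the wall-clock time to UTC by hand with \function{calendar.timegm()}.

```python
import calendar
import time
from email.utils import parsedate_tz  # Python 3 descendant of rfc822.parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = t[9]
print(offset)       # -18000: seconds east of UTC, so -0500 is negative

# time.timezone counts seconds *west* of UTC, so for US Eastern standard
# time it would be +18000, the opposite sign of offset.

# UTC = wall-clock time - offset, i.e. 19:12:08 plus five hours.
utc = calendar.timegm(t[:9]) - offset
print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(utc)))  # 1995-11-21 00:12:08
```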

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates. The error is small enough to ignore for common use.
\end{funcdesc}
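
In practice \function{mktime_tz()} is combined with
\function{parsedate_tz()}. The sketch below uses the functions of the
same names in Python 3's \module{email.utils} (a stand-in, since
\module{rfc822} is no longer shipped) and compares the result with the
same instant written directly in UTC.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz

# Parse a date with an explicit -0500 offset and turn it into a UTC timestamp.
stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# The same instant written directly in UTC, converted with no local-time step.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)   # True
```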

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the line be
pushed back on the input stream). It is sometimes useful to override
this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. The delimiter line is consumed, and the file object's read
location positioned immediately after it. By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}
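
The three hooks above are meant to be overridden. Since \module{rfc822}
is not available in Python 3, the following is a hypothetical,
stripped-down stand-in: the class names \class{HeaderReader} and
\class{HashCommentReader} are invented, and only the order in which the
hooks are consulted is modelled (there is no pushback). The subclass
treats lines starting with a hash mark as comments and uses a line of
three dashes as the delimiter.

```python
import io


class HeaderReader:
    """Hypothetical stand-in for Message that models only the hook calls."""

    def __init__(self, fp):
        self.headers = []                   # raw header lines, in input order
        while True:
            line = fp.readline()
            if not line or self.islast(line):
                break                       # delimiter (if any) is consumed
            if self.iscomment(line):
                continue                    # skipped entirely
            if self.isheader(line) is None:
                break                       # not a header: stop (no pushback here)
            self.headers.append(line)

    def isheader(self, line):
        # Return the lower-cased field name, or None if line is not a header.
        name, sep, _ = line.partition(':')
        name = name.strip()
        return name.lower() if sep and name else None

    def islast(self, line):
        return line.strip() == ''           # default delimiter: a blank line

    def iscomment(self, line):
        return False                        # default: no line is a comment


class HashCommentReader(HeaderReader):
    """Example subclass: '#' lines are comments and '---' ends the headers."""

    def iscomment(self, line):
        return line.startswith('#')

    def islast(self, line):
        return line.rstrip('\r\n') == '---'


fp = io.StringIO('# generated by a script\nFrom: a@example.org\n---\nBody\n')
reader = HashCommentReader(fp)
print(reader.headers)   # ['From: a@example.org\n']
```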

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if any
continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a value other than
\code{None} to be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}
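
These retrieval methods differ only in how much of a matching header
they return. As an illustration, here are hypothetical free functions
with the same names, written over a list of raw header lines like the
\member{headers} variable; they sketch the documented behaviour and are
not the module's implementation.

```python
def _matching_blocks(lines, name):
    """Yield each header matching name (case-insensitively) as a list of lines."""
    prefix = name.lower() + ':'
    block = None
    for line in lines:
        if line[:1] in (' ', '\t'):         # continuation of the previous header
            if block is not None:
                block.append(line)
            continue
        if block is not None:               # a new header ends the current block
            yield block
            block = None
        if line.lower().startswith(prefix):
            block = [line]
    if block is not None:
        yield block


def getallmatchingheaders(lines, name):
    # Every physical line of every matching header; [] if none match.
    return [line for block in _matching_blocks(lines, name) for line in block]


def getfirstmatchingheader(lines, name):
    # Lines of the first matching header, or None if none match.
    return next(_matching_blocks(lines, name), None)


def getrawheader(lines, name):
    # Text after the colon, keeping whitespace and linefeeds.
    block = getfirstmatchingheader(lines, name)
    if block is None:
        return None
    return ''.join(block)[len(name) + 1:]


def getheader(lines, name, default=None):
    # Like getrawheader(), but with leading/trailing whitespace stripped.
    raw = getrawheader(lines, name)
    return default if raw is None else raw.strip()


lines = ['Received: from a\n', '\tby b\n', 'Subject:  Hello\n', 'received: from c\n']
print(repr(getrawheader(lines, 'Received')))   # ' from a\n\tby b\n'
print(getheader(lines, 'subject'))             # Hello
```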

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{\var{m}.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}
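
Python 3's \module{email.utils} module offers \function{parseaddr()},
which is derived from this module's address parser and performs the
same split on a single header value. It is used below as a stand-in;
unlike \method{getaddr()}, it takes the header text rather than a
header name.

```python
from email.utils import parseaddr  # Python 3 descendant of the rfc822 parser

# Both spellings from the example above yield the same (full name, address) pair.
print(parseaddr('jack@cwi.nl (Jack Jansen)'))    # ('Jack Jansen', 'jack@cwi.nl')
print(parseaddr('Jack Jansen <jack@cwi.nl>'))    # ('Jack Jansen', 'jack@cwi.nl')
```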

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct. It
yields bogus results if a full name contains a comma.
\end{methoddesc}
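
For headers holding several addresses, Python 3's \module{email.utils}
has \function{getaddresses()}, which takes a list of header values (for
example, the text of every \code{To} header) and returns one pair per
address. It descends from this module's parser but is not
\method{getaddrlist()} itself, so treat the sketch as an approximation.

```python
from email.utils import getaddresses

# One list entry per header occurrence; each entry may hold several addresses.
to_headers = ['Jack Jansen <jack@cwi.nl>', 'a@example.org, b@example.org']
pairs = getaddresses(to_headers)
print(pairs)
# [('Jack Jansen', 'jack@cwi.nl'), ('', 'a@example.org'), ('', 'b@example.org')]
```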

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While it has been tested and found correct on a large
collection of email from many sources, this function may still
occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. As with \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).
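
The read-only, case-insensitive mapping behaviour can be modelled in
Python 3 with \class{collections.abc.Mapping}. The class below,
\class{HeaderMap}, is a hypothetical sketch rather than \class{Message}
itself: it folds names to lower case so lookups ignore case, and the
\code{in} operator takes the place of the Python 2 \method{has_key()}
method.

```python
from collections.abc import Mapping


class HeaderMap(Mapping):
    """Hypothetical read-only, case-insensitive view of parsed headers."""

    def __init__(self, pairs):
        # Store each value under its lower-cased field name.
        self._values = {name.lower(): value for name, value in pairs}

    def __getitem__(self, name):
        # Raises KeyError when there is no matching header.
        return self._values[name.lower()]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


m = HeaderMap([('From', 'jack@cwi.nl'), ('Subject', 'Hello')])
print(m['FROM'], len(m), 'subject' in m)   # jack@cwi.nl 2 True
```

Only \method{__getitem__()}, \method{__iter__()} and \method{__len__()}
are written by hand; \class{Mapping} derives \method{keys()},
\method{values()}, \method{items()}, \method{get()} and \code{in} from
them, which keeps those operations consistent with one another.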

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that \method{__setitem__()} calls may
disturb this order). Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}

\subsection{AddressList Objects}
\label{addresslist-objects}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return a new \class{AddressList} instance that contains all addresses
in both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return a new \class{AddressList} instance that contains every address
in the left-hand \class{AddressList} operand that is not present in the
right-hand \class{AddressList} operand (set difference).
\end{methoddesc}
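
The set operations can be sketched over plain lists of
\code{(name, address)} pairs, the same shape as the \member{addresslist}
variable. The helper names \function{union()}, \function{difference()}
and \function{render()} are hypothetical. Comparing whole pairs (so one
address under two different names counts twice) and rendering an
address with an empty name part as a bare address are assumptions of
this sketch.

```python
def union(left, right):
    """Pairs from left, then those from right not already present (set union)."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Pairs from left that do not appear in right (set difference)."""
    return [pair for pair in left if pair not in right]


def render(pairs):
    """Comma-separated '"name" <address>' form; a bare address if name is empty."""
    return ', '.join('"%s" <%s>' % (name, addr) if name else addr
                     for name, addr in pairs)


a = [('Jack Jansen', 'jack@cwi.nl'), ('', 'a@example.org')]
b = [('', 'a@example.org'), ('', 'b@example.org')]
print(render(union(a, b)))
# "Jack Jansen" <jack@cwi.nl>, a@example.org, b@example.org
print(render(difference(a, b)))   # "Jack Jansen" <jack@cwi.nl>
```

Lists rather than Python sets keep the original address order, at the
cost of quadratic membership tests, which is harmless for the handful of
addresses a header usually carries.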

Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of pairs of strings, one per address. In each pair, the first
element is the canonicalized name part of the address and the second
is the route-address (an \code{@}-separated username-host.domain pair).
\end{memberdesc}