\section{\module{rfc822} ---
         Parse RFC 822 mail headers.}
\declaremodule{standard}{rfc822}

\modulesynopsis{Parse \rfc{822} style mail headers.}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file. This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify. Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.

This class can work with any input object that supports a
\method{readline()} method. If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream. If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines. Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work. For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial \method{tell()}
when passing in an unseekable object such as a file object created
from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}

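The reading behaviour described for \class{Message} can be pictured
with a small stand-alone sketch. It is not the module's implementation:
\function{read_headers()} is a hypothetical helper written only for this
illustration, it relies on nothing but \method{readline()}, and it
leaves out the push-back of illegal lines.

```python
import io


def read_headers(fp):
    """Hypothetical helper: collect header lines from fp up to a blank line."""
    lines = []       # physical header lines, each ending in a linefeed
    fields = {}      # lowercased field name -> stripped field value
    current = None   # field that a continuation line belongs to
    while True:
        line = fp.readline()
        if line.endswith('\r\n'):
            line = line[:-2] + '\n'       # CR-LF becomes a single linefeed
        if line in ('', '\n'):            # end of input or blank delimiter line
            break
        if line[0] in ' \t' and current is not None:
            # Continuation line: fold it into the previous field's value.
            fields[current] += ' ' + line.strip()
        else:
            name, colon, value = line.partition(':')
            if not colon:
                break                     # not a header; a real parser pushes it back
            current = name.strip().lower()
            fields[current] = value.strip()
        lines.append(line)
    return lines, fields


fp = io.StringIO('From: jack@cwi.nl (Jack Jansen)\r\n'
                 'Subject: a long\r\n'
                 ' subject line\r\n'
                 '\r\n'
                 'body text\n')
lines, fields = read_headers(fp)
rest = fp.read()   # the body is still unread after the headers are consumed
```

Storing the field names in lower case is what makes a lookup such as
\code{fields['From'.lower()]} independent of the spelling used in the
message.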
\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed. (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}

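As a brief illustration of the two possible return values of
\function{parsedate()}, the following sketch parses the sample date
string and an unparsable one. It imports the function from
\module{rfc822} and, as an assumption of this sketch, falls back to the
function of the same name in \module{email.utils} on interpreters that
do not provide \module{rfc822}.

```python
try:
    from rfc822 import parsedate
except ImportError:
    # Assumption: where rfc822 is unavailable, email.utils offers a
    # same-named function with the behaviour described here.
    from email.utils import parsedate

fields = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
# Year, month, day, hour, minute and second come first; the remaining
# items pad the tuple out to the nine entries time.mktime() expects.
year, month, day, hour, minute, second = fields[:6]

unparsable = parsedate('not a date')   # None signals that parsing failed
```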
\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates. The error is not enough to worry about in common use.
\end{funcdesc}

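A short sketch of chaining \function{parsedate_tz()} and
\function{mktime_tz()} follows. The comparison value is built with
\function{calendar.timegm()}, which is not part of this module and is
used here only to check the arithmetic; the fallback import from
\module{email.utils} is again an assumption of the sketch.

```python
import calendar

try:
    from rfc822 import parsedate_tz, mktime_tz
except ImportError:
    from email.utils import parsedate_tz, mktime_tz  # assumed equivalent fallback

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at five hours west of UTC is 00:12:08 UTC on the following
# day; calendar.timegm() converts that UTC time tuple to a timestamp.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```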
\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream). It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. The delimiter line is consumed, and the file object's read
location positioned immediately after it. By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}

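The two address forms from the \method{getaddr()} example can be tried
without building a \class{Message}. The sketch below uses a module-level
function \function{parseaddr()}, which is not described in this section;
it is assumed here to split a raw header value in the same way, and it
is imported from \module{email.utils} where \module{rfc822} is not
available.

```python
try:
    from rfc822 import parseaddr
except ImportError:
    from email.utils import parseaddr  # assumed equivalent fallback

# The full name given as a trailing comment ...
comment_style = parseaddr('jack@cwi.nl (Jack Jansen)')
# ... and as a phrase in front of an angle-bracketed address.
angle_style = parseaddr('Jack Jansen <jack@cwi.nl>')
```

Both calls are expected to produce the same (full name, email address)
pair, mirroring the statement that the two header forms yield the exact
same result.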
\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct. It
yields bogus results if a full name contains a comma.
\end{methoddesc}

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While this method has been tested and found correct on a
large collection of email from many sources, it is still possible that
it may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order). Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}

\subsection{AddressList Objects}
\label{addresslist-objects}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return an \class{AddressList} instance that contains all addresses in
both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return an \class{AddressList} instance that contains every address in
the left-hand \class{AddressList} operand that is not present in the
right-hand operand (set difference).
\end{methoddesc}

Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of pairs of strings, one pair per address. In each pair, the
first item is the canonicalized name part of the address and the second
is the route-address (\code{@}-separated host-domain pair).
\end{memberdesc}
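
The set-like operations and the \member{addresslist} variable can be
exercised as in the following sketch. The addresses are sample data.
Where \module{rfc822} is unavailable, the sketch assumes that the class
of the same name in the private module \module{email._parseaddr}
behaves as described in this section.

```python
try:
    from rfc822 import AddressList
except ImportError:
    # Assumption: a same-named class in the private module
    # email._parseaddr behaves as described here.
    from email._parseaddr import AddressList

a = AddressList('jack@cwi.nl (Jack Jansen), guido@cwi.nl')
b = AddressList('guido@cwi.nl, fred@example.org')

union = a + b     # addresses from both lists, duplicates removed
only_a = a - b    # addresses in a that do not occur in b

pairs = a.addresslist   # (name, address) pairs, one per address
```

An address given without a name part shows up with an empty string as
the first item of its pair.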
253\end{memberdesc}