\section{\module{rfc822} ---
         Parse RFC 822 mail headers}

\declaremodule{standard}{rfc822}
\modulesynopsis{Parse \rfc{822} style mail headers.}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file. This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses. Please refer to
the RFC for information on the specific syntax of \rfc{822} headers.

The \refmodule{mailbox}\refstmodindex{mailbox} module provides classes
to read mailboxes produced by various end-user mail programs.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify. Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance. The message body, following the headers, is not consumed.

This class can work with any input object that supports a
\method{readline()} method. If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream. If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines. Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work. For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial \method{tell()}
when passing in an unseekable object such as a file object created
from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independently of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
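The reading behaviour described above can be illustrated with a
minimal, self-contained sketch. It does not use \module{rfc822} or
reproduce its implementation; \function{read_headers()} is a
hypothetical helper written only for this illustration. It assumes an
input object with a \method{readline()} method, normalizes CR-LF to a
single linefeed, stops at the blank delimiter line without consuming
the body, and lowercases field names so that lookups are independent
of case.

```python
import io


def read_headers(fp):
    """Hypothetical helper: read header lines from fp up to a blank line.

    This is an illustration only, not the rfc822 implementation. A later
    header with the same name simply overwrites an earlier one here.
    """
    headers = {}
    last_key = None
    while True:
        line = fp.readline()
        if not line:
            break  # end of input before any delimiter line
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"  # CR-LF becomes a single linefeed
        if line.strip() == "":
            break  # blank delimiter line: consumed, the body stays unread
        if line[0] in " \t" and last_key is not None:
            # Continuation line: append to the previous header's value.
            headers[last_key] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            break  # not a header line; this sketch just stops here
        last_key = name.strip().lower()  # case-independent key
        headers[last_key] = value.strip()
    return headers


fp = io.StringIO(
    "From: jack@cwi.nl (Jack Jansen)\r\n"
    "Subject: test\r\n"
    "\r\n"
    "body text\n"
)
headers = read_headers(fp)
body = fp.read()  # the body is still available on the input object
```

Unlike the class described above, this sketch does not push an illegal
line back onto the input stream; it only stops reading.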

\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed. (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned. Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}. Note that fields 6, 7, and 8 of the result tuple are not
usable.
\end{funcdesc}

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates. Not enough to worry about for common use.
\end{funcdesc}
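As a hedged illustration of the three date functions, the sketch below
uses the same-named functions from \module{email.utils} as a stand-in,
on the assumption that \module{rfc822} may not be importable in the
interpreter at hand. Details may differ from this module (for example,
how \function{mktime_tz()} handles local time), so treat it as an
approximation rather than a description of \module{rfc822} itself.

```python
import calendar
from email.utils import mktime_tz, parsedate, parsedate_tz

# The example date string from the parsedate() description above.
date = "Mon, 20 Nov 1995 19:12:08 -0500"

# 9-tuple; only the first six fields carry information from the string.
fields = parsedate(date)

# 10-tuple; the last item is the timezone offset from UTC.
fields_tz = parsedate_tz(date)

# Convert the 10-tuple into a UTC timestamp.
timestamp = mktime_tz(fields_tz)

# 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))

print(fields[:6])
print(fields_tz[9])
print(timestamp == expected)
```

For the zone \code{-0500} the tenth element is negative, which matches
the sign convention described for \function{parsedate_tz()}.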


\begin{seealso}
  \seemodule{mailbox}{Classes to read various mailbox formats produced
                      by end-user mail programs.}
  \seemodule{mimetools}{Subclass of \class{rfc822.Message} that handles
                        MIME encoded messages.}
\end{seealso}


\subsection{Message Objects \label{message-objects}}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream). It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. The delimiter line is consumed, and the file object's read
location positioned immediately after it. By default this method just
checks that the line is blank, but you can override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be ignored entirely, just skipped.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return
\code{None} if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{methoddesc}

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

If multiple headers exist that match the named header (e.g.\ if there
are several \code{Cc} headers), all are parsed for addresses. Any
continuation lines the named headers contain are also parsed.
\end{methoddesc}
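The address pairs described for \method{getaddr()} and
\method{getaddrlist()} can be tried out with the sketch below. It is
not a call into \class{Message}; it assumes the
\function{parseaddr()} and \function{getaddresses()} functions of
\module{email.utils} as a stand-in that returns the same kind of
\code{(\var{full name}, \var{email address})} pairs. The second
address in the list is a made-up placeholder.

```python
from email.utils import getaddresses, parseaddr

# Both spellings of the From header from the getaddr() example.
comment_style = parseaddr("jack@cwi.nl (Jack Jansen)")
angle_style = parseaddr("Jack Jansen <jack@cwi.nl>")

# A header value holding several addresses; getaddresses() takes a list
# of header values, so several To or Cc headers could be passed at once.
# second@example.org is a placeholder address, not taken from the text.
pairs = getaddresses(
    ["jack@cwi.nl (Jack Jansen), Second Person <second@example.org>"]
)

print(comment_style)
print(angle_style)
print(pairs)
```

Both header spellings yield the same pair, and the list form returns
one pair per address even when only one address is present.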

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}; note that fields 6, 7, and 8
are not usable. If there is no header matching
\var{name}, or the header is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Note that fields 6, 7, and 8
are not usable. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or the header is unparsable,
return \code{None}.
\end{methoddesc}

\class{Message} instances also support a limited mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently). \class{Message} instances also support the
writable mapping interface \code{\var{m}[name] = value} and \code{del
\var{m}[name]}.

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order). Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file or file-like object passed at instantiation time. This can
be used to read the message content.
\end{memberdesc}


\subsection{AddressList Objects \label{addresslist-objects}}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return a new \class{AddressList} instance that contains all addresses
in both \class{AddressList} operands, with duplicates removed (set
union).
\end{methoddesc}

\begin{methoddesc}{__iadd__}{alist}
In-place version of \method{__add__()}; turns this \class{AddressList}
instance into the union of itself and the right-hand instance,
\var{alist}.
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return a new \class{AddressList} instance that contains every address
in the left-hand \class{AddressList} operand that is not present in
the right-hand \class{AddressList} operand (set difference).
\end{methoddesc}

\begin{methoddesc}{__isub__}{alist}
In-place version of \method{__sub__()}, removing addresses in this
list which are also in \var{alist}.
\end{methoddesc}
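The set-like semantics of the addition and subtraction methods can be
sketched on plain lists of \code{(\var{name}, \var{address})} pairs.
The helpers \function{union_addresses()} and
\function{difference_addresses()} are hypothetical names used only
here, and the sketch assumes that two entries count as duplicates when
both items of the pair are equal; it is not the \class{AddressList}
implementation.

```python
def union_addresses(left, right):
    """Hypothetical helper: all pairs from both lists, duplicates removed."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference_addresses(left, right):
    """Hypothetical helper: pairs in left that do not occur in right."""
    return [pair for pair in left if pair not in right]


# Placeholder data; the second address is made up for the illustration.
first = [("Jack Jansen", "jack@cwi.nl"), ("Second Person", "second@example.org")]
second = [("Second Person", "second@example.org")]

combined = union_addresses(first, second)
remaining = difference_addresses(first, second)
```

The order of the left-hand list is preserved in both results, which is
a choice of this sketch rather than something stated above.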


Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of string pairs (2-tuples), one per address. In each pair, the
first item is the canonicalized name part and the second is the
actual route-address (\character{@}-separated username-host.domain
pair).
\end{memberdesc}