\section{Standard Module \module{rfc822}}
\declaremodule{standard}{rfc822}

\modulesynopsis{Parse \rfc{822} style mail headers.}

This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}. It is used in various contexts, usually to read such
headers from a file. This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter. \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects
qualify. Instantiation reads headers from the input object up to a
delimiter line (normally a blank line) and stores them in the
instance.

This class can work with any input object that supports a
\method{readline()} method. If the input object has seek and tell
capability, the \method{rewindbody()} method will work; also, illegal
lines will be pushed back onto the input stream. If the input object
lacks seek but has an \method{unread()} method that can push back a
line of input, \class{Message} will use that to push back illegal
lines. Thus this class can be used to parse messages coming from a
buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \cfunction{tell()} discards buffered
data before discovering that the \cfunction{lseek()} system call
doesn't work. For maximum portability, you should set the
\var{seekable} argument to zero to prevent that initial \method{tell()}
when passing in an unseekable object such as a file object created
from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
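
The reading behaviour described above can be illustrated with a
minimal sketch. It is not the module's implementation:
\code{read_headers()} and \code{lookup()} are hypothetical helpers
written for current Python, and they only show three of the documented
points, namely reliance on \method{readline()}, CR-LF normalization,
and case-insensitive lookup.

```python
import io

def read_headers(fp):
    """Hypothetical helper: read header lines from fp up to a blank line.

    Only fp.readline() is used, mirroring the requirement stated above.
    Returns a dict mapping lowercased field names to stripped values.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()
        if not line:
            break                      # end of input
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"    # CR-LF becomes a single linefeed
        if line.strip() == "":
            break                      # blank delimiter line ends the headers
        if line[0] in " \t" and last is not None:
            # Continuation line: append to the previous field.
            headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            # Not a legal header line. This sketch simply stops; it does
            # not push the line back as the class described above does.
            break
        last = name.strip().lower()
        headers[last] = value.strip()
    return headers

def lookup(headers, name):
    """Case-insensitive lookup, so 'From', 'from' and 'FROM' agree."""
    return headers[name.lower()]

fp = io.StringIO("From: jack@cwi.nl (Jack Jansen)\r\nSubject: hello\n\nbody text\n")
m = read_headers(fp)
print(lookup(m, "From") == lookup(m, "FROM"))
```

After \code{read_headers()} returns, the input object is positioned at
the first line of the body, so the caller can keep reading from it.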

\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed. (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}. If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
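
As an illustration, the call below parses the sample date from the
text. The \module{rfc822} module is not shipped with Python 3, so this
sketch assumes the function of the same name in \module{email.utils}
as a stand-in. Only the first six fields of the result are shown,
since the remaining three carry no usable information there.

```python
from email.utils import parsedate  # assumed stand-in for rfc822.parsedate

t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(t[:6])                      # (1995, 11, 20, 19, 12, 8)
print(parsedate('not a date'))    # None, the input cannot be parsed
```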

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset of the date's timezone from UTC (which is the official
term for Greenwich Mean Time). (Note that the sign of the timezone
offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.) If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
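
The sign convention can be checked with a short sketch. It assumes
\function{parsedate_tz()} from \module{email.utils} as a stand-in for
the function described here. That version may treat a missing timezone
differently, so only the explicit-offset case is shown.

```python
from email.utils import parsedate_tz  # assumed stand-in for rfc822.parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
offset = t[9]            # seconds east of UTC, so -0500 is negative
print(offset)            # -18000
print(offset // 3600)    # -5
```

For the same zone, \code{time.timezone} would be a positive number of
seconds, which is the opposite sign noted above.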

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp. If the timezone item in the tuple is \code{None}, assume
local time. A minor deficiency: the function first interprets the
first 8 elements as a local time and then compensates for the timezone
difference, which may yield a slight error around daylight saving time
switch dates. This is not enough to worry about for common use.
\end{funcdesc}
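
A brief sketch of the conversion, again assuming the functions of the
same names in \module{email.utils} as stand-ins. The expected value is
computed independently with \function{calendar.timegm()}, which is not
part of this module and is used here only as a cross-check.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz  # assumed stand-ins

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)
```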

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body. This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream). It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop. The delimiter line is consumed, and the file object's read
location is positioned immediately after it. By default this method
just checks that the line is blank, but you can override it in a
subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be skipped entirely.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any. Each physical line, whether it is a continuation
line or not, is a separate list item. Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any. Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}. This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present. Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace. Internal whitespace is not stripped. The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}. If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield
exactly the same result.
\end{methoddesc}
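
The two address forms from the example can be tried without a
\class{Message} instance. This sketch assumes
\function{parseaddr()} from \module{email.utils} as a stand-in for the
parsing that \method{getaddr()} applies to the header value; it takes
the header string directly rather than a header name.

```python
from email.utils import parseaddr  # assumed stand-in for getaddr()'s parsing

comment_form = parseaddr('jack@cwi.nl (Jack Jansen)')
angle_form = parseaddr('Jack Jansen <jack@cwi.nl>')

print(comment_form)
print(angle_form)
print(comment_form == angle_form)
```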

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header). If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct. It
yields bogus results if a full name contains a comma.
\end{methoddesc}
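
The shape of the result can be sketched as follows, assuming
\function{getaddresses()} from \module{email.utils} as a stand-in for
the list parsing. The second address is made up for the example, and
the sketch does not reproduce the comma problem noted above.

```python
from email.utils import getaddresses  # assumed stand-in for getaddrlist()'s parsing

# Hypothetical To header value; other@example.org is an invented address.
to_header = 'Jack Jansen <jack@cwi.nl>, other@example.org (Some One)'
pairs = getaddresses([to_header])
for full_name, address in pairs:
    print(full_name, address)

# A single address still comes back as a one-element list.
single = getaddresses(['jack@cwi.nl'])
print(single)
```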

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}. If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard. While this method has been tested and found correct on a
large collection of email from many sources, it is still possible that
it may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC. Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order). Each line contains a trailing newline. The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}

\subsection{AddressList Objects}
\label{addresslist-objects}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return an \class{AddressList} instance that contains all addresses in
both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return an \class{AddressList} instance that contains every address in
the left-hand \class{AddressList} operand that is not present in the
right-hand operand (set difference).
\end{methoddesc}
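
The union and difference semantics can be sketched on plain lists of
\code{(\var{name}, \var{address})} pairs. This is one plausible reading
of the descriptions above, not the class's code: \code{union()} and
\code{difference()} are hypothetical helpers, the sample addresses
other than the first are invented, and the left-hand list is assumed
to hold no duplicates of its own.

```python
def union(left, right):
    """Hypothetical helper: pairs from both lists, duplicates removed."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result

def difference(left, right):
    """Hypothetical helper: pairs of left that do not occur in right."""
    return [pair for pair in left if pair not in right]

a = [('Jack Jansen', 'jack@cwi.nl'), ('Some One', 'other@example.org')]
b = [('Some One', 'other@example.org'), ('', 'third@example.org')]

print(union(a, b))        # three pairs, the shared one appears once
print(difference(a, b))   # only the pair that is unique to a
```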

Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of pairs of strings, one per address. In each pair, the
first string is the canonicalized name part of the address and the
second is the route-address (@-separated host-domain pair).
\end{memberdesc}