\section{Standard Module \sectcode{rfc822}}
\stmodindex{rfc822}

\renewcommand{\indexsubitem}{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
RFC 822.  It is used in various contexts, usually to read such headers
from a file.

(Note that there is a separate, currently undocumented module,
\code{mailbox}, for reading Unix-style mailbox files.)

A \code{Message} instance is instantiated with an open file object as
parameter.  The optional \var{seekable} parameter indicates whether
the file object is seekable; the default value is 1 (true).
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.
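
The header-reading step can be sketched as follows.  This is a
minimal, hypothetical helper (\code{read_header_lines} is not part of
\code{rfc822}); it assumes a text-mode file object such as
\code{io.StringIO} and leaves out CR-LF conversion and the other
details of the real class.

```python
import io

def read_header_lines(fp):
    """Hypothetical sketch: collect header lines up to the first blank line.

    The blank line itself is consumed but not stored, so afterwards
    fp is positioned at the first line of the message body.
    """
    headers = []
    while True:
        line = fp.readline()
        if line in ('', '\n', '\r\n'):  # end of file or blank separator line
            break
        headers.append(line)
    return headers

fp = io.StringIO('From: jack@cwi.nl\nSubject: test\n\nBody text\n')
lines = read_header_lines(fp)
print(lines)                # ['From: jack@cwi.nl\n', 'Subject: test\n']
print(repr(fp.readline()))  # 'Body text\n': the file now points at the body
```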

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.
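
The line normalization described above amounts to a single check.
The name \code{normalize_line} is hypothetical and used only for this
sketch.

```python
def normalize_line(line):
    """Hypothetical sketch: replace a terminating CR-LF with a single LF."""
    if line.endswith('\r\n'):
        return line[:-2] + '\n'
    return line  # already LF-terminated (or unterminated): keep as-is

print(repr(normalize_line('Subject: hi\r\n')))  # 'Subject: hi\n'
print(repr(normalize_line('Subject: hi\n')))    # 'Subject: hi\n'
```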

All header matching is done independently of upper or lower case;
e.g.\ \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.
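
One simple way to obtain case-independent matching is to store and
look up header names in lower case.  The sketch below assumes that
approach; \code{header_index} and \code{lookup} are hypothetical
helpers, not the module's own code, and they skip continuation lines
for brevity.

```python
def header_index(lines):
    """Hypothetical sketch: map lower-cased header names to their values.

    Only the first occurrence of each name is kept, and continuation
    lines are skipped to keep the example short.
    """
    index = {}
    for line in lines:
        if line[:1] in (' ', '\t'):      # continuation line: ignored here
            continue
        name, sep, value = line.partition(':')
        if sep:
            index.setdefault(name.strip().lower(), value.strip())
    return index

def lookup(index, name):
    # Lower-casing the requested name makes 'From', 'from' and 'FROM' equivalent.
    return index.get(name.lower())

idx = header_index(['From: jack@cwi.nl\n', 'Subject: test\n'])
print(lookup(idx, 'From'), lookup(idx, 'FROM'))  # jack@cwi.nl jack@cwi.nl
```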

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in RFC 822.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an RFC 822 date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
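
The \code{rfc822} module was removed in Python 3, so this example
assumes its successor, \code{email.utils.parsedate()}, which follows
the same convention of returning a 9-tuple or \code{None}.  It is an
illustration of the calling pattern, not a description of
\code{rfc822} itself.

```python
import time
from email.utils import parsedate  # Python 3 successor of rfc822.parsedate

t = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")
print(t[:6])   # (1995, 11, 20, 19, 12, 8)

# The 9-tuple can go straight to time.mktime(), which treats it as
# *local* time; the -0500 offset is not part of the 9-tuple.
seconds = time.mktime(t)

# Strings that cannot be parsed give None instead of a tuple.
print(parsedate("not a date"))   # None
```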

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset of the date's time zone from UTC (which is the official term
for Greenwich Mean Time).
\end{funcdesc}
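
As with the previous sketch, this assumes Python 3's \code{email.utils}
in place of \code{rfc822}.  In that module the tenth element is the
offset in seconds, and \code{email.utils.mktime_tz()} uses it to
compute a UTC timestamp, avoiding the local-time interpretation of
\code{time.mktime()}.

```python
import calendar
from email.utils import parsedate_tz, mktime_tz

t = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
print(t[9])   # -18000, i.e. five hours west of UTC, in seconds

# mktime_tz() applies the offset: 19:12:08 at -0500 is 00:12:08 UTC
# on the following day.
utc_seconds = mktime_tz(t)
print(utc_seconds == calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0)))  # True
```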

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}
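
A seekable file lets the instance remember where the body starts and
jump back there later.  The class below is a hypothetical sketch of
that bookkeeping; \code{BodySeeker} and \code{startofbody} are
illustrative names, not a documented interface.

```python
import io

class BodySeeker:
    """Hypothetical sketch of the bookkeeping behind rewindbody()."""

    def __init__(self, fp):
        self.fp = fp
        # Skip header lines up to and including the blank separator line.
        while True:
            line = fp.readline()
            if line in ('', '\n', '\r\n'):
                break
        # Remember where the body starts; this relies on fp.tell().
        self.startofbody = fp.tell()

    def rewindbody(self):
        # seek() is only meaningful if the file object is seekable.
        self.fp.seek(self.startofbody)

m = BodySeeker(io.StringIO('Subject: x\n\nline 1\nline 2\n'))
print(repr(m.fp.read()))       # 'line 1\nline 2\n' (consumes the body)
m.rewindbody()
print(repr(m.fp.readline()))   # 'line 1\n' again
```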

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}
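
The matching rule can be sketched over a list of raw header lines
such as the \code{headers} variable described below.  The function
name is hypothetical, and the sketch assumes that a continuation line
is one starting with a space or tab.

```python
def get_all_matching_headers(headers, name):
    """Hypothetical sketch: every physical line of every header called name."""
    prefix = name.lower() + ':'
    result = []
    in_match = False
    for line in headers:
        if line[:1] in (' ', '\t'):        # continuation line
            if in_match:                   # belongs to a matching header
                result.append(line)
            continue
        in_match = line.lower().startswith(prefix)
        if in_match:
            result.append(line)
    return result

hdrs = ['Received: from a\n', '\tby b\n', 'Subject: hi\n', 'Received: from c\n']
print(get_all_matching_headers(hdrs, 'received'))
# ['Received: from a\n', '\tby b\n', 'Received: from c\n']
```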

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if any
continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}
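
The following hypothetical sketch shows what ``text after the colon''
means when continuation lines are present.  It assumes continuation
lines start with a space or tab; \code{get_raw_header} is an
illustrative name.

```python
def get_raw_header(headers, name):
    """Hypothetical sketch of getrawheader(): text after the colon of the
    first header called name, including its continuation lines."""
    prefix = name.lower() + ':'
    for i, line in enumerate(headers):
        if line.lower().startswith(prefix):
            parts = [line[len(prefix):]]
            # Append any following continuation lines unchanged.
            for cont in headers[i + 1:]:
                if cont[:1] not in (' ', '\t'):
                    break
                parts.append(cont)
            return ''.join(parts)
    return None                       # no matching header

raw_hdrs = ['Subject: a long\n', '  subject line\n', 'From: jack@cwi.nl\n']
print(repr(get_raw_header(raw_hdrs, 'subject')))
# ' a long\n  subject line\n'
```

Under the same assumptions, \code{getheader()} corresponds to calling
\code{.strip()} on this result, which removes the leading blank and
the final linefeed but keeps the internal linefeed and indentation.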

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.

Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield the
exact same result.
\end{funcdesc}
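
Python 3 no longer ships \code{rfc822}; assuming its successor
\code{email.utils.parseaddr()} as a stand-in, the two example headers
above parse to the same pair.  One difference: given an empty string,
\code{parseaddr()} returns \code{('', '')} rather than
\code{(None, None)}.

```python
from email.utils import parseaddr  # Python 3 successor of this parsing step

print(parseaddr('jack@cwi.nl (Jack Jansen)'))   # ('Jack Jansen', 'jack@cwi.nl')
print(parseaddr('Jack Jansen <jack@cwi.nl>'))   # ('Jack Jansen', 'jack@cwi.nl')
print(parseaddr(''))                            # ('', '')
```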

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not fully correct: it
yields bogus results if a full name contains a comma.
\end{funcdesc}
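
For comparison, the address-list parser in Python 3's
\code{email.utils}, \code{getaddresses()}, takes a list of header
values and, as far as we know, handles a quoted full name that
contains a comma, the case the XXX note warns about.  This is a
different implementation, shown only for illustration.

```python
from email.utils import getaddresses

# getaddresses() expects a list of header values (e.g. all To: headers).
to_values = ['"Jansen, Jack" <jack@cwi.nl>, guido@cwi.nl']
pairs = getaddresses(to_values)
print(pairs)   # [('Jansen, Jack', 'jack@cwi.nl'), ('', 'guido@cwi.nl')]

# As with getaddrlist(), a single address still yields a list.
print(getaddresses(['jack@cwi.nl']))   # [('', 'jack@cwi.nl')]
```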

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader()} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{funcdesc}
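
The retrieve-then-parse behaviour can be approximated in Python 3 with
\code{email.message_from_string()} and \code{email.utils.parsedate()}.
The helper \code{get_date} below is hypothetical and only mirrors the
documented contract: a 9-tuple on success, \code{None} for a missing
or unparsable header.

```python
from email import message_from_string
from email.utils import parsedate

msg = message_from_string(
    'Date: Mon, 20 Nov 1995 19:12:08 -0500\n'
    'Expires: sometime next week\n'
    '\n'
    'body\n'
)

def get_date(msg, name):
    """Hypothetical analogue of getdate(): 9-tuple, or None."""
    value = msg.get(name)          # header lookup is case-insensitive
    if value is None:              # no such header
        return None
    return parsedate(value)        # None if the value is unparsable

print(get_date(msg, 'date')[:3])     # (1995, 11, 20)
print(get_date(msg, 'Expires'))      # None (unparsable)
print(get_date(msg, 'Received'))     # None (missing)
```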

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader()} and parse it into a 10-tuple;
the first 9 elements will make a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's time zone from UTC.  As with \code{getdate()}, if there is
no header matching \var{name}, or it is unparsable, return \code{None}.
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular, \code{m[name]} is the same as \code{m.getheader(name)},
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected and
consistently with one another.
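
Python 3's \code{email.message.Message} offers a similar read-only
style of header access and is used here as a stand-in for the
\code{rfc822} class.  One difference: it has no \code{has_key()}
method, so membership is tested with the \code{in} operator.

```python
from email import message_from_string

msg = message_from_string('From: jack@cwi.nl\nTo: guido@cwi.nl\n\nbody\n')

print(msg['from'])     # jack@cwi.nl (lookup ignores case)
print(len(msg))        # 2, the number of headers
print('TO' in msg)     # True; replaces m.has_key(name)
print(msg.keys())      # ['From', 'To']
print(msg.items())     # [('From', 'jack@cwi.nl'), ('To', 'guido@cwi.nl')]
```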

Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}