\section{Standard Module \module{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}


This module defines a class, \class{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
\rfc{822}.  It is used in various contexts, usually to read such
headers from a file.  This module also defines a helper class
\class{AddressList} for parsing \rfc{822} addresses.

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \module{mailbox}\refstmodindex{mailbox}.

\begin{classdesc}{Message}{file\optional{, seekable}}
A \class{Message} instance is instantiated with an input object as
parameter.  \class{Message} relies only on the input object having a
\method{readline()} method; in particular, ordinary file objects qualify.
Instantiation reads headers from the input object up to a delimiter
line (normally a blank line) and stores them in the instance.

If the input object also has seek and tell capability, the
\method{rewindbody()} method will work; also, illegal lines will be
pushed back onto the input stream.  If the input object lacks seek but
has an \method{unread()} method that can push back a line of input,
\class{Message} will use that to push back illegal lines.  Thus this
class can be used to parse messages coming from a buffered stream.

The optional \var{seekable} argument is provided as a workaround for
certain stdio libraries in which \code{tell()} discards buffered data
before discovering that the \code{lseek()} system call doesn't work.
For maximum portability, you should set the \var{seekable} argument to
zero to prevent that initial \code{tell()} when passing in an
unseekable object such as a file object created from a socket object.

Input lines as read from the file may either be terminated by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.

All header matching is done independent of upper or lower case;
e.g.\ \code{\var{m}['From']}, \code{\var{m}['from']} and
\code{\var{m}['FROM']} all yield the same result.
\end{classdesc}
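The reading rules described for \class{Message} can be sketched in a
few lines of Python.  This is only an illustration under stated
assumptions, not the module's implementation: the names
\code{read_headers} and \code{lookup} are hypothetical, continuation
lines are folded with a single space for brevity, and nothing is pushed
back onto the input stream.

```python
import io

def read_headers(fp):
    """Read headers from fp (any object with readline()) up to a blank line.

    Hypothetical helper for illustration only; returns a dictionary
    keyed by lower-cased field name.
    """
    headers = {}
    last = None
    while True:
        line = fp.readline()
        if not line:
            break                       # end of input
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"     # CR-LF becomes a single linefeed
        if line == "\n":
            break                       # the blank delimiter line is consumed
        if line[0] in " \t" and last is not None:
            headers[last] += " " + line.strip()   # continuation line (simplified)
            continue
        name, sep, value = line.partition(":")
        if not sep:
            break                       # not a header line; stop parsing
        last = name.strip().lower()     # store under a case-folded key
        headers[last] = value.strip()
    return headers

def lookup(headers, name):
    """Case-independent lookup, mirroring m['From'] == m['FROM']."""
    return headers[name.lower()]

# An in-memory stream stands in for a file; only readline() is required.
fp = io.StringIO("From: jack@cwi.nl\r\nSubject: a long\r\n subject\r\n\r\nbody\r\n")
h = read_headers(fp)
```

After the call, the stream is positioned at the first line of the body,
which is the situation \class{Message} leaves the input object in.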

\begin{classdesc}{AddressList}{field}
You may instantiate the \class{AddressList} helper class using a single
string parameter, a comma-separated list of \rfc{822} addresses to be
parsed.  (The parameter \code{None} yields an empty list.)
\end{classdesc}

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in \rfc{822}.
However, some mailers don't follow that format as specified, so
\function{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an \rfc{822} date, such as
\code{'Mon, 20 Nov 1995 19:12:08 -0500'}.  If it succeeds in parsing
the date, \function{parsedate()} returns a 9-tuple that can be passed
directly to \function{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}
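A brief illustration of the return value follows.  Because
\module{rfc822} is not shipped with current Python releases, this
sketch assumes the \function{parsedate()} function from
\module{email.utils} as a stand-in with the same calling convention;
treat it as an approximation of the behaviour described above rather
than a transcript of this module.

```python
# Assumption: email.utils.parsedate stands in for rfc822.parsedate.
from email.utils import parsedate

t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')

# The first six fields carry the date and time as written in the header;
# the timezone is ignored by parsedate().
year, month, day, hour, minute, second = t[:6]

# Input that cannot be parsed yields None instead of raising.
bad = parsedate('not a date')
```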

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \function{parsedate()}, but returns
either \code{None} or a 10-tuple; the first 9 elements make up a tuple
that can be passed directly to \function{time.mktime()}, and the tenth
is the offset in seconds of the date's timezone from UTC (which is the
official term for Greenwich Mean Time).  (Note that the sign of the
timezone offset is the opposite of the sign of the \code{time.timezone}
variable for the same timezone; the latter variable follows the
\POSIX{} standard while this module follows \rfc{822}.)  If the input
string has no timezone, the last element of the tuple returned is
\code{None}.
\end{funcdesc}
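The sign convention is easy to get backwards, so a small example may
help.  As an assumption, it uses \function{parsedate_tz()} from
\module{email.utils} in place of this module's function; the handling
of a missing timezone may differ there, so only the case with an
explicit offset is shown.

```python
# Assumption: email.utils.parsedate_tz stands in for rfc822.parsedate_tz.
from email.utils import parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')

# Tenth element: offset from UTC in seconds, negative for zones west
# of Greenwich, following the sign written in the header.
offset = t[9]

# time.timezone uses the opposite sign (seconds west of UTC), so the
# equivalent POSIX-style value is the negation.
posix_style = -offset
```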

\begin{funcdesc}{mktime_tz}{tuple}
Turn a 10-tuple as returned by \function{parsedate_tz()} into a UTC
timestamp.  If the timezone item in the tuple is \code{None}, assume
local time.  Minor deficiency: this first interprets the first 8
elements as a local time and then compensates for the timezone
difference; this may yield a slight error around daylight saving time
switch dates.  This is not enough to worry about for common use.
\end{funcdesc}
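The conversion can be checked against a hand-computed UTC time.  This
sketch assumes the \function{parsedate_tz()} and \function{mktime_tz()}
functions of \module{email.utils} as stand-ins; their implementation
may not share the daylight saving time deficiency noted above.

```python
import calendar
# Assumption: the email.utils functions stand in for this module's versions.
from email.utils import parsedate_tz, mktime_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
stamp = mktime_tz(t)

# 19:12:08 at five hours behind UTC is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```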

\subsection{Message Objects}
\label{message-objects}

A \class{Message} instance has the following methods:

\begin{methoddesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{methoddesc}

\begin{methoddesc}{isheader}{line}
Returns a line's canonicalized fieldname (the dictionary key that will
be used to index it) if the line is a legal \rfc{822} header; otherwise
returns \code{None} (implying that parsing should stop here and the
line be pushed back on the input stream).  It is sometimes useful to
override this method in a subclass.
\end{methoddesc}

\begin{methoddesc}{islast}{line}
Return true if the given line is a delimiter on which \class{Message}
should stop.  The delimiter line is consumed, and the file object's
read location is positioned immediately after it.  By default this
method just checks that the line is blank, but you can override it in
a subclass.
\end{methoddesc}

\begin{methoddesc}{iscomment}{line}
Return true if the given line should be skipped entirely.
By default this is a stub that always returns false, but you can
override it in a subclass.
\end{methoddesc}

\begin{methoddesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{methoddesc}

\begin{methoddesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{getheader}{name\optional{, default}}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace.  Internal whitespace is not stripped.  The optional
\var{default} argument can be used to specify a different default to
be returned when there is no header matching \var{name}.
\end{methoddesc}

\begin{methoddesc}{get}{name\optional{, default}}
An alias for \method{getheader()}, to make the interface more compatible
with regular dictionaries.
\end{methoddesc}

\begin{methoddesc}{getaddr}{name}
Return a pair \code{(\var{full name}, \var{email address})} parsed
from the string returned by \code{getheader(\var{name})}.  If no
header matching \var{name} exists, return \code{(None, None)};
otherwise both the full name and the address are (possibly empty)
strings.

Example: If \var{m}'s first \code{From} header contains the string
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield
exactly the same result.
\end{methoddesc}
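The two header forms from the example can be compared directly.  As an
assumption, this sketch calls \function{parseaddr()} from
\module{email.utils} on the header value, standing in for the parsing
that \method{getaddr()} performs once the header has been retrieved.

```python
# Assumption: email.utils.parseaddr stands in for the parsing step of getaddr().
from email.utils import parseaddr

# Address first, full name in a trailing comment.
a = parseaddr('jack@cwi.nl (Jack Jansen)')

# Full name first, address in angle brackets.
b = parseaddr('Jack Jansen <jack@cwi.nl>')
```

Both calls give the pair \code{('Jack Jansen', 'jack@cwi.nl')}.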

\begin{methoddesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g.\ a \code{To} header) and
returns a list of \code{(\var{full name}, \var{email address})} pairs
(even if there was only one address in the header).  If there is no
header matching \var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{methoddesc}
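The shape of the result can be shown with a short example.  It assumes
\function{getaddresses()} from \module{email.utils} as a stand-in for
the list parsing, and the address \code{fred@example.org} is made up
for illustration.  The comma-in-name problem noted above is specific
to this module and is not reproduced here.

```python
# Assumption: email.utils.getaddresses stands in for the parsing done by
# getaddrlist(); it takes a list of header values rather than a header name.
from email.utils import getaddresses

# A header value holding two addresses.
pairs = getaddresses(['Jack Jansen <jack@cwi.nl>, fred@example.org'])

# A single address still comes back as a one-element list of pairs.
single = getaddresses(['jack@cwi.nl (Jack Jansen)'])
```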

\begin{methoddesc}{getdate}{name}
Retrieve a header using \method{getheader()} and parse it into a 9-tuple
compatible with \function{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While it has been tested and found correct on a large
collection of email from many sources, it is still possible that this
function may occasionally yield an incorrect result.
\end{methoddesc}

\begin{methoddesc}{getdate_tz}{name}
Retrieve a header using \method{getheader()} and parse it into a
10-tuple; the first 9 elements will make a tuple compatible with
\function{time.mktime()}, and the 10th is a number giving the offset
of the date's timezone from UTC.  Similarly to \method{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{methoddesc}

\class{Message} instances also support a read-only mapping interface.
In particular: \code{\var{m}[name]} is like
\code{\var{m}.getheader(name)} but raises \exception{KeyError} if
there is no matching header; and \code{len(\var{m})},
\code{\var{m}.has_key(name)}, \code{\var{m}.keys()},
\code{\var{m}.values()} and \code{\var{m}.items()} act as expected
(and consistently).

Finally, \class{Message} instances have two public instance variables:

\begin{memberdesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read (except that setitem calls may disturb this
order).  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{memberdesc}

\begin{memberdesc}{fp}
The file object passed at instantiation time.
\end{memberdesc}

\subsection{AddressList Objects}
\label{addresslist-objects}

An \class{AddressList} instance has the following methods:

\begin{methoddesc}{__len__}{}
Return the number of addresses in the address list.
\end{methoddesc}

\begin{methoddesc}{__str__}{}
Return a canonicalized string representation of the address list.
Addresses are rendered in \code{"name" <host@domain>} form,
comma-separated.
\end{methoddesc}

\begin{methoddesc}{__add__}{alist}
Return an \class{AddressList} instance that contains all addresses in
both \class{AddressList} operands, with duplicates removed (set union).
\end{methoddesc}

\begin{methoddesc}{__sub__}{alist}
Return an \class{AddressList} instance that contains every address in
the left-hand \class{AddressList} operand that is not present in the
right-hand \class{AddressList} operand (set difference).
\end{methoddesc}


Finally, \class{AddressList} instances have one public instance variable:

\begin{memberdesc}{addresslist}
A list of 2-tuples of strings, one per address.  In each tuple, the
first item is the canonicalized name part of the address and the second
is the route-address (@-separated host-domain pair).
\end{memberdesc}
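
The set-like behaviour of \method{__add__()} and \method{__sub__()} can
be pictured on plain lists of pairs shaped like \member{addresslist}.
This is a minimal sketch of the semantics only: the function names
\code{union} and \code{difference} are hypothetical, the addresses are
made up, and the real methods return \class{AddressList} instances
rather than lists.

```python
def union(left, right):
    """All pairs from both lists; pairs already present are not repeated."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result

def difference(left, right):
    """Every pair of `left` that does not occur in `right`."""
    return [pair for pair in left if pair not in right]

# Illustrative (name, address) pairs, in the layout of addresslist.
a = [('Jack Jansen', 'jack@cwi.nl'), ('', 'fred@example.org')]
b = [('', 'fred@example.org'), ('', 'guido@example.org')]

both = union(a, b)
only_a = difference(a, b)
```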