\section{Standard Module \sectcode{rfc822}}
\label{module-rfc822}
\stmodindex{rfc822}

\renewcommand{\indexsubitem}{(in module rfc822)}

This module defines a class, \code{Message}, which represents a
collection of ``email headers'' as defined by the Internet standard
RFC 822.  It is used in various contexts, usually to read such headers
from a file.
\index{RFC!822}

Note that there's a separate module to read \UNIX{}, MH, and MMDF
style mailbox files: \code{mailbox}.
\refstmodindex{mailbox}

A \code{Message} instance is created by passing an open file object as
its parameter.  The optional \code{seekable} parameter indicates whether
the file object is seekable; the default value is 1 for true.
Instantiation reads headers from the file up to a blank line and
stores them in the instance; after instantiation, the file is
positioned directly after the blank line that terminates the headers.

Input lines as read from the file may be terminated either by CR-LF or
by a single linefeed; a terminating CR-LF is replaced by a single
linefeed before the line is stored.
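
The reading behaviour described above can be sketched in a few lines
of Python.  This is only an illustration, not the module's
implementation: the helper \code{read_headers()} is a hypothetical
name, and \code{io.StringIO} stands in for an open file.

```python
import io


def read_headers(fp):
    """Collect header lines from fp up to (not including) the blank line."""
    headers = []
    while True:
        line = fp.readline()
        # A terminating CR-LF is replaced by a single linefeed.
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"
        # An empty string means end of file; a bare newline ends the headers.
        if line in ("", "\n"):
            break
        headers.append(line)
    return headers


# Hypothetical sample message mixing CR-LF and plain linefeed endings.
fp = io.StringIO("From: jack@cwi.nl\r\nSubject: test\n\nbody text\n")
headers = read_headers(fp)
rest = fp.read()  # the file is now positioned at the start of the body
print(headers)
print(repr(rest))
```

Because the loop consumes the blank line, a read issued afterwards
returns the message body.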

All header matching is case-insensitive;
e.g. \code{m['From']}, \code{m['from']} and \code{m['FROM']} all yield
the same result.
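
The \code{rfc822} module is not part of current Python releases, so
the following sketch uses a stand-in: message objects returned by
\code{email.message_from_string()} also match header names without
regard to case.  Treat it as an analogy for the behaviour described
above rather than as \code{rfc822} itself; the sample message is made
up for the illustration.

```python
import email

# Hypothetical sample message: two headers, a blank line, then the body.
text = "From: jack@cwi.nl (Jack Jansen)\nSubject: test\n\nbody\n"
m = email.message_from_string(text)

# All three spellings of the header name select the same header.
values = [m["From"], m["from"], m["FROM"]]
print(values)
```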

\begin{funcdesc}{parsedate}{date}
Attempts to parse a date according to the rules in RFC 822.  However,
some mailers don't follow that format as specified, so
\code{parsedate()} tries to guess correctly in such cases.
\var{date} is a string containing an RFC 822 date, such as
\code{"Mon, 20 Nov 1995 19:12:08 -0500"}.  If it succeeds in parsing
the date, \code{parsedate()} returns a 9-tuple that can be passed
directly to \code{time.mktime()}; otherwise \code{None} will be
returned.
\end{funcdesc}

\begin{funcdesc}{parsedate_tz}{date}
Performs the same function as \code{parsedate()}, but returns either
\code{None} or a 10-tuple; the first 9 elements make up a tuple that
can be passed directly to \code{time.mktime()}, and the tenth is the
offset of the date's time zone from UTC (which is the official term
for Greenwich Mean Time).
\end{funcdesc}
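
The two tuple shapes can be seen in a short sketch.  Since
\code{rfc822} is not available in current Python releases, this
example assumes the functions of the same names in \code{email.utils}
as stand-ins; the use of \code{mktime_tz()} and
\code{calendar.timegm()} is an addition for the illustration and is
not described above.

```python
import calendar
from email.utils import mktime_tz, parsedate, parsedate_tz

date = "Mon, 20 Nov 1995 19:12:08 -0500"

t9 = parsedate(date)      # 9-tuple, or None if the date cannot be parsed
t10 = parsedate_tz(date)  # 10-tuple; the tenth item is the offset from UTC

year, month, day, hour, minute, second = t9[:6]
offset = t10[9]           # -18000 seconds here, i.e. five hours west of UTC

# mktime_tz() applies the offset and yields seconds since the epoch in UTC.
timestamp = mktime_tz(t10)

# The same instant written directly in UTC: 00:12:08 on the following day.
check = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))

bad = parsedate("not a date")  # unparsable input gives None
print(t9, offset, timestamp == check, bad)
```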

\subsection{Message Objects}

A \code{Message} instance has the following methods:

\begin{funcdesc}{rewindbody}{}
Seek to the start of the message body.  This only works if the file
object is seekable.
\end{funcdesc}

\begin{funcdesc}{getallmatchingheaders}{name}
Return a list of lines consisting of all headers matching
\var{name}, if any.  Each physical line, whether it is a continuation
line or not, is a separate list item.  Return the empty list if no
header matches \var{name}.
\end{funcdesc}

\begin{funcdesc}{getfirstmatchingheader}{name}
Return a list of lines comprising the first header matching
\var{name}, and its continuation line(s), if any.  Return \code{None}
if there is no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getrawheader}{name}
Return a single string consisting of the text after the colon in the
first header matching \var{name}.  This includes leading whitespace,
the trailing linefeed, and internal linefeeds and whitespace if
any continuation line(s) were present.  Return \code{None} if there is
no header matching \var{name}.
\end{funcdesc}

\begin{funcdesc}{getheader}{name}
Like \code{getrawheader(\var{name})}, but strip leading and trailing
whitespace (but not internal whitespace).
\end{funcdesc}
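
The difference between the raw and the stripped form is easiest to see
on a header with a continuation line.  The sketch below is a
self-contained approximation of the lookups described above, written
for illustration only; the function names \code{first_matching_header()},
\code{raw_header()} and \code{header()} are hypothetical and are not
the module's code.

```python
def first_matching_header(headers, name):
    """Lines of the first header called name, continuations included."""
    prefix = name.lower() + ":"
    found = []
    for line in headers:
        if found:
            # Continuation lines start with a space or a tab.
            if line[:1] in " \t":
                found.append(line)
                continue
            break
        if line.lower().startswith(prefix):
            found.append(line)
    return found or None


def raw_header(headers, name):
    """Text after the colon, keeping whitespace and linefeeds."""
    lines = first_matching_header(headers, name)
    if lines is None:
        return None
    lines = list(lines)
    lines[0] = lines[0][len(name) + 1:]
    return "".join(lines)


def header(headers, name):
    """Like raw_header(), minus leading and trailing whitespace."""
    raw = raw_header(headers, name)
    return None if raw is None else raw.strip()


# Hypothetical header lines, as they would be stored after reading.
headers = [
    "From: jack@cwi.nl (Jack Jansen)\n",
    "Subject: a long\n",
    "\tsubject line\n",
    "To: guido@cwi.nl\n",
]
raw = raw_header(headers, "subject")
stripped = header(headers, "subject")
print(repr(raw))
print(repr(stripped))
```

Only the outer whitespace is removed; the linefeed and tab that join
the continuation line remain in the stripped value.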

\begin{funcdesc}{getaddr}{name}
Return a pair (full name, email address) parsed from the string
returned by \code{getheader(\var{name})}.  If no header matching
\var{name} exists, return \code{(None, None)}; otherwise both the full
name and the address are (possibly empty) strings.

Example: If \code{m}'s first \code{From} header contains the string\\
\code{'jack@cwi.nl (Jack Jansen)'}, then
\code{m.getaddr('From')} will yield the pair
\code{('Jack Jansen', 'jack@cwi.nl')}.
If the header contained
\code{'Jack Jansen <jack@cwi.nl>'} instead, it would yield
exactly the same result.
\end{funcdesc}
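
The equivalence of the two address forms can be checked with a small
sketch.  It assumes \code{email.utils.parseaddr()} as a stand-in,
because \code{rfc822} is not available in current Python releases;
unlike \code{getaddr()}, the stand-in takes the header value itself
rather than a header name.

```python
from email.utils import parseaddr

# The two spellings of the same address used in the example above.
a = parseaddr("jack@cwi.nl (Jack Jansen)")
b = parseaddr("Jack Jansen <jack@cwi.nl>")
print(a)
print(b)
```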

\begin{funcdesc}{getaddrlist}{name}
This is similar to \code{getaddr(\var{name})}, but parses a header
containing a list of email addresses (e.g. a \code{To} header) and
returns a list of (full name, email address) pairs (even if there was
only one address in the header).  If there is no header matching
\var{name}, return an empty list.

XXX The current version of this function is not really correct.  It
yields bogus results if a full name contains a comma.
\end{funcdesc}
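
The shape of the returned list can be illustrated with
\code{email.utils.getaddresses()}, assumed here as a stand-in because
\code{rfc822} is not available in current Python releases.  The
stand-in takes a list of header values rather than a header name, and
the sample \code{To} value is made up for the illustration.

```python
from email.utils import getaddresses

# Hypothetical To header value; the second full name contains a comma
# and is therefore quoted.
to_header = 'Jack Jansen <jack@cwi.nl>, "van Rossum, Guido" <guido@cwi.nl>'

pairs = getaddresses([to_header])
for full_name, address in pairs:
    print(full_name, "|", address)
```

Note that the stand-in keeps the comma inside the quoted full name,
which is the case the caveat above warns about.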

\begin{funcdesc}{getdate}{name}
Retrieve a header using \code{getheader()} and parse it into a 9-tuple
compatible with \code{time.mktime()}.  If there is no header matching
\var{name}, or it is unparsable, return \code{None}.

Date parsing appears to be a black art, and not all mailers adhere to
the standard.  While this function has been tested and found correct
on a large collection of email from many sources, it may still
occasionally yield an incorrect result.
\end{funcdesc}

\begin{funcdesc}{getdate_tz}{name}
Retrieve a header using \code{getheader()} and parse it into a 10-tuple;
the first 9 elements will make a tuple compatible with
\code{time.mktime()}, and the 10th is a number giving the offset of
the date's time zone from UTC.  As with \code{getdate()}, if
there is no header matching \var{name}, or it is unparsable, return
\code{None}.
\end{funcdesc}

\code{Message} instances also support a read-only mapping interface.
In particular: \code{m[name]} is the same as \code{m.getheader(name)};
and \code{len(m)}, \code{m.has_key(name)}, \code{m.keys()},
\code{m.values()} and \code{m.items()} act as expected (and
consistently).

Finally, \code{Message} instances have two public instance variables:

\begin{datadesc}{headers}
A list containing the entire set of header lines, in the order in
which they were read.  Each line contains a trailing newline.  The
blank line terminating the headers is not contained in the list.
\end{datadesc}

\begin{datadesc}{fp}
The file object passed at instantiation time.
\end{datadesc}