\section{\module{mailbox} ---
         Read various mailbox formats}

\declaremodule{standard}{mailbox}
\modulesynopsis{Read various mailbox formats.}


This module defines a number of classes that allow easy and uniform
access to mail messages in a (\UNIX) mailbox.

\begin{classdesc}{UnixMailbox}{fp\optional{, factory}}
Access to a classic \UNIX-style mailbox, where all messages are
contained in a single file and separated by \samp{From }
(a.k.a.\ \samp{From_}) lines.  The file object \var{fp} points to the
mailbox file.  The optional \var{factory} parameter is a callable that
should create new message objects.  \var{factory} is called with one
argument, \var{fp}, by the \method{next()} method of the mailbox
object.  The default is the \class{rfc822.Message} class (see the
\refmodule{rfc822} module and the note below).

\note{Because of this module's internal implementation, you will probably
want to open the \var{fp} object in binary mode.  This is especially important
on Windows.}

For maximum portability, messages in a \UNIX-style mailbox are
separated by any line that begins exactly with the string
\code{'From '} (note the trailing space) if preceded by exactly two
newlines.  Because of the wide range of variations in practice, nothing
else on the From_ line should be considered.  However, the current
implementation doesn't check for the leading two newlines.  This is
fine for most applications.

The \class{UnixMailbox} class implements a stricter version of
From_ line checking, using a regular expression that usually matches
From_ delimiters correctly.  It considers messages to be separated
by \samp{From \var{name} \var{time}} lines.  For maximum portability,
use the \class{PortableUnixMailbox} class instead.  This class is
identical to \class{UnixMailbox} except that individual messages are
separated by only \samp{From } lines.

For more information, see
\citetitle[http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html]{Configuring
Netscape Mail on \UNIX: Why the Content-Length Format is Bad}.
\end{classdesc}
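The difference between the strict and the bare-prefix check can be
sketched as follows.  The pattern \code{STRICT_FROM} is a simplified
assumption made for illustration (a sender followed by a ctime-style
date); it is not the expression the module itself uses, and both helper
functions are hypothetical.

```python
import re

# Hypothetical, simplified pattern: "From ", an envelope sender, then a
# ctime-style date such as "Sat Jan  3 01:05:34 1996".
STRICT_FROM = re.compile(
    r"From \S+\s+"                  # "From " and the sender name
    r"\w{3}\s+\w{3}\s+\d{1,2}\s+"   # weekday, month, day of month
    r"\d{1,2}:\d{2}(:\d{2})?\s+"    # time, seconds optional
    r"\d{4}\s*$"                    # four-digit year
)


def is_strict_from_line(line):
    """Return True if line looks like 'From name time'."""
    return STRICT_FROM.match(line) is not None


def is_portable_from_line(line):
    """Return True if line merely starts with 'From '."""
    return line.startswith("From ")
```

A body line such as \samp{From now on, call me Bob} passes the bare-prefix
test but fails the strict one, which is the kind of line the stricter
check is meant to reject.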

\begin{classdesc}{PortableUnixMailbox}{fp\optional{, factory}}
A less-strict version of \class{UnixMailbox}, which considers only the
\samp{From } at the beginning of the line separating messages.  The
``\var{name} \var{time}'' portion of the From line is ignored, to
protect against some variations that are observed in practice.  This
works since lines in the message which begin with \code{'From '} are
quoted by mail handling software at delivery time.
\end{classdesc}
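A minimal sketch of the portable rule, assuming the mailbox has already
been read into a string.  The function \code{split_portable} and the
sample data are hypothetical, and dropping the From_ line from each
message is a choice made here for brevity rather than a statement about
what the class does.

```python
def split_portable(text):
    """Split mbox text into messages at lines starting with 'From '."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # A new From_ line closes the previous message, if any.
            if current is not None:
                messages.append("".join(current))
            current = []          # the From_ line itself is not kept
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


SAMPLE = (
    "From alice Mon Jan  1 00:00:00 2001\n"
    "Subject: one\n"
    "\n"
    ">From here on, the body line is quoted.\n"
    "\n"
    "From bob\n"
    "Subject: two\n"
    "\n"
    "second body\n"
)
```

The quoted body line (\samp{>From }) does not start a new message, while
the bare \samp{From bob} line does, even though it carries no time stamp.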

\begin{classdesc}{MmdfMailbox}{fp\optional{, factory}}
Access an MMDF-style mailbox, where all messages are contained
in a single file and separated by lines consisting of four control-A
characters.  The file object \var{fp} points to the mailbox file.
Optional \var{factory} is as with the \class{UnixMailbox} class.
\end{classdesc}
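The separator rule can be illustrated with a short sketch.  It assumes
that each message is bracketed by a separator line both before and
after it, which is the usual MMDF layout but is not stated above; the
name \code{split_mmdf} is hypothetical.

```python
MMDF_SEPARATOR = b"\x01\x01\x01\x01\n"   # four control-A characters


def split_mmdf(data):
    """Return the messages found between pairs of separator lines."""
    messages = []
    current = None
    for line in data.splitlines(keepends=True):
        if line == MMDF_SEPARATOR:
            if current is None:
                current = []                      # opening separator
            else:
                messages.append(b"".join(current))
                current = None                    # closing separator
        elif current is not None:
            current.append(line)
    return messages
```

Working on bytes matches the advice to open mailbox files in binary
mode.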

\begin{classdesc}{MHMailbox}{dirname\optional{, factory}}
Access an MH mailbox, a directory with each message in a separate
file with a numeric name.
The name of the mailbox directory is passed in \var{dirname}.
\var{factory} is as with the \class{UnixMailbox} class.
\end{classdesc}
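As a rough sketch of what ``a separate file with a numeric name'' means
in practice, the hypothetical helper below lists such files in numeric
order.  The demonstration directory and its file names are made up.

```python
import os
import re
import tempfile


def mh_message_paths(dirname):
    """Return paths of numerically named files in dirname, in numeric order."""
    names = [
        name for name in os.listdir(dirname)
        if re.fullmatch(r"[0-9]+", name)
        and os.path.isfile(os.path.join(dirname, name))
    ]
    names.sort(key=int)          # numeric order, so "10" sorts after "2"
    return [os.path.join(dirname, name) for name in names]


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("10", "2", "1", ".mh_sequences", "notes.txt"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write("Subject: demo\n\nbody\n")
    demo_names = [os.path.basename(p) for p in mh_message_paths(demo_dir)]
```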

\begin{classdesc}{Maildir}{dirname\optional{, factory}}
Access a Qmail mail directory.  All new and current mail for the
mailbox specified by \var{dirname} is made available.
\var{factory} is as with the \class{UnixMailbox} class.
\end{classdesc}
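The following sketch shows one way to gather ``new and current'' mail.
It assumes the conventional maildir layout with \file{new} and
\file{cur} subdirectories (taken from the maildir format description,
not from the text above); the helper name and demonstration files are
hypothetical.

```python
import os
import tempfile


def maildir_message_paths(dirname):
    """Return message paths from the 'new' and then 'cur' subdirectories."""
    paths = []
    for sub in ("new", "cur"):
        subdir = os.path.join(dirname, sub)
        for name in sorted(os.listdir(subdir)):
            if not name.startswith("."):     # skip dot files
                paths.append(os.path.join(subdir, name))
    return paths


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for sub in ("tmp", "new", "cur"):
        os.mkdir(os.path.join(demo_dir, sub))
    demo_files = (("new", "b.msg"), ("new", "a.msg"),
                  ("cur", "c.msg"), ("cur", ".hidden"), ("tmp", "partial"))
    for sub, name in demo_files:
        with open(os.path.join(demo_dir, sub, name), "w") as f:
            f.write("Subject: demo\n\nbody\n")
    demo_found = [
        (os.path.basename(os.path.dirname(p)), os.path.basename(p))
        for p in maildir_message_paths(demo_dir)
    ]
```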

\begin{classdesc}{BabylMailbox}{fp\optional{, factory}}
Access a Babyl mailbox, which is similar to an MMDF mailbox.  In
Babyl format, each message has two sets of headers, the
\emph{original} headers and the \emph{visible} headers.  The original
headers appear before a line containing only \code{'*** EOOH ***'}
(End-Of-Original-Headers) and the visible headers appear after the
\code{EOOH} line.  Babyl-compliant mail readers will show you only the
visible headers, and \class{BabylMailbox} objects will return messages
containing only the visible headers.  You'll have to do your own
parsing of the mailbox file to get at the original headers.  Mail
messages start with the EOOH line and end with a line containing only
\code{'\e{}037\e{}014'}.  \var{factory} is as with the
\class{UnixMailbox} class.
\end{classdesc}
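The start and end markers described above can be sketched as a simple
scan.  The function \code{split_babyl} is hypothetical and handles only
the two marker lines named in the text; a real Babyl file contains
further structure that this sketch ignores.

```python
EOOH_LINE = b"*** EOOH ***"
END_LINE = b"\x1f\x0c"           # the characters '\037\014'


def split_babyl(data):
    """Return the visible part (headers and body) of each message."""
    messages = []
    current = None
    for line in data.split(b"\n"):
        if line == EOOH_LINE:
            current = []                          # message starts here
        elif line == END_LINE:
            if current is not None:
                messages.append(b"\n".join(current))
            current = None                        # message ends here
        elif current is not None:
            current.append(line)
    return messages
```

Everything before an EOOH line, including the original headers, is
skipped, which mirrors the statement that only the visible headers are
returned.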

Note that because the \refmodule{rfc822} module is deprecated, it is
recommended that you use the \refmodule{email} package to create
message objects from a mailbox.  (The default can't be changed for
backwards compatibility reasons.)  The safest way to do this is with
the following bit of code:

\begin{verbatim}
import email
import email.Errors
import mailbox

def msgfactory(fp):
    try:
        return email.message_from_file(fp)
    except email.Errors.MessageParseError:
        # Don't return None since that will
        # stop the mailbox iterator
        return ''

mbox = mailbox.UnixMailbox(fp, msgfactory)
\end{verbatim}

The above wrapper is defensive against ill-formed MIME messages in the
mailbox, but you have to be prepared to receive the empty string from
the mailbox's \method{next()} method.  On the other hand, if you
know your mailbox contains only well-formed MIME messages, you can
simplify this to:

\begin{verbatim}
import email
import mailbox

mbox = mailbox.UnixMailbox(fp, email.message_from_file)
\end{verbatim}

\begin{seealso}
  \seetitle[http://www.qmail.org/man/man5/mbox.html]{mbox -
            file containing mail messages}{Description of the
            traditional ``mbox'' mailbox format.}
  \seetitle[http://www.qmail.org/man/man5/maildir.html]{maildir -
            directory for incoming mail messages}{Description of the
            ``maildir'' mailbox format.}
  \seetitle[http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html]{Configuring
            Netscape Mail on \UNIX: Why the Content-Length Format is
            Bad}{A description of problems with relying on the
            \mailheader{Content-Length} header for messages stored in
            mailbox files.}
\end{seealso}


\subsection{Mailbox Objects \label{mailbox-objects}}

All implementations of mailbox objects are iterable objects, and
have one externally visible method.  This method is used by iterators
created from mailbox objects and may also be used directly.

\begin{methoddesc}[mailbox]{next}{}
Return the next message in the mailbox, created with the optional
\var{factory} argument passed into the mailbox object's constructor.
By default this is an \class{rfc822.Message}
object (see the \refmodule{rfc822} module).  Depending on the mailbox
implementation, the \var{fp} attribute of this object may be a true
file object or a class instance simulating a file object, taking care
of things like message boundaries if multiple mail messages are
contained in a single file.  If no more messages are available,
this method returns \code{None}.
\end{methoddesc}
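The \code{None} convention can be illustrated with a small stand-in.
\code{ListMailbox} and \code{drain} are hypothetical names, not part of
this module; the class only imitates the \method{next()} behaviour
described above, and building the iterator with
\code{iter(callable, sentinel)} is one possible way to do it.

```python
class ListMailbox:
    """Hypothetical stand-in that follows the next()/None convention."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._index = 0

    def next(self):
        """Return the next message, or None when none are left."""
        if self._index >= len(self._messages):
            return None
        msg = self._messages[self._index]
        self._index += 1
        return msg

    def __iter__(self):
        # iter(callable, sentinel) stops as soon as next() returns None.
        return iter(self.next, None)


def drain(mbox):
    """Collect messages by calling next() directly until it returns None."""
    collected = []
    while True:
        msg = mbox.next()
        if msg is None:
            break
        collected.append(msg)
    return collected
```

Because iteration ends at the first \code{None}, a factory that returns
\code{None} for a bad message would cut the loop short, whereas an empty
string is passed through like any other message.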