\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}

\deprecated{2.5}{The \refmodule{email} package should be used in
                 preference to the \module{multifile} module.
                 This module is present only to maintain backward
                 compatibility.}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

In \class{MultiFile}'s view of the world, text is composed of three
kinds of lines: data, section-dividers, and end-markers.
\class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
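
As a rough illustration of the three kinds of lines, the following
sketch labels each line of a small nested message.  It is not part of
the module: the \function{classify()} helper and the boundary strings
\code{outer} and \code{inner} are made up for this example, and the
decoration assumed is the MIME default (two leading hyphens for a
section-divider, plus two trailing hyphens for an end-marker).

```python
def classify(line, boundaries):
    """Label one line as data, section-divider, or end-marker.

    boundaries is a list of boundary strings (hypothetical helper; the
    module itself keeps a stack of pushed boundaries instead).
    """
    marker = line.rstrip()  # trailing whitespace is ignored on marker lines
    for boundary in boundaries:
        if marker == "--" + boundary:
            return ("section-divider", boundary)
        if marker == "--" + boundary + "--":
            return ("end-marker", boundary)
    return ("data", None)


# A made-up message with one part nested inside another.
sample = [
    "--outer\n",
    "text of the first outer part\n",
    "--outer\n",
    "--inner\n",
    "text of a nested part\n",
    "--inner--\n",
    "--outer--\n",
]

for line in sample:
    kind, boundary = classify(line, ["outer", "inner"])
    print(kind, boundary, repr(line))
```

Each nesting level uses its own boundary string, which is why the
inner end-marker can be told apart from the outer one.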

\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{multifile} module.}
\end{seealso}


\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}[MultiFile]{readline}{}
Read a line.  If the line is data (not a section-divider or end-marker
or real EOF) return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 or 0 according
to whether the match is or is not an end-marker.  If the line matches
any other stacked boundary, raise an error.  On encountering end-of-file
on the underlying stream object, the method raises \exception{Error}
unless all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}[MultiFile]{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'-}\code{-'} at
start of line (which all MIME boundaries have) but it is declared so
it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{push}{str}
Push a boundary string.  When a decorated version of this boundary
is found as an input line, it will be interpreted as a section-divider
or end-marker (depending on the decoration, see \rfc{2045}).  All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}
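
The interplay of \method{readline()}, \method{next()}, \method{push()}
and \method{pop()} can be sketched with a simplified stand-in.  The
\class{SketchMultiFile} class below is hypothetical and written only
for illustration: it handles a single boundary level, hard-codes the
default MIME decoration, does not seek, and raises
\exception{EOFError} where the real module would raise its own
\exception{Error}.

```python
import io


class SketchMultiFile:
    """Simplified stand-in for MultiFile: one boundary level, no seeking."""

    def __init__(self, fp):
        self.fp = fp
        self.stack = []            # pushed boundary strings
        self.at_boundary = False   # True once a boundary line was consumed
        self.last = False          # True if that boundary was an end-marker

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        self.stack.pop()
        self.at_boundary = False
        self.last = False

    def readline(self):
        if self.at_boundary:
            return ''              # keep reporting end-of-file
        line = self.fp.readline()
        if not line:
            if self.stack:
                raise EOFError('input ended before the end-marker')
            return ''
        if self.stack:
            marker = line.rstrip()
            boundary = self.stack[-1]
            divider = '--' + boundary
            end = '--' + boundary + '--'
            if marker in (divider, end):
                self.at_boundary = True
                self.last = (marker == end)
                return ''
        return line

    def read(self):
        # Collect lines until readline() reports end-of-file.
        return ''.join(iter(self.readline, ''))

    def next(self):
        while self.readline():
            pass
        if self.last:
            return False
        self.at_boundary = False   # re-enable the boundary for the next part
        return True


text = ("preamble\n--XYZ\nfirst part\n--XYZ\nsecond part\n"
        "--XYZ--\nepilogue\n")
mf = SketchMultiFile(io.StringIO(text))
mf.push("XYZ")
parts = []
while mf.next():
    parts.append(mf.read())
mf.pop()                 # the boundary no longer counts as end-of-file
epilogue = mf.read()
print(parts, repr(epilogue))
```

In this sketch, reads after a boundary keep returning the empty string
until \method{next()} moves on to the following part, and text after
the end-marker only becomes readable again once the boundary has been
popped.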

\begin{methoddesc}[MultiFile]{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}[MultiFile]{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker) but it is declared so it can be
overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
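
The role of the three overridable hooks can be pictured with the
following sketch.  None of these classes exist in the module:
\class{MimeHooks} only mirrors the default behaviour described above,
\class{BracketHooks} assumes an invented non-MIME format, and
\class{NoGuardHooks} illustrates the remark that an
\method{is_data()} which always returns false costs time but does not
change the outcome.

```python
class MimeHooks:
    """Stand-in mirroring the default hook behaviour described above."""

    def is_data(self, line):
        # Fast guard: MIME boundary lines always start with two hyphens.
        return line[:2] != "--"

    def section_divider(self, boundary):
        return "--" + boundary

    def end_marker(self, boundary):
        return "--" + boundary + "--"

    def kind(self, line, boundary):
        """Classify a line the way the hooks are meant to be combined."""
        if self.is_data(line):
            return "data"
        marker = line.rstrip()   # trailing whitespace is ignored
        if marker == self.section_divider(boundary):
            return "section-divider"
        if marker == self.end_marker(boundary):
            return "end-marker"
        return "data"


class BracketHooks(MimeHooks):
    """Invented format: '[part NAME]' divides, '[end NAME]' terminates."""

    def is_data(self, line):
        return not line.startswith("[")

    def section_divider(self, boundary):
        return "[part " + boundary + "]"

    def end_marker(self, boundary):
        return "[end " + boundary + "]"


class NoGuardHooks(MimeHooks):
    """Guard disabled: every line goes through the full comparison."""

    def is_data(self, line):
        return False


lines = ["--XYZ\n", "--XYZ--  \r\n", "-- not a boundary\n", "plain\n"]
for hooks in (MimeHooks(), NoGuardHooks()):
    print([hooks.kind(line, "XYZ") for line in lines])
print(BracketHooks().kind("[part A]\n", "A"))
```

Because the comparison strips trailing whitespace from the input line,
the hooks return the bare marker without a line terminator.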

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}[MultiFile]{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}[MultiFile]{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}


\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the first element in a multipart MIME message on stream
    matching mimetype."""

    # Parse the top-level headers to learn the content type and boundary.
    msg = mimetools.Message(stream)
    msgtype = msg.gettype()
    params = msg.getplist()

    data = StringIO.StringIO()
    if msgtype[:10] == "multipart/":

        file = multifile.MultiFile(stream)
        file.push(msg.getparam("boundary"))
        # next() skips to the following part; it is false at the end-marker.
        while file.next():
            submsg = mimetools.Message(file)
            try:
                data = StringIO.StringIO()
                mimetools.decode(file, data, submsg.getencoding())
            except ValueError:
                # Unknown transfer encoding: skip this part.
                continue
            if submsg.gettype() == mimetype:
                break
        # If no part matches, the most recent buffer is what gets returned.
        file.pop()
    return data.getvalue()
\end{verbatim}
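
Since the \refmodule{email} package is the recommended replacement, a
comparable lookup might be sketched with it as follows.  This is an
illustration under assumptions rather than a drop-in equivalent: the
function name \function{extract_part_matching()} and the sample message
are invented here, the message is passed as a string instead of a
stream, nested multipart containers are searched as well, and
\code{None} is returned when nothing matches.

```python
import email


def extract_part_matching(text, mimetype):
    """Return the decoded payload (bytes) of the first non-multipart
    part whose content type equals mimetype, or None if none matches."""
    msg = email.message_from_string(text)
    for part in msg.walk():
        if part.is_multipart():
            continue             # containers carry no payload of their own
        if part.get_content_type() == mimetype:
            # decode=True undoes any Content-Transfer-Encoding.
            return part.get_payload(decode=True)
    return None


# Made-up two-part message used only to exercise the function.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
)

print(extract_part_matching(raw, "text/html"))
print(extract_part_matching(raw, "image/png"))
```

Here the package takes care of boundary handling itself, so no
explicit \method{push()} or \method{pop()} calls are needed.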