\section{\module{multifile} ---
         Support for files containing distinct parts}

\declaremodule{standard}{multifile}
\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}
\sectionauthor{Eric S. Raymond}{esr@snark.thyrsus.com}


The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered. The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file. You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts. To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

In \class{MultiFile}'s view of the world, text is composed of three
kinds of lines: data, section-dividers, and end-markers.
\class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
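The three kinds of lines can be illustrated with a small standalone
sketch. It does not import \module{multifile}; the helper name, the
sample lines, and the boundary string \code{frontier} are all
hypothetical, and the decoration assumed is the MIME default.

```python
def classify_line(line, boundary):
    """Label a line as data, section-divider, or end-marker.

    Hypothetical helper: assumes MIME-style decoration of one boundary.
    """
    text = line.rstrip()  # trailing whitespace (LF or CR-LF) is ignored
    if text == "--" + boundary + "--":
        return "end-marker"
    if text == "--" + boundary:
        return "section-divider"
    return "data"


# A made-up two-part body using the boundary "frontier".
sample = [
    "--frontier\n",
    "first part\n",
    "--frontier\n",
    "second part\n",
    "--frontier--\n",
]
kinds = [classify_line(line, "frontier") for line in sample]
```

Here \code{kinds} holds one label per sample line, alternating between
section-dividers and data and finishing with the end-marker.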

\begin{seealso}
  \seemodule{email}{Comprehensive email handling package; supersedes
                    the \module{multifile} module.}
\end{seealso}


\subsection{MultiFile Objects \label{MultiFile-objects}}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{readline}{}
Read a line. If the line is data (not a section-divider or end-marker
or real EOF) return it. If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker and to 0 otherwise. If the line matches any other
stacked boundary, raise an error. On encountering end-of-file on the
underlying stream object, the method raises \exception{Error} unless
all boundaries have been popped.
\end{methoddesc}

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section. Return them as a single
(multiline) string. Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek. Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed). Return true if
there is such a section, false if an end-marker is seen. Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary. As written, it tests for a prefix other than \code{'-}\code{-'} at
the start of the line (which all MIME boundaries have) but it is declared so
it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}
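The guard-then-test arrangement can be sketched as below. This is a
standalone illustration written for this section, not the module's
own code: the second function and the boundary value are hypothetical
and only show where the cheap prefix test sits relative to the full
comparison.

```python
def is_data(line):
    """Fast guard: true when the line cannot be a MIME boundary."""
    return line[:2] != "--"


def is_boundary(line, boundary):
    """Hypothetical full test, reached only when the guard is false."""
    if is_data(line):
        return False  # cheap exit for the common case
    # The expensive part: compare against both decorated forms.
    return line.rstrip() in ("--" + boundary, "--" + boundary + "--")


# A signature separator starts with "--" but is not the boundary.
guard_says_data = is_data("-- \n")
really_boundary = is_boundary("-- \n", "frontier")
```

A guard that always returned false would send every line through the
full comparison, which is slower but gives the same answers.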

\begin{methoddesc}{push}{str}
Push a boundary string. When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker. All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
re-enables it.

It is possible to push more than one boundary. Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}
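The matching rule for several pushed boundaries can be sketched with a
plain list standing in for the boundary stack. Everything here is a
hypothetical stand-in (the exception class, the function, and the
boundary names); it models only which boundary a line matches, not the
way reads keep returning the empty string afterwards.

```python
class BoundaryError(Exception):
    """Stand-in for the module's Error exception."""


def check_line(line, stack):
    """Return 'data' or 'eof' for a line, given the pushed boundaries.

    stack is a list with the most recently pushed boundary last.
    """
    text = line.rstrip()
    for depth, boundary in enumerate(reversed(stack)):
        if text in ("--" + boundary, "--" + boundary + "--"):
            if depth == 0:
                return "eof"  # most recently pushed boundary
            raise BoundaryError("unexpected boundary: " + boundary)
    return "data"


stack = []
stack.append("outer")  # corresponds to push("outer")
stack.append("inner")  # corresponds to push("inner")
inner_result = check_line("--inner\n", stack)
```

While \code{inner} is on top of the stack, a line carrying the
\code{outer} boundary raises the error; once \code{inner} is popped,
the same line is reported as end-of-file instead.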

\begin{methoddesc}{pop}{}
Pop a section boundary. This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line. By default, this
method prepends \code{'-}\code{-'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes. This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line. By default, this
method prepends \code{'-}\code{-'} and appends \code{'-}\code{-'} (like a
MIME-multipart end-of-message marker) but it is declared so it can be
overridden in derived classes. This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
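As a rough sketch of how overriding these two methods changes the
decoration, the stand-in classes below mirror the default behaviour
and then replace it for an invented, non-MIME format. Neither class
comes from the module, and the \code{\#begin} and \code{\#end} line
format is an assumption made up for the example.

```python
class Decoration:
    """Stand-in that mirrors the default MIME decoration."""

    def section_divider(self, boundary):
        return "--" + boundary

    def end_marker(self, boundary):
        return "--" + boundary + "--"

    def matches(self, line, decorated):
        # Trailing whitespace is ignored, so the decorated form
        # carries no LF or CR-LF of its own.
        return line.rstrip() == decorated


class BannerDecoration(Decoration):
    """Hypothetical format using '#begin NAME' and '#end NAME' lines."""

    def section_divider(self, boundary):
        return "#begin " + boundary

    def end_marker(self, boundary):
        return "#end " + boundary


mime = Decoration()
banner = BannerDecoration()
mime_divider = mime.section_divider("frontier")
banner_end = banner.end_marker("part")
```

A real subclass using a format like this would also need to override
\method{is_data()}, since the default guard only lets lines that start
with two hyphens reach the boundary tests.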

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}


\subsection{\class{MultiFile} Example \label{multifile-example}}
\sectionauthor{Skip Montanaro}{skip@mojam.com}

\begin{verbatim}
import mimetools
import multifile
import StringIO

def extract_mime_part_matching(stream, mimetype):
    """Return the first element in a multipart MIME message on stream
    matching mimetype."""

    # Parse the top-level headers; stream is left at the message body.
    msg = mimetools.Message(stream)
    msgtype = msg.gettype()
    params = msg.getplist()

    data = StringIO.StringIO()
    if msgtype[:10] == "multipart/":

        file = multifile.MultiFile(stream)
        # Treat the message's boundary as end-of-file for each part.
        file.push(msg.getparam("boundary"))
        # next() skips to the following part; false at the end-marker.
        while file.next():
            submsg = mimetools.Message(file)
            try:
                # Decode this part's body into a fresh buffer.
                data = StringIO.StringIO()
                mimetools.decode(file, data, submsg.getencoding())
            except ValueError:
                # Unknown transfer encoding: skip this part.
                continue
            if submsg.gettype() == mimetype:
                break
        file.pop()
    return data.getvalue()
\end{verbatim}
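For comparison, a similar extraction can be sketched with the
\module{email} package, which supersedes this module. The version
below is an assumption-laden illustration rather than a drop-in
replacement: it takes the whole message as a string instead of a
stream, returns \code{None} when no part matches, and the sample
message is invented.

```python
import email


def extract_mime_part_matching(text, mimetype):
    """Return the decoded payload of the first part matching mimetype.

    Returns None when the message has no such part.
    """
    msg = email.message_from_string(text)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_type() == mimetype:
            # decode=True undoes the Content-Transfer-Encoding.
            return part.get_payload(decode=True)
    return None


# Invented two-part message used only to exercise the function.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="frontier"\n'
    "\n"
    "--frontier\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--frontier\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--frontier--\n"
)
html = extract_mime_part_matching(raw, "text/html")
```

The parser handles the boundary bookkeeping itself, so no explicit
\method{push()} or \method{pop()} calls are needed.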