% Documentation by ESR
\section{Standard Module \module{multifile}}
\stmodindex{multifile}
\label{module-multifile}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \method{readline()} returning
\code{''} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can easily be adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

It will be useful to know that in \class{MultiFile}'s view of the world, text
is composed of three kinds of lines: data, section-dividers, and
end-markers.  \class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.
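
The three kinds of lines can be illustrated with a small standalone
sketch.  It does not use the \module{multifile} module:
\function{classify()} is a hypothetical helper, and the decoration it
assumes (a leading \code{'--'}, plus a trailing \code{'--'} for
end-markers) is the MIME default described for
\method{section_divider()} and \method{end_marker()}.

```python
def classify(line, boundary):
    """Return the kind of line, assuming MIME-style '--' decoration."""
    marker = line.rstrip()          # trailing whitespace is ignored
    if marker == "--" + boundary + "--":
        return "end-marker"
    if marker == "--" + boundary:
        return "section-divider"
    return "data"


# A made-up two-part message that uses the boundary string "XYZ".
message = [
    "preamble text\n",
    "--XYZ\n",
    "first part\n",
    "--XYZ\n",
    "second part\n",
    "--XYZ--\n",
]
kinds = [classify(line, "XYZ") for line in message]
```

Here \code{kinds} records one label per input line, so the two
section-dividers and the final end-marker are told apart from the
data lines around them.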

\subsection{MultiFile Objects}
\label{MultiFile-objects}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker.  All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider, end-marker,
or real EOF), return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker or to 0 if it is a section-divider.  If the line
matches any other
stacked boundary, raise an error.  On encountering end-of-file on the
underlying stream object, the method raises \exception{Error} unless
all boundaries have been popped.
\end{methoddesc}
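
The rules for \method{push()}, \method{pop()} and \method{readline()}
can be sketched with a simplified stand-in.  This is an illustration
written for this section, not the module's implementation: the names
\class{BoundaryReader} and \exception{BoundaryError} are hypothetical,
the MIME \code{'--'} decoration is hard-coded, and \method{next()},
\method{seek()} and \method{tell()} are left out.

```python
import io


class BoundaryError(Exception):
    """Hypothetical stand-in for the module's Error exception."""


class BoundaryReader:
    """Simplified illustration of stacked-boundary reading (not MultiFile)."""

    def __init__(self, fp):
        self.fp = fp
        self.stack = []       # most recently pushed boundary is last
        self.last = 0         # 1 if the last stop was an end-marker
        self.stopped = False  # true while reads report end-of-file

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        self.stack.pop()
        self.stopped = False

    def readline(self):
        if self.stopped:
            return ""
        line = self.fp.readline()
        if not line:
            if self.stack:
                raise BoundaryError("end of file inside a multipart section")
            return ""
        if line[:2] != "--":  # fast guard: cannot be a boundary line
            return line
        marker = line.rstrip()
        for depth, boundary in enumerate(reversed(self.stack)):
            if marker == "--" + boundary:
                is_end = 0
                break
            if marker == "--" + boundary + "--":
                is_end = 1
                break
        else:
            return line       # starts with '--' but matches no boundary
        if depth > 0:
            raise BoundaryError("boundary of an enclosing part encountered")
        self.last = is_end
        self.stopped = True
        return ""


reader = BoundaryReader(io.StringIO("one\n--OUT\ntwo\n--OUT--\n"))
reader.push("OUT")
first = reader.readline()    # a data line
second = reader.readline()   # the section-divider, reported as ''
```

After the second call the reader keeps returning \code{''} until the
boundary is popped, which mirrors the end-of-file behaviour described
for \method{push()}.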

\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'--'} at
the start of the line (which all MIME boundaries have), but it is
declared so it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'--'} (which MIME section boundaries have), but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'--'} and appends \code{'--'} (like a
MIME-multipart end-of-message marker), but it is declared so it can
be overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}
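
As a rough sketch of how the three overridable hooks fit together,
the following standalone code adapts them to a different delimiter
convention.  \class{DefaultHooks} is a stand-in written for this
example that mirrors the defaults described above; it is not the
\class{MultiFile} class itself.  \class{HashHooks}, the \method{kind()}
helper and the \code{'\#\#'} line format are all hypothetical.

```python
class DefaultHooks:
    """Stand-in mirroring the default hook behaviour described above."""

    def is_data(self, line):
        return line[:2] != "--"

    def section_divider(self, boundary):
        return "--" + boundary

    def end_marker(self, boundary):
        return "--" + boundary + "--"

    def kind(self, line, boundary):
        """Classify a line using the fast guard, then the real tests."""
        if self.is_data(line):
            return "data"
        marker = line.rstrip()    # so hooks need not append LF or CR-LF
        if marker == self.section_divider(boundary):
            return "section-divider"
        if marker == self.end_marker(boundary):
            return "end-marker"
        return "data"


class HashHooks(DefaultHooks):
    """Hypothetical format: '## name' divides, '## name end' terminates."""

    def is_data(self, line):
        return line[:2] != "##"

    def section_divider(self, boundary):
        return "## " + boundary

    def end_marker(self, boundary):
        return "## " + boundary + " end"


hooks = HashHooks()
results = [hooks.kind(line, "part")
           for line in ["## part\r\n", "--part\n", "## part end\n"]]
```

Only the three hooks change in the subclass; the comparison logic in
\method{kind()} is inherited unchanged, which is the kind of reuse the
overridable methods are meant to allow.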

Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}


\subsection{\class{MultiFile} Example}
\label{multifile-example}

% This is almost unreadable; should be re-written when someone gets time.

\begin{verbatim}
import sys
from multifile import MultiFile

# outer_boundary and inner_boundary are assumed to hold the boundary
# strings of the enclosing message and of its nested subpart.
fp = MultiFile(sys.stdin, 0)
fp.push(outer_boundary)
message1 = fp.readlines()
# We should now be either at real EOF or stopped on a message
# boundary.  Re-enable the outer boundary.
fp.next()
# Read another message with the same delimiter
message2 = fp.readlines()
# Re-enable that delimiter again
fp.next()
# Now look for a message subpart with a different boundary
fp.push(inner_boundary)
sub_header = fp.readlines()
# If no exception has been raised, we're looking at the start of
# the message subpart.  Reset and grab the subpart
fp.next()
sub_body = fp.readlines()
# Got it.  Now pop the inner boundary to re-enable the outer one.
fp.pop()
# Read to next outer boundary
message3 = fp.readlines()
\end{verbatim}