% Documentation by ESR
\section{Standard Module \module{multifile}}
\declaremodule[multifile]{standard}{multifile}

\modulesynopsis{Support for reading files which contain distinct
                parts, such as some MIME data.}

The \class{MultiFile} object enables you to treat sections of a text
file as file-like input objects, with \code{''} being returned by
\method{readline()} when a given delimiter pattern is encountered.  The
defaults of this class are designed to make it useful for parsing
MIME multipart messages, but by subclassing it and overriding methods
it can be easily adapted for more general use.

\begin{classdesc}{MultiFile}{fp\optional{, seekable}}
Create a multi-file.  You must instantiate this class with an input
object argument for the \class{MultiFile} instance to get lines from,
such as a file object returned by \function{open()}.

\class{MultiFile} only ever looks at the input object's
\method{readline()}, \method{seek()} and \method{tell()} methods, and
the latter two are only needed if you want random access to the
individual MIME parts.  To use \class{MultiFile} on a non-seekable
stream object, set the optional \var{seekable} argument to false; this
will prevent using the input object's \method{seek()} and
\method{tell()} methods.
\end{classdesc}

It will be useful to know that in \class{MultiFile}'s view of the world, text
is composed of three kinds of lines: data, section-dividers, and
end-markers.  \class{MultiFile} is designed to support parsing of
messages that may have multiple nested message parts, each with its
own pattern for section-divider and end-marker lines.

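As a rough illustration of the three kinds of lines, the following
stand-alone sketch classifies input lines against a single boundary,
assuming the default MIME decoration (a leading \code{'--'} for
section-dividers, and an additional trailing \code{'--'} for
end-markers).  The function \code{classify\_line()} is hypothetical
and not part of the module; it only mirrors the rules described in
this section.

\begin{verbatim}
def classify_line(line, boundary):
    """Return 'divider', 'end' or 'data' for one input line.

    Hypothetical helper, not part of the multifile module.
    """
    # Fast guard: a line that does not start with '--' cannot be a
    # MIME boundary line.
    if not line.startswith('--'):
        return 'data'
    # Trailing whitespace is ignored, so LF and CR-LF endings both match.
    marker = line.rstrip()
    if marker == '--' + boundary:
        return 'divider'
    if marker == '--' + boundary + '--':
        return 'end'
    # Starts with '--' but matches no known boundary: ordinary data.
    return 'data'


lines = ['--XYZ\n', 'hello\n', '--not-a-boundary\n', '--XYZ--\r\n']
kinds = [classify_line(line, 'XYZ') for line in lines]
# kinds == ['divider', 'data', 'data', 'end']
\end{verbatim}
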
\subsection{MultiFile Objects}
\label{MultiFile-objects}

A \class{MultiFile} instance has the following methods:

\begin{methoddesc}{push}{str}
Push a boundary string.  When an appropriately decorated version of
this boundary is found as an input line, it will be interpreted as a
section-divider or end-marker.  All subsequent
reads will return the empty string to indicate end-of-file, until a
call to \method{pop()} removes the boundary or a \method{next()} call
re-enables it.

It is possible to push more than one boundary.  Encountering the
most-recently-pushed boundary will return EOF; encountering any other
boundary will raise an error.
\end{methoddesc}

\begin{methoddesc}{readline}{}
Read a line.  If the line is data (not a section-divider or end-marker
or real EOF) return it.  If the line matches the most-recently-stacked
boundary, return \code{''} and set \code{self.last} to 1 if the match
is an end-marker and to 0 otherwise.  If the line matches any other
stacked boundary, raise an error.  On encountering end-of-file on the
underlying stream object, the method raises \exception{Error} unless
all boundaries have been popped.
\end{methoddesc}

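The boundary-stack behaviour described for \method{push()} and
\method{readline()} can be illustrated with a small stand-alone
sketch.  \class{MiniMultiFile} and its \exception{Error} class are
simplified stand-ins written for this illustration, not the module's
implementation: they assume the default MIME decoration, keep no seek
positions, and include reduced forms of \method{pop()} and
\method{next()} only so that the reading loop can advance.

\begin{verbatim}
import io


class Error(Exception):
    """Stand-in for the module's Error exception."""


class MiniMultiFile:
    """Simplified illustration of the boundary stack (not the real class)."""

    def __init__(self, fp):
        self.fp = fp
        self.stack = []     # most recently pushed boundary is last
        self.last = 0       # 1 if the last EOF was an end-marker
        self.eof = False    # True while the current part is exhausted

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        self.stack.pop()
        self.last = 0
        self.eof = False

    def next(self):
        # Skip the rest of the current part.
        while self.readline():
            pass
        if self.last:
            return False    # an end-marker was seen: no further section
        self.eof = False    # re-enable the most recently pushed boundary
        return True

    def readline(self):
        if self.eof:
            return ''
        line = self.fp.readline()
        if not line:
            # Real end of input is only legal once all boundaries are popped.
            if self.stack:
                raise Error('unexpected end of input')
            self.eof = True
            return ''
        if not line.startswith('--'):
            return line
        marker = line.rstrip()
        # Check boundaries from the most recently pushed one outwards.
        for depth, boundary in enumerate(reversed(self.stack)):
            if marker in ('--' + boundary, '--' + boundary + '--'):
                if depth > 0:
                    raise Error('boundary of an enclosing part seen')
                self.last = int(marker == '--' + boundary + '--')
                self.eof = True
                return ''
        return line


text = 'preamble\n--XYZ\npart one\n--XYZ\npart two\n--XYZ--\n'
mf = MiniMultiFile(io.StringIO(text))
mf.push('XYZ')
parts = []
while True:
    # Collect lines until readline() signals end-of-part with ''.
    parts.append(''.join(iter(mf.readline, '')))
    if not mf.next():
        break
final_marker = mf.last      # 1: the loop stopped on the end-marker
mf.pop()
\end{verbatim}

In this sketch each section-divider ends one part and \method{next()}
starts the following one, while the end-marker makes \method{next()}
return false so that the loop terminates.
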
\begin{methoddesc}{readlines}{}
Return all lines remaining in this part as a list of strings.
\end{methoddesc}

\begin{methoddesc}{read}{}
Read all lines, up to the next section.  Return them as a single
(multiline) string.  Note that this doesn't take a size argument!
\end{methoddesc}

\begin{methoddesc}{next}{}
Skip lines to the next section (that is, read lines until a
section-divider or end-marker has been consumed).  Return true if
there is such a section, false if an end-marker is seen.  Re-enable
the most-recently-pushed boundary.
\end{methoddesc}

\begin{methoddesc}{pop}{}
Pop a section boundary.  This boundary will no longer be interpreted
as EOF.
\end{methoddesc}

\begin{methoddesc}{seek}{pos\optional{, whence}}
Seek.  Seek indices are relative to the start of the current section.
The \var{pos} and \var{whence} arguments are interpreted as for a file
seek.
\end{methoddesc}

\begin{methoddesc}{tell}{}
Return the file position relative to the start of the current section.
\end{methoddesc}

\begin{methoddesc}{is_data}{str}
Return true if \var{str} is data and false if it might be a section
boundary.  As written, it tests for a prefix other than \code{'--'} at
start of line (which all MIME boundaries have) but it is declared so
it can be overridden in derived classes.

Note that this test is intended as a fast guard for the real
boundary tests; if it always returns false it will merely slow
processing, not cause it to fail.
\end{methoddesc}

\begin{methoddesc}{section_divider}{str}
Turn a boundary into a section-divider line.  By default, this
method prepends \code{'--'} (which MIME section boundaries have) but
it is declared so it can be overridden in derived classes.  This
method need not append LF or CR-LF, as comparison with the result
ignores trailing whitespace.
\end{methoddesc}

\begin{methoddesc}{end_marker}{str}
Turn a boundary string into an end-marker line.  By default, this
method prepends \code{'--'} and appends \code{'--'} (like a
MIME-multipart end-of-message marker) but it is declared so it can
be overridden in derived classes.  This method need not append LF or
CR-LF, as comparison with the result ignores trailing whitespace.
\end{methoddesc}

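How the three hook methods cooperate when overridden can be sketched
without the module itself.  In the example below,
\class{BoundaryHooks} is a hypothetical stand-in that carries only the
hooks with their documented MIME defaults, and \class{PercentHooks}
adapts them to an invented format in which \code{'\%\% name'} divides
sections and \code{'\%\% name end'} terminates them.  Neither class
nor the \method{classify()} helper exists in the module; with the real
class, the same three methods would be overridden in a subclass of
\class{MultiFile}.

\begin{verbatim}
class BoundaryHooks:
    """Stand-in holding only the overridable hooks, with MIME defaults."""

    def is_data(self, line):
        # Fast guard: anything not starting with '--' is plain data.
        return line[:2] != '--'

    def section_divider(self, boundary):
        return '--' + boundary

    def end_marker(self, boundary):
        return '--' + boundary + '--'

    def classify(self, line, boundary):
        # Hypothetical helper showing how the hooks are combined.
        if self.is_data(line):
            return 'data'
        marker = line.rstrip()      # trailing whitespace is ignored
        if marker == self.section_divider(boundary):
            return 'divider'
        if marker == self.end_marker(boundary):
            return 'end'
        return 'data'


class PercentHooks(BoundaryHooks):
    """Invented format: '%% name' divides, '%% name end' terminates."""

    def is_data(self, line):
        return not line.startswith('%%')

    def section_divider(self, boundary):
        return '%% ' + boundary

    def end_marker(self, boundary):
        return '%% ' + boundary + ' end'


hooks = PercentHooks()
sample = ['%% part\n', 'text\n', '%% part end\r\n', '--part\n']
results = [hooks.classify(line, 'part') for line in sample]
# results == ['divider', 'data', 'end', 'data']
\end{verbatim}

Because the hooks return lines without a terminator, the same subclass
works for input with LF or CR-LF line endings.
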
Finally, \class{MultiFile} instances have two public instance variables:

\begin{memberdesc}{level}
Nesting depth of the current part.
\end{memberdesc}

\begin{memberdesc}{last}
True if the last end-of-file was for an end-of-message marker.
\end{memberdesc}


\subsection{\class{MultiFile} Example}
\label{multifile-example}

% This is almost unreadable; should be re-written when someone gets time.

\begin{verbatim}
import sys
from multifile import MultiFile

# Placeholder boundary strings for illustration only; real values come
# from the boundary parameters of the enclosing messages.
outer_boundary = 'outer'
inner_boundary = 'inner'

# Read from a non-seekable stream, so pass a false seekable argument.
fp = MultiFile(sys.stdin, 0)
fp.push(outer_boundary)
message1 = fp.readlines()
# We should now be either at real EOF or stopped on a message
# boundary.  Re-enable the outer boundary.
fp.next()
# Read another message with the same delimiter
message2 = fp.readlines()
# Re-enable that delimiter again
fp.next()
# Now look for a message subpart with a different boundary
fp.push(inner_boundary)
sub_header = fp.readlines()
# If no exception has been raised, we're looking at the start of
# the message subpart.  Reset and grab the subpart
fp.next()
sub_body = fp.readlines()
# Got it.  Now pop the inner boundary to re-enable the outer one.
fp.pop()
# Read to next outer boundary
message3 = fp.readlines()
\end{verbatim}