:mod:`multifile` --- Support for files containing distinct parts
================================================================

.. module:: multifile
   :synopsis: Support for reading files which contain distinct parts, such as some MIME data.
   :deprecated:
.. sectionauthor:: Eric S. Raymond <esr@snark.thyrsus.com>


.. deprecated:: 2.5
   The :mod:`email` package should be used in preference to the :mod:`multifile`
   module.  This module is present only to maintain backward compatibility.

The :class:`MultiFile` object enables you to treat sections of a text file as
file-like input objects, with ``''`` being returned by :meth:`readline` when a
given delimiter pattern is encountered.  The defaults of this class are designed
to make it useful for parsing MIME multipart messages, but by subclassing it and
overriding methods it can be easily adapted for more general use.


.. class:: MultiFile(fp[, seekable])

   Create a multi-file.  You must instantiate this class with an input object
   argument for the :class:`MultiFile` instance to get lines from, such as a file
   object returned by :func:`open`.

   :class:`MultiFile` only ever looks at the input object's :meth:`readline`,
   :meth:`seek` and :meth:`tell` methods, and the latter two are only needed if you
   want random access to the individual MIME parts.  To use :class:`MultiFile` on a
   non-seekable stream object, set the optional *seekable* argument to false; this
   will prevent using the input object's :meth:`seek` and :meth:`tell` methods.

It will be useful to know that in :class:`MultiFile`'s view of the world, text
is composed of three kinds of lines: data, section-dividers, and end-markers.
MultiFile is designed to support parsing of messages that may have multiple
nested message parts, each with its own pattern for section-divider and
end-marker lines.

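The three kinds of lines can be illustrated with a small stand-alone sketch.
The helper ``classify_line`` below is hypothetical and is not part of
:mod:`multifile`; it assumes the default MIME decoration, in which a
section-divider is ``'--'`` followed by the boundary and an end-marker adds a
trailing ``'--'``.

```python
def classify_line(line, boundary):
    """Classify one input line against a single MIME-style boundary.

    Hypothetical helper for illustration; not part of the multifile module.
    """
    marker = line.rstrip()  # trailing whitespace is ignored on marker lines
    if marker == "--" + boundary + "--":
        return "end-marker"
    if marker == "--" + boundary:
        return "section-divider"
    return "data"


lines = ["--XYZ\n", "first part\n", "--XYZ\r\n", "second part\n", "--XYZ--\n"]
kinds = [classify_line(line, "XYZ") for line in lines]
```

With nested parts, each level would use its own boundary string, so the same
line can be data at one level and a marker at another.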

.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`multifile` module.


.. _multifile-objects:

MultiFile Objects
-----------------

A :class:`MultiFile` instance has the following methods:


.. method:: MultiFile.readline()

   Read a line.  If the line is data (not a section-divider, end-marker, or real
   EOF), return it.  If the line matches the most-recently-stacked boundary, return
   ``''`` and set ``self.last`` to 1 if the match is an end-marker, or to 0 if it
   is a section-divider.  If the line matches any other stacked boundary, raise
   :exc:`Error`.  On encountering end-of-file on the underlying stream object, the
   method raises :exc:`Error` unless all boundaries have been popped.


.. method:: MultiFile.readlines()

   Return all lines remaining in this part as a list of strings.


.. method:: MultiFile.read()

   Read all lines, up to the next section.  Return them as a single (multiline)
   string.  Note that this method does not take a size argument.


.. method:: MultiFile.seek(pos[, whence])

   Seek.  Seek indices are relative to the start of the current section.  The *pos*
   and *whence* arguments are interpreted as for a file seek.


.. method:: MultiFile.tell()

   Return the file position relative to the start of the current section.


.. method:: MultiFile.next()

   Skip lines to the next section (that is, read lines until a section-divider or
   end-marker has been consumed).  Return true if there is such a section, false if
   an end-marker is seen.  Re-enable the most-recently-pushed boundary.


.. method:: MultiFile.is_data(str)

   Return true if *str* is data and false if it might be a section boundary.  As
   written, it tests for a prefix other than ``'--'`` at start of line (which
   all MIME boundaries have) but it is declared so it can be overridden in derived
   classes.

   Note that this test is intended as a fast guard for the real boundary
   tests; if it always returns false it will merely slow processing, not cause it
   to fail.

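The role of the guard can be sketched without the module itself.  In the
stand-alone example below, ``default_is_data``, ``is_boundary`` and
``count_full_tests`` are hypothetical names that only mimic the documented
default behaviour; the point is that a guard that always returns false sends
every line through the full boundary test without changing the outcome.

```python
def default_is_data(line):
    """Mimic the documented default: data unless the line starts with '--'."""
    return line[:2] != "--"


def is_boundary(line, boundary):
    """Full (slower) test against a MIME-style divider or end-marker."""
    marker = line.rstrip()
    return marker in ("--" + boundary, "--" + boundary + "--")


def count_full_tests(lines, boundary, guard):
    """Return (lines that reached the full test, boundaries found)."""
    tested = 0
    found = 0
    for line in lines:
        if guard(line):
            continue  # cheap path: the guard says this is certainly data
        tested += 1
        if is_boundary(line, boundary):
            found += 1
    return tested, found


lines = ["hello\n", "-- \n", "--XYZ\n", "world\n"]
fast = count_full_tests(lines, "XYZ", default_is_data)
slow = count_full_tests(lines, "XYZ", lambda line: False)
```

Both guards find the same single boundary; the always-false guard simply runs
the full test on all four lines instead of two.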

.. method:: MultiFile.push(str)

   Push a boundary string.  When a decorated version of this boundary is found as
   an input line, it will be interpreted as a section-divider or end-marker
   (depending on the decoration, see :rfc:`2045`).  All subsequent reads will
   return the empty string to indicate end-of-file, until a call to :meth:`pop`
   removes the boundary or a :meth:`.next` call reenables it.

   It is possible to push more than one boundary.  Encountering the
   most-recently-pushed boundary will return EOF; encountering any other
   boundary will raise an error.

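The interplay of :meth:`push`, :meth:`.next`, :meth:`readline` and :meth:`pop`
can be sketched with a much-simplified, non-seekable stand-in.  The class
``MiniMultiFile`` and the exception ``BoundaryError`` below are hypothetical
names written for this illustration; they are not the module's implementation
and only assume the default MIME decoration described here.

```python
import io


class BoundaryError(Exception):
    """Stand-in for the module's Error exception in this sketch."""


class MiniMultiFile(object):
    """Simplified, non-seekable illustration of the boundary stack."""

    def __init__(self, fp):
        self.fp = fp
        self.stack = []  # pushed boundaries, most recent last
        self.level = 0   # > 0 while sitting at a boundary ("EOF" state)
        self.last = 0    # 1 if the boundary just seen was an end-marker

    def push(self, boundary):
        self.stack.append(boundary)

    def pop(self):
        if not self.stack:
            raise BoundaryError("pop() without a pushed boundary")
        self.stack.pop()
        self.level = max(0, self.level - 1)
        self.last = 0

    def readline(self):
        if self.level > 0:
            return ""  # still at a boundary: keep reporting end-of-file
        line = self.fp.readline()
        if not line:
            if self.stack:
                raise BoundaryError("end of input inside a multipart section")
            return ""
        marker = line.rstrip()
        # Check the most recently pushed boundary first.
        for depth, boundary in enumerate(reversed(self.stack)):
            if marker == "--" + boundary:
                self.last = 0
                break
            if marker == "--" + boundary + "--":
                self.last = 1
                break
        else:
            return line  # ordinary data
        if depth > 0:
            raise BoundaryError("outer boundary seen before inner end-marker")
        self.level = 1
        return ""

    def read(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                return "".join(lines)
            lines.append(line)

    def next(self):
        while self.readline():
            pass  # skip whatever is left of the current section
        if self.last:
            return False
        self.level = 0  # re-enable the boundary for the next section
        return True


text = "preamble\n--XYZ\npart one\n--XYZ\npart two\n--XYZ--\nepilogue\n"
mf = MiniMultiFile(io.StringIO(text))
mf.push("XYZ")
parts = []
while mf.next():
    parts.append(mf.read())
mf.pop()
after = mf.readline()  # the boundary is gone, so plain reading resumes
```

In this sketch the loop collects the two parts between the dividers, and once
the boundary is popped the text after the end-marker is read as ordinary data.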

.. method:: MultiFile.pop()

   Pop a section boundary.  This boundary will no longer be interpreted as EOF.


.. method:: MultiFile.section_divider(str)

   Turn a boundary into a section-divider line.  By default, this method
   prepends ``'--'`` (which MIME section boundaries have) but it is declared so
   it can be overridden in derived classes.  This method need not append LF or
   CR-LF, as comparison with the result ignores trailing whitespace.


.. method:: MultiFile.end_marker(str)

   Turn a boundary string into an end-marker line.  By default, this method
   prepends ``'--'`` and appends ``'--'`` (like a MIME-multipart end-of-message
   marker) but it is declared so it can be overridden in derived classes.  This
   method need not append LF or CR-LF, as comparison with the result ignores
   trailing whitespace.

Finally, :class:`MultiFile` instances have two public instance variables:


.. attribute:: MultiFile.level

   Nesting depth of the current part.


.. attribute:: MultiFile.last

   True if the last end-of-file was for an end-of-message marker.


.. _multifile-example:

:class:`MultiFile` Example
--------------------------

.. sectionauthor:: Skip Montanaro <skip@pobox.com>


::

   import mimetools
   import multifile
   import StringIO

   def extract_mime_part_matching(stream, mimetype):
       """Return the first element in a multipart MIME message on stream
       matching mimetype."""

       # Parse the top-level headers; this leaves stream at the message body.
       msg = mimetools.Message(stream)
       msgtype = msg.gettype()
       params = msg.getplist()

       data = StringIO.StringIO()
       if msgtype[:10] == "multipart/":

           file = multifile.MultiFile(stream)
           file.push(msg.getparam("boundary"))
           # Each call to next() advances to the start of the next part.
           while file.next():
               # Every part begins with its own headers.
               submsg = mimetools.Message(file)
               try:
                   data = StringIO.StringIO()
                   mimetools.decode(file, data, submsg.getencoding())
               except ValueError:
                   # Unknown transfer encoding; skip this part.
                   continue
               if submsg.gettype() == mimetype:
                   break
           file.pop()
       return data.getvalue()

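
For new code, a similar task can be approached with the :mod:`email` package,
which the deprecation note above recommends.  The following is a minimal sketch
rather than a drop-in replacement: the function name ``extract_part_matching``
and the sample message are made up for illustration, and it assumes the whole
message is already available as a string.

```python
import email


def extract_part_matching(raw_message, mimetype):
    """Return the decoded payload of the first part whose content type
    equals mimetype, or None if no part matches."""
    msg = email.message_from_string(raw_message)
    # walk() visits the message itself and then every nested part.
    for part in msg.walk():
        if part.get_content_type() == mimetype:
            # decode=True undoes any Content-Transfer-Encoding.
            return part.get_payload(decode=True)
    return None


raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
)
html = extract_part_matching(raw, "text/html")
missing = extract_part_matching(raw, "image/png")
```

Unlike the :class:`MultiFile` example, this sketch returns ``None`` when no part
matches, so callers can tell a missing part from an empty one.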