:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

Model
    An object structure that represents an email message, and provides an
    API for creating, querying, and modifying a message.

Parser
    Takes a sequence of characters or bytes and produces a model of the
    email message represented by those characters or bytes.

Generator
    Takes a model and turns it into a sequence of characters or bytes. The
    sequence can either be intended for human consumption (a printable
    unicode string) or bytes suitable for transmission over the wire. In
    the latter case all data is properly encoded using the content transfer
    encodings specified by the relevant RFCs.

Conceptually the package is organized around the model. The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; all of the API described in this
documentation is public and stable. This allows an application with special
needs to implement its own parser and/or generator.
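
The flow between the three components can be sketched as follows. This is a
minimal illustration rather than part of the design notes: the sample message
text and the variable names are assumptions, and the library's default policy
is used implicitly.

```python
from email import message_from_string
from email.generator import Generator
from io import StringIO

# Hypothetical sample message used only for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Hi Bob.\n"
)

# Parser: characters in, model out.
msg = message_from_string(raw)

# Model: the application queries the message through its "external" API.
subject = msg["Subject"]

# Generator: model in, characters out.
out = StringIO()
Generator(out).flatten(msg)
text = out.getvalue()
```

For a simple, well-formed message like this one, the generated text matches
the parsed input.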

In addition to the three major functional components, there is a fourth key
component to the architecture:

Policy
    An object that specifies various behavioral settings and carries
    implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects. For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy
controls, such as the maximum line length produced by the generator, can also
be adjusted to meet specialized application requirements.
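
As a rough sketch of adjusting a single control, the example below clones a
policy with a smaller ``max_line_length`` and compares the serialized output.
The 40-column limit, the header text, and the variable names are assumptions
chosen for illustration.

```python
from email import policy
from email.message import EmailMessage

# Policy objects are immutable; clone() returns a modified copy.
narrow = policy.default.clone(max_line_length=40)

subject = " ".join(["word"] * 12)  # 59 characters, hypothetical content

default_msg = EmailMessage()  # uses policy.default
default_msg["Subject"] = subject

narrow_msg = EmailMessage(policy=narrow)
narrow_msg["Subject"] = subject

default_lines = default_msg.as_string().splitlines()
narrow_lines = narrow_msg.as_string().splitlines()

# The HTTP policy differs from the default in line separator and length limit.
http_linesep = policy.HTTP.linesep
http_max = policy.HTTP.max_line_length
```

The header fits on one line under the default limit, but the generator folds
it across several lines under the narrower policy.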


The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by
:rfc:`5322`: the header section and the body. The `Message` object acts as a
pseudo-dictionary of named headers. Its dictionary interface provides
convenient access to individual headers by name. However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body. A `payload` can
be one of two things: data, or a list of `Message` objects. The latter is used
to represent a multipart MIME message. Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.
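
The two aspects of the model, an ordered header list behind a dictionary-like
interface and a payload that is either data or a list of sub-messages, can be
illustrated with a short sketch. The header names and values are made up for
the example.

```python
from email.message import Message

outer = Message()
outer["Subject"] = "Report"
# With the default policy, assigning the same name twice appends a second
# header instead of replacing the first.
outer["Received"] = "from a"
outer["Received"] = "from b"

header_names = outer.keys()               # original order is preserved
first_received = outer["received"]        # lookup is case-insensitive
all_received = outer.get_all("Received")

# A leaf part carries a data payload ...
leaf = Message()
leaf.set_payload("plain text body")

# ... while a container's payload is a list of Message objects.
outer.attach(leaf)
parts = outer.get_payload()
```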


Message Lifecycle
-----------------

The general lifecycle of a message is:

Creation
    A `Message` object can be created by a Parser, or it can be
    instantiated as an empty message by an application.

Manipulation
    The application may examine one or more headers, and/or the
    payload, and it may modify one or more headers and/or
    the payload. This may be done on the top level `Message`
    object, or on any sub-object.

Finalization
    The Model is converted into a unicode or binary stream,
    or the model is discarded.
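
The three stages might look like this in application code. This sketch uses
:class:`~email.message.EmailMessage`, a subclass of `Message`; the header
names, attachment data, and file name are hypothetical.

```python
from email.message import EmailMessage

# Creation: the application instantiates an empty message and fills it in.
msg = EmailMessage()
msg["Subject"] = "Draft"
msg.set_content("plain body\n")
msg.add_attachment(
    b"\x00\x01",
    maintype="application",
    subtype="octet-stream",
    filename="blob.bin",
)

# Manipulation: on the top-level object ...
msg.replace_header("Subject", "Final")
# ... and on a sub-object.
for part in msg.iter_attachments():
    part["X-Reviewed"] = "yes"

# Finalization: the model is converted into a binary or unicode stream.
wire = msg.as_bytes()
text = msg.as_string()
```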



Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle. Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program. The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places. The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header. There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.
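
To make the four pathways visible, the sketch below subclasses
:class:`~email.policy.Compat32` and logs each hook before delegating to the
inherited implementation. ``TracingPolicy`` and the ``calls`` list are
hypothetical names introduced for this illustration; they are not part of the
package.

```python
from email import message_from_string
from email.policy import Compat32

calls = []  # hypothetical log of which hook ran, in order


class TracingPolicy(Compat32):
    """Hypothetical policy that records each hook, then delegates."""

    def header_source_parse(self, sourcelines):
        calls.append("source")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("store")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("fetch")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


# Entry via the Parser.
msg = message_from_string("Subject: hi\n\nbody\n", policy=TracingPolicy())
# Entry via the application.
msg["X-App"] = "demo"
# Exit via retrieval by the application.
subject = msg["Subject"]
# Exit via serialization by the Generator.
text = msg.as_string()
```

A generator producing bytes would go through the bytes-producing fold hook
instead, which the standard library exposes as ``fold_binary``.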


Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote. In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings. Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec. The error handler was originally introduced to handle binary filenames,
and it serves the same purpose here: it allows the email package to "carry"
the binary data received during parsing along until the output stage, at which
time it is regenerated in its original form.
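
The carrying mechanism can be seen at the codec level, independent of the
email package. The header bytes below are a made-up example of non-ASCII data
with no declared character set.

```python
# A Latin-1 encoded byte (0xE9) with no charset information attached.
raw = b"Subject: caf\xe9"

# Decoding with surrogateescape never fails: each undecodable byte is mapped
# to a lone surrogate code point in the range U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")
carried = text[-1]

# Encoding with the same handler regenerates the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")
```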

This carried binary data is almost entirely an implementation detail. The one
place where it is visible in the API is in the "internal" API. A Parser must
decode binary input data using the `surrogateescape` error handler, and pass
the resulting data to the appropriate Policy method. The "internal" interface
used by the Generator to access header values preserves the `surrogateescaped`
bytes. All other interfaces convert the binary data either back into bytes or
into a safe form (losing information in some cases).


Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package. It does this via the
following implementation of the five Policy methods described above (one for
each of the four pathways, plus ``fold_binary``):

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`. Otherwise just returns the value.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

fold_binary
    Same as fold, but encodes to 'ascii'.
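
These behaviors can be checked by calling the hooks directly on the
``email.policy.compat32`` instance. Applications would not normally call them
this way; the header names and values here are illustrative assumptions.

```python
from email import policy
from email.header import Header

compat = policy.compat32

# header_source_parse: continuation lines are kept, the trailing CRLF is not.
name, value = compat.header_source_parse(
    ["Subject: a folded\r\n", " value\r\n"]
)

# header_store_parse: name and value pass through unchanged.
stored = compat.header_store_parse("X-App", "demo")

# header_fetch_parse: plain text is returned as-is, while surrogateescaped
# data is wrapped in a Header object.
plain = compat.header_fetch_parse("Subject", "hello")
escaped = compat.header_fetch_parse("Subject", "caf\udce9")

# fold returns str; the bytes-producing hook, which the standard library
# exposes as fold_binary, returns the same header encoded as bytes.
folded = compat.fold("Subject", "short")
folded_bytes = compat.fold_binary("Subject", "short")
```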


New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it. Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``. Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any `surrogateescaped` bytes into the unicode
    unknown character glyph.

fold_binary
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold_binary will serialize the message according to :rfc:`5335`.
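
A short sketch of the fetch and fold hooks, assuming ``email.policy.default``
as the policy that implements this algorithm. The address and subject values
are made up for illustration.

```python
from email import policy

new = policy.default

# header_fetch_parse: a plain string is parsed into a structured header object.
to = new.header_fetch_parse("To", "Alice <alice@example.com>")
address = to.addresses[0]

# A value that is already a header object is returned unchanged.
again = new.header_fetch_parse("To", to)

# fold returns a string; the bytes-producing counterpart, exposed by the
# standard library as fold_binary, returns bytes.
folded = new.fold("Subject", "Hello")
folded_bytes = new.fold_binary("Subject", "Hello")
```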