.. R David Murray | c27e522 | 2012-05-25 15:01:48 -0400

:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

    Model
        An object structure that represents an email message, and provides an
        API for creating, querying, and modifying a message.

    Parser
        Takes a sequence of characters or bytes and produces a model of the
        email message represented by those characters or bytes.

    Generator
        Takes a model and turns it into a sequence of characters or bytes.  The
        sequence can either be intended for human consumption (a printable
        unicode string) or bytes suitable for transmission over the wire.  In
        the latter case all data is properly encoded using the content transfer
        encodings specified by the relevant RFCs.
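
The three components can be seen working together in a short round trip.  The
following is a minimal sketch rather than a canonical usage pattern; the
message text and the ``X-Reviewed`` header are made up for illustration.

.. code-block:: python

    from io import BytesIO

    from email import message_from_string
    from email.generator import BytesGenerator

    source = (
        "From: alice@example.com\n"
        "To: bob@example.com\n"
        "Subject: Hello\n"
        "\n"
        "Hi Bob.\n"
    )

    # Parser: characters in, model out.
    msg = message_from_string(source)

    # Model: query and modify the message.
    subject = msg["Subject"]
    msg["X-Reviewed"] = "yes"

    # Generator: model out as text, or as bytes for the wire.
    text = msg.as_string()
    buffer = BytesIO()
    BytesGenerator(buffer).flatten(msg)
    wire = buffer.getvalue()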

Conceptually the package is organized around the model.  The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; everything described by this
documentation is a public, stable API.  This allows an application with special
needs to implement its own parser and/or generator.

In addition to the three major functional components, there is a fourth key
component to the architecture:

    Policy
        An object that specifies various behavioral settings and carries
        implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects.  For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`.  Individual policy
controls, such as the maximum line length produced by the generator, can also
be set individually to meet specialized application requirements.
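
As a rough illustration of adjusting a single control, the sketch below clones
the standard library's ``policy.default`` with a shorter ``max_line_length``
and lets the generator fold a long header accordingly.  The subject text and
the limit of 40 are arbitrary values chosen for the example.

.. code-block:: python

    from io import StringIO

    from email import policy
    from email.generator import Generator
    from email.message import EmailMessage

    msg = EmailMessage()  # uses policy.default unless told otherwise
    msg["Subject"] = " ".join(["word"] * 30)

    # Derive a policy that differs from the default in one setting only.
    narrow = policy.default.clone(max_line_length=40)

    out = StringIO()
    Generator(out, policy=narrow).flatten(msg)
    folded = out.getvalue()

    # The predefined HTTP policy uses CRLF line endings and no length limit.
    http_linesep = policy.HTTP.linesep
    http_max = policy.HTTP.max_line_length

Policies are immutable, so ``clone`` returns a new object and leaves
``policy.default`` untouched.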


The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by the
RFC: the header section and the body.  The `Message` object acts as a
pseudo-dictionary of named headers.  Its dictionary interface provides
convenient access to individual headers by name.  However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body.  A `payload` can
be one of two things: data, or a list of `Message` objects.  The latter is used
to represent a multipart MIME message.  Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.
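
The sketch below pokes at both halves of the model: the ordered,
dictionary-like header section and the nested payload.  The sample message,
its boundary string, and its header values are invented for the example.

.. code-block:: python

    from email import message_from_string

    source = (
        "Received: from b.example\n"
        "Received: from a.example\n"
        "Subject: Report\n"
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XX"\n'
        "\n"
        "--XX\n"
        "Content-Type: text/plain\n"
        "\n"
        "See attachment.\n"
        "--XX\n"
        "Content-Type: text/csv\n"
        "\n"
        "a,b\n"
        "--XX--\n"
    )
    msg = message_from_string(source)

    # Dictionary-style access by name; repeated headers keep their order.
    subject = msg["Subject"]
    received = msg.get_all("Received")
    names = msg.keys()

    # A multipart payload is a list of Message objects; leaves hold data.
    parts = msg.get_payload()
    types = [part.get_content_type() for part in msg.walk()]
    leaf_body = parts[0].get_payload()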


Message Lifecycle
-----------------

The general lifecycle of a message is:

    Creation
        A `Message` object can be created by a Parser, or it can be
        instantiated as an empty message by an application.

    Manipulation
        The application may examine one or more headers, and/or the
        payload, and it may modify one or more headers and/or
        the payload.  This may be done on the top level `Message`
        object, or on any sub-object.

    Finalization
        The Model is converted into a unicode or binary stream,
        or the model is discarded.
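
For a message that starts out empty rather than coming from a Parser, the
three stages might look like the following sketch.  The addresses and text are
placeholders.

.. code-block:: python

    from email.message import Message

    # Creation: instantiate an empty model.
    msg = Message()

    # Manipulation: set, inspect, and change headers and payload.
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Draft"
    msg.set_payload("First version.\n")
    msg.replace_header("Subject", "Final")

    # Finalization: serialize the model to text.
    text = msg.as_string()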



Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle.  Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program.  The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy.  The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places.  The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header.  There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.
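
The hooks can also be called directly, which is a convenient way to see what
each pathway does.  The sketch below uses the standard library's
``policy.default`` and a made-up ``Subject`` header; the binary folding hook is
spelled ``fold_binary`` in the library.

.. code-block:: python

    from email import policy

    p = policy.default

    # Parser -> model: source lines become a (name, value) tuple.
    name, value = p.header_source_parse(["Subject: Hello\n", " world\n"])

    # Application -> model: the value supplied through __setitem__.
    stored_name, stored_value = p.header_store_parse("Subject", "Hello world")

    # Model -> application: the stored value is parsed on retrieval.
    fetched = p.header_fetch_parse(name, value)

    # Model -> Generator: the header folded as text and as bytes.
    folded = p.fold("Subject", "Hello world")
    folded_bytes = p.fold_binary("Subject", "Hello world")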


Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote.  In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings.  Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec.  Just as with the binary filenames that the error handler was introduced
to handle, this allows the email package to "carry" the binary data received
during parsing along until the output stage, at which time it is regenerated
in its original form.
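
The mechanism itself is plain Python and can be shown without the email
package at all.  In this small sketch the byte ``0xE9`` stands in for any
undecodable input:

.. code-block:: python

    raw = b"caf\xe9"

    # Decoding maps each undecodable byte to a lone surrogate code point.
    text = raw.decode("ascii", "surrogateescape")
    escaped = text[-1]

    # Encoding with the same handler restores the original byte exactly.
    restored = text.encode("ascii", "surrogateescape")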

This carried binary data is almost entirely an implementation detail.  The one
place where it is visible in the API is in the "internal" API.  A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method.  The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes.  All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).
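
The effect can be observed from the outside with a message whose header
contains a stray non-ASCII byte.  This sketch relies on the default (backward
compatible) policy; the input bytes are invented for the example.

.. code-block:: python

    from email import message_from_bytes

    raw = b"Subject: caf\xe9\n\nbody\n"
    msg = message_from_bytes(raw)

    # Binary output regenerates the undecodable byte in its original form.
    regenerated = msg.as_bytes()

    # Text output cannot carry the byte, so it is converted to a safe form.
    safe_text = msg.as_string()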


Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package.  It does this via the
following implementation of the four Policy hooks described above, plus
``fold_binary``:

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`.  Otherwise just returns the value.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

fold_binary
    Same as fold, but encodes to 'ascii'.
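
These behaviors can be checked against the ``policy.compat32`` instance that
ships with the standard library.  The header values below are arbitrary, and
the ``"caf\udce9"`` string simulates what a Parser would hand over for the
bytes ``caf\xe9``.

.. code-block:: python

    from email import policy
    from email.header import Header

    compat32 = policy.compat32

    # Continuation lines are joined with their line separators intact.
    name, value = compat32.header_source_parse(["Subject: a\n", " b\n"])

    # Application-supplied values are stored untouched.
    stored = compat32.header_store_parse("Subject", "Hi")

    # Clean values come back as-is; escaped bytes come back as a Header.
    plain = compat32.header_fetch_parse("Subject", "Hi")
    escaped = compat32.header_fetch_parse("Subject", "caf\udce9")

    # Folding yields a complete header line, as text or as ASCII bytes.
    folded = compat32.fold("Subject", "Hi")
    folded_bytes = compat32.fold_binary("Subject", "Hi")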


New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it.  Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``.  Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any `surrogateescaped` bytes into the unicode
    unknown character glyph.

fold_binary
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold_binary will serialize the message according to :rfc:`5335`.
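
As a rough check of the retrieval behavior described above, the sketch below
parses a message with the standard library's ``policy.default``, which is
assumed here to be the policy implementing the new algorithm.  The addresses
and the stray ``0xE9`` byte are invented for the example.

.. code-block:: python

    from email import message_from_bytes, policy

    raw = b"To: Bob <bob@example.com>\nSubject: caf\xe9\n\n"
    msg = message_from_bytes(raw, policy=policy.default)

    # Retrieval returns a parsed header object, not a bare string.
    to = msg["To"]
    first = to.addresses[0]

    # Escaped bytes surface as the unicode replacement character.
    subject = msg["Subject"]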