% Copyright (C) 2001 Python Software Foundation
% Author: barry@zope.com (Barry Warsaw)

\section{\module{email} --
An email and MIME handling package}

\declaremodule{standard}{email}
\modulesynopsis{Package supporting the parsing, manipulation, and
generation of email messages, including MIME documents.}
\moduleauthor{Barry A. Warsaw}{barry@zope.com}

\versionadded{2.2}

The \module{email} package is a library for managing email messages,
including MIME and other \rfc{2822}-based message documents. It
subsumes most of the functionality in several older standard modules
such as \module{rfc822}, \module{mimetools}, \module{multifile}, and
other non-standard packages such as \module{mimecntl}.

The primary distinguishing feature of the \module{email} package is
that it splits the parsing and generating of email messages from the
internal \emph{object model} representation of email. Applications
using the \module{email} package deal primarily with objects; you can
add sub-objects to messages, remove sub-objects from messages,
completely re-arrange the contents, etc. There is a separate parser
and a separate generator which handle the transformation from flat
text to the object model, and then back to flat text again. There
are also handy subclasses for some common MIME object types, and a few
miscellaneous utilities that help with such common tasks as extracting
and parsing message field values, creating RFC-compliant dates, etc.

The following sections describe the functionality of the
\module{email} package. The ordering follows a progression that
should be common in applications: an email message is read as flat
text from a file or other source, the text is parsed to produce an
object model representation of the email message, this model is
manipulated, and finally the model is rendered back into
flat text.

It is perfectly feasible to create the object model out of whole cloth
-- i.e. completely from scratch. From there, a similar progression can
be taken as above.
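
As a minimal sketch of that progression, the following reads a message
from a string, changes one header in the object model, and renders the
result as flat text again. The addresses and the message text are
invented for illustration; the sketch assumes a current Python 3
interpreter.

```python
import email

# Flat text, as it might be read from a file or a network connection.
# (Hypothetical message, made up for this sketch.)
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

# 1. Parse the flat text into an object model.
msg = email.message_from_string(raw)

# 2. Manipulate the model: drop the old Subject header, add a new one.
del msg["Subject"]
msg["Subject"] = "Lunch on Friday"

# 3. Render the model back into flat text.
flat = msg.as_string()
print(flat)
```

Only the header that was touched changes; the body passes through the
round trip as it was parsed.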

Also included are detailed specifications of all the classes and
modules that the \module{email} package provides, the exception
classes you might encounter while using the \module{email} package,
some auxiliary utilities, and a few examples. For users of the older
\module{mimelib} package, from which the \module{email} package is
descended, a section on differences and porting is provided.

\subsection{Representing an email message}

The primary object in the \module{email} package is the
\class{Message} class, provided in the \refmodule{email.Message}
module. \class{Message} is the base class for the \module{email}
object model. It provides the core functionality for setting and
querying header fields, and for accessing message bodies.

Conceptually, a \class{Message} object consists of \emph{headers} and
\emph{payloads}. Headers are \rfc{2822} style field names and
values, where the field name and value are separated by a colon. The
colon is not part of either the field name or the field value.

Headers are stored and returned in case-preserving form but are
matched case-insensitively. There may also be a single
\emph{Unix-From} header, also known as the envelope header or the
\code{From_} header. The payload is either a string in the case of
simple message objects, a list of \class{Message} objects for
multipart MIME documents, or a single \class{Message} instance for
\code{message/rfc822} type objects.

\class{Message} objects provide a mapping style interface for
accessing the message headers, and an explicit interface for accessing
both the headers and the payload. They also provide convenience methods
for generating a flat text representation of the message object tree,
for accessing commonly used header parameters, and for recursively
walking over the object tree.
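
The mapping style interface can be sketched as follows. This is an
illustration rather than a complete tour: the header names and values
are made up, and the import uses the lower-case module spelling of
current Python 3 releases (\module{email.message}), which is an
assumption about your interpreter and differs from the
\module{email.Message} spelling used in this section.

```python
from email.message import Message  # spelled email.Message in this document

msg = Message()

# Headers are set through the mapping interface.
msg["Subject"] = "Status report"
msg["X-Tag"] = "one"
msg["X-Tag"] = "two"      # a second assignment adds another header field

# Lookups ignore case, but the stored spelling is preserved.
subject = msg["SUBJECT"]
names = msg.keys()
tags = msg.get_all("x-tag")

# The optional envelope (Unix-From) header is kept apart from the others.
msg.set_unixfrom("From alice@example.com Sat Sep 22 10:00:00 2001")

# A simple, non-multipart message carries a string payload.
msg.set_payload("All systems nominal.\n")
```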

\subsection{Parsing email messages}
Message object trees can be created in one of two ways: they can be
created from whole cloth by instantiating \class{Message} objects and
stringing them together via \method{add_payload()} and
\method{set_payload()} calls, or they can be created by parsing a flat text
representation of the email message.

The \module{email} package provides a standard parser that understands
most email document structures, including MIME documents. You can
pass the parser a string or a file object, and the parser will return
to you the root \class{Message} instance of the object tree. For
simple, non-MIME messages the payload of this root object will likely
be a string (e.g. containing the text of the message). For MIME
messages, the root object will return 1 from its
\method{is_multipart()} method, and the subparts can be accessed via
the \method{get_payload()} and \method{walk()} methods.
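
To make this concrete, here is a small sketch that parses a two-part
MIME message and inspects the resulting tree. The message text and its
boundary string are invented. The sketch assumes a current Python 3
interpreter, where \method{is_multipart()} returns a boolean and the
content type is read with \method{get_content_type()}; that accessor
is not described in this section.

```python
import email

# A hypothetical multipart message with two text subparts.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>second part</p>\n"
    "--XYZ--\n"
)

root = email.message_from_string(raw)

# For a MIME multipart message the root's payload is a list of subparts.
multipart = root.is_multipart()
subparts = root.get_payload()

# walk() visits the root first, then each subpart in order.
types = [part.get_content_type() for part in root.walk()]
print(multipart, len(subparts), types)
```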

Note that the parser can be extended in limited ways, and of course
you can implement your own parser completely from scratch. There is
no magical connection between the \module{email} package's bundled
parser and the
\class{Message} class, so your custom parser can create message object
trees in any way it finds necessary. The \module{email} package's
parser is described in detail in the \refmodule{email.Parser} module
documentation.

\subsection{Generating MIME documents}
One of the most common tasks is to generate the flat text of the email
message represented by a message object tree. You will need to do
this if you want to send your message via the \refmodule{smtplib}
module or the \refmodule{nntplib} module, or print the message on the
console. Taking a message object tree and producing a flat text
document is the job of the \refmodule{email.Generator} module.

Again, as with the \refmodule{email.Parser} module, you aren't limited
to the functionality of the bundled generator; you could write one
from scratch yourself. However, the bundled generator knows how to
generate most email in a standards-compliant way, should handle MIME
and non-MIME email messages just fine, and is designed so that the
transformation from flat text, to an object tree via the
\class{Parser} class,
and back to flat text, is idempotent (the input is identical to the
output).
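
The round trip can be sketched as below for a simple message. The
import uses the Python 3 module spelling \module{email.generator},
and the \method{flatten()} method and \var{mangle_from_} argument are
taken from that version of the library; treat both as assumptions if
you are working with the release this section documents. The message
itself is made up.

```python
import email
from email.generator import Generator  # email.Generator in this document
from io import StringIO

# A hypothetical, simple non-MIME message.
raw = "From: alice@example.com\nSubject: Hi\n\nHello, world.\n"

# Flat text -> object tree.
msg = email.message_from_string(raw)

# Object tree -> flat text, written to any file-like object.
buf = StringIO()
Generator(buf, mangle_from_=False).flatten(msg)
flat = buf.getvalue()

# For this message the output matches the input exactly.
print(flat == raw)
```

Writing to a file-like object is what lets the same generator feed a
file, a socket wrapper, or an in-memory buffer as here.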

\subsection{Creating email and MIME objects from scratch}

Ordinarily, you get a message object tree by passing some text to a
parser, which parses the text and returns the root of the message
object tree. However you can also build a complete object tree from
scratch, or even individual \class{Message} objects by hand. In fact,
you can also take an existing tree and add new \class{Message}
objects, move them around, etc. This makes a very convenient
interface for slicing-and-dicing MIME messages.
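
A rough sketch of building and rearranging a tree by hand follows. It
assumes a current Python 3 interpreter, where subparts are added with
\method{attach()} and the MIME classes live under
\module{email.mime}; neither spelling comes from this section. The
subject and note texts are invented.

```python
from email.mime.base import MIMEBase
from email.mime.text import MIMEText

# Build a multipart container and give it a header.
outer = MIMEBase("multipart", "mixed")
outer["Subject"] = "Two notes"

# Add two text subparts to the tree.
outer.attach(MIMEText("first note\n"))
outer.attach(MIMEText("second note\n"))

# The payload of a multipart message is a plain list of Message
# objects, so subparts can be rearranged with ordinary list operations.
parts = outer.get_payload()
parts.reverse()

# Rendering picks a boundary and writes the parts in their new order.
flat = outer.as_string()
print(flat)
```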

You can create a new object tree by creating \class{Message}
instances, adding payloads and all the appropriate headers manually.
For MIME messages though, the \module{email} package provides some
convenient classes to make things easier. Each of these classes
should be imported from a module with the same name as the class, from
within the \module{email} package. E.g.:

\begin{verbatim}
import email.MIMEImage   # the class is then email.MIMEImage.MIMEImage
\end{verbatim}

or

\begin{verbatim}
from email.MIMEText import MIMEText
\end{verbatim}

Here are the classes:

\begin{classdesc}{MIMEBase}{_maintype, _subtype, **_params}
This is the base class for all the MIME-specific subclasses of
\class{Message}. Ordinarily you won't create instances specifically
of \class{MIMEBase}, although you could. \class{MIMEBase} is provided
primarily as a convenient base class for more specific MIME-aware
subclasses.

\var{_maintype} is the \code{Content-Type:} major type (e.g. \code{text} or
\code{image}), and \var{_subtype} is the \code{Content-Type:} minor type
(e.g. \code{plain} or \code{gif}). \var{_params} is a parameter
key/value dictionary and is passed directly to
\method{Message.add_header()}.

The \class{MIMEBase} class always adds a \code{Content-Type:} header
(based on \var{_maintype}, \var{_subtype}, and \var{_params}), and a
\code{MIME-Version:} header (always set to \code{1.0}).
\end{classdesc}
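
For illustration, a direct \class{MIMEBase} instance might look like
the sketch below. The content type and the \code{name} parameter are
arbitrary choices for the example, and the import path
\module{email.mime.base} is the Python 3 spelling, not the one used in
this section.

```python
from email.mime.base import MIMEBase

# Major type, minor type, then Content-Type parameters as keywords.
part = MIMEBase("application", "octet-stream", name="data.bin")

# The constructor has already added both headers.
header_names = part.keys()
content_type = part.get_content_type()
mime_version = part["MIME-Version"]
name_param = part.get_param("name")
print(part["Content-Type"])
```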

\begin{classdesc}{MIMEImage}{_imagedata\optional{, _subtype\optional{,
_encoder\optional{, **_params}}}}

A subclass of \class{MIMEBase}, the \class{MIMEImage} class is used to
create MIME message objects of major type \code{image}.
\var{_imagedata} is a string containing the raw image data. If this
data can be decoded by the standard Python module \refmodule{imghdr},
then the subtype will be automatically included in the
\code{Content-Type:} header. Otherwise you can explicitly specify the
image subtype via the \var{_subtype} parameter. If the minor type could
not be guessed and \var{_subtype} was not given, then
\exception{TypeError} is raised.

Optional \var{_encoder} is a callable (i.e. function) which will
perform the actual encoding of the image data for transport. This
callable takes one argument, which is the \class{MIMEImage} instance.
It should use \method{get_payload()} and \method{set_payload()} to
change the payload to encoded form. It should also add any
\code{Content-Transfer-Encoding:} or other headers to the message
object as necessary. The default encoding is \emph{Base64}. See the
\refmodule{email.Encoders} module for a list of the built-in encoders.

\var{_params} are passed straight through to the \class{MIMEBase}
constructor.
\end{classdesc}
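
The subtype guessing and the \exception{TypeError} case can be
sketched like this. The byte strings are stand-ins, not real images:
the first merely begins with a GIF signature so that the type can be
guessed. The import path \module{email.mime.image} and the use of
byte strings assume a current Python 3 interpreter.

```python
from email.mime.image import MIMEImage

# Fake image data: only the leading GIF signature is meaningful.
gif_bytes = b"GIF89a" + b"\x00" * 16

# The subtype is guessed from the data; Base64 is the default encoding.
img = MIMEImage(gif_bytes)
guessed = img.get_content_type()
encoding = img["Content-Transfer-Encoding"]

# Data that cannot be recognized needs an explicit subtype...
custom = MIMEImage(b"raw pixels", _subtype="x-custom")

# ...otherwise the constructor raises TypeError.
try:
    MIMEImage(b"not an image at all")
    raised = False
except TypeError:
    raised = True
```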

\begin{classdesc}{MIMEText}{_text\optional{, _subtype\optional{,
_charset\optional{, _encoder}}}}
A subclass of \class{MIMEBase}, the \class{MIMEText} class is used to
create MIME objects of major type \code{text}. \var{_text} is the string
for the payload. \var{_subtype} is the minor type and defaults to
\code{plain}. \var{_charset} is the character set of the text and is
passed as a parameter to the \class{MIMEBase} constructor; it defaults
to \code{us-ascii}. No guessing or encoding is performed on the text
data, but a newline is appended to \var{_text} if it doesn't already
end with a newline.

The \var{_encoder} argument is as with the \class{MIMEImage} class
constructor, except that the default encoding for \class{MIMEText}
objects is one that doesn't actually modify the payload, but does set
the \code{Content-Transfer-Encoding:} header to \code{7bit} or
\code{8bit} as appropriate.
\end{classdesc}
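
A short sketch of the defaults follows, using plain ASCII text that
already ends with a newline. It assumes the Python 3 module
\module{email.mime.text}. The texts are invented, and the sketch
leaves the \var{_encoder} argument alone because later versions of the
class may not accept it.

```python
from email.mime.text import MIMEText

# Subtype defaults to "plain"; ASCII text gets the us-ascii charset.
note = MIMEText("Hello there\n")
note_type = note.get_content_type()
note_charset = note.get_content_charset()
note_cte = note["Content-Transfer-Encoding"]

# The minor type can be given as the second argument.
page = MIMEText("<p>Hello there</p>\n", "html")
page_type = page.get_content_type()

print(note.as_string())
```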

\begin{classdesc}{MIMEMessage}{_msg\optional{, _subtype}}
A subclass of \class{MIMEBase}, the \class{MIMEMessage} class is used to
create MIME objects of main type \code{message}. \var{_msg} is used as
the payload, and must be an instance of class \class{Message} (or a
subclass thereof), otherwise a \exception{TypeError} is raised.

Optional \var{_subtype} sets the subtype of the message; it defaults
to \code{rfc822}.
\end{classdesc}
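
Wrapping one message inside another might look like the sketch below.
It assumes the Python 3 modules \module{email.message} and
\module{email.mime.message}, and it fetches the wrapped message with
\code{get_payload(0)} because the exact shape of the payload may
differ between versions of the package. The inner message is made up.

```python
from email.message import Message
from email.mime.message import MIMEMessage

# The message to be wrapped (hypothetical content).
inner = Message()
inner["Subject"] = "Original"
inner.set_payload("original body\n")

# Wrap it; the subtype defaults to rfc822.
wrapper = MIMEMessage(inner)
wrapper_type = wrapper.get_content_type()
wrapped = wrapper.get_payload(0)

# Anything that is not a Message instance is rejected.
try:
    MIMEMessage("just a string")
    raised = False
except TypeError:
    raised = True
```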

\subsection{Encoders, Exceptions, Utilities, and Iterators}

The \module{email} package provides various encoders for safe
transport of binary payloads in \class{MIMEImage} and \class{MIMEText}
instances. See the \refmodule{email.Encoders} module for more
details.

All of the exception classes that the \module{email} package can raise
are available in the \refmodule{email.Errors} module.

Some miscellaneous utility functions are available in the
\refmodule{email.Utils} module.

Iterating over a message object tree is easy with the
\method{Message.walk()} method; some additional helper iterators are
available in the \refmodule{email.Iterators} module.
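
A few of these helpers are sketched below. The imports use the
lower-case Python 3 module names (\module{email.utils} and
\module{email.iterators}), which is an assumption about your
interpreter; the address and the message text are invented.

```python
import email
from email.iterators import body_line_iterator, typed_subpart_iterator
from email.utils import formatdate, parseaddr

# Utilities: split an address field, build an RFC-compliant date.
name, address = parseaddr("Alice Example <alice@example.com>")
stamp = formatdate(0)  # seconds since the epoch, rendered in UTC

# A hypothetical two-part message to iterate over.
raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--XYZ--\n"
)
root = email.message_from_string(raw)

# Only the subparts of a given type.
plain_parts = list(typed_subpart_iterator(root, "text", "plain"))

# Every line of every string payload in the tree.
body_lines = [line.rstrip("\n") for line in body_line_iterator(root)]
```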

\subsection{Differences from \module{mimelib}}

The \module{email} package was originally prototyped as a separate
library called \module{mimelib}. Changes have been made so that
method names are more consistent, and some methods or modules have
either been added or removed. The semantics of some of the methods
have also changed. For the most part, any functionality available in
\module{mimelib} is still available in the \module{email} package,
albeit often in a different way.

Here is a brief description of the differences between the
\module{mimelib} and the \module{email} packages, along with hints on
how to port your applications.

Of course, the most visible difference between the two packages is
that the package name has been changed to \module{email}. In
addition, the top-level package has the following differences:

\begin{itemize}
\item \function{messageFromString()} has been renamed to
\function{message_from_string()}.
\item \function{messageFromFile()} has been renamed to
\function{message_from_file()}.
\end{itemize}

The \class{Message} class has the following differences:

\begin{itemize}
\item The method \method{asString()} was renamed to \method{as_string()}.
\item The method \method{ismultipart()} was renamed to
\method{is_multipart()}.
\item The \method{get_payload()} method has grown an optional
\var{decode} argument.
\item The method \method{getall()} was renamed to \method{get_all()}.
\item The method \method{addheader()} was renamed to \method{add_header()}.
\item The method \method{gettype()} was renamed to \method{get_type()}.
\item The method \method{getmaintype()} was renamed to
\method{get_main_type()}.
\item The method \method{getsubtype()} was renamed to
\method{get_subtype()}.
\item The method \method{getparams()} was renamed to
\method{get_params()}.
Also, whereas \method{getparams()} returned a list of strings,
\method{get_params()} returns a list of 2-tuples, effectively
the key/value pairs of the parameters, split on the \samp{=}
sign.
\item The method \method{getparam()} was renamed to \method{get_param()}.
\item The method \method{getcharsets()} was renamed to
\method{get_charsets()}.
\item The method \method{getfilename()} was renamed to
\method{get_filename()}.
\item The method \method{getboundary()} was renamed to
\method{get_boundary()}.
\item The method \method{setboundary()} was renamed to
\method{set_boundary()}.
\item The method \method{getdecodedpayload()} was removed. To get
similar functionality, pass the value 1 to the \var{decode} flag
of the \method{get_payload()} method.
\item The method \method{getpayloadastext()} was removed. Similar
functionality
is supported by the \class{DecodedGenerator} class in the
\refmodule{email.Generator} module.
\item The method \method{getbodyastext()} was removed. You can get
similar functionality by creating an iterator with
\function{typed_subpart_iterator()} in the
\refmodule{email.Iterators} module.
\end{itemize}
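
Two of the porting hints above, the \var{decode} flag and the 2-tuples
returned by \method{get_params()}, can be sketched as follows. The
message is invented, and the sketch assumes a current Python 3
interpreter, where a decoded payload comes back as a byte string.

```python
import email

# A hypothetical message whose body is Base64-encoded "hello world".
raw = (
    'Content-Type: text/plain; charset="us-ascii"; format=flowed\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8gd29ybGQ=\n"
)
msg = email.message_from_string(raw)

# Replacement for getdecodedpayload(): ask get_payload() to decode.
encoded = msg.get_payload()
decoded = msg.get_payload(decode=True)

# get_params() yields (key, value) pairs; the first pair holds the
# content type itself with an empty value.
params = msg.get_params()
charset = msg.get_param("charset")
print(decoded, params)
```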

The \class{Parser} class has no differences in its public interface.
It does have some additional smarts to recognize
\code{message/delivery-status} type messages, which it represents as
a \class{Message} instance containing separate \class{Message}
subparts for each header block in the delivery status
notification\footnote{Delivery Status Notifications (DSN) are defined
in \rfc{1894}}.

The \class{Generator} class has no differences in its public
interface. There is a new class in the \refmodule{email.Generator}
module though, called \class{DecodedGenerator}, which provides most of
the functionality previously available in the
\method{Message.getpayloadastext()} method.

The following modules and classes have been changed:

\begin{itemize}
\item The \class{MIMEBase} class constructor arguments \var{_major}
and \var{_minor} have changed to \var{_maintype} and
\var{_subtype} respectively.
\item The \code{Image} class/module has been renamed to
\code{MIMEImage}. The \var{_minor} argument has been renamed to
\var{_subtype}.
\item The \code{Text} class/module has been renamed to
\code{MIMEText}. The \var{_minor} argument has been renamed to
\var{_subtype}.
\item The \code{MessageRFC822} class/module has been renamed to
\code{MIMEMessage}. Note that an earlier version of
\module{mimelib} called this class/module \code{RFC822}, but
that clashed with the Python standard library module
\refmodule{rfc822} on some case-insensitive file systems.

Also, the \class{MIMEMessage} class now represents any kind of
MIME message with main type \code{message}. It takes an
optional argument \var{_subtype} which is used to set the MIME
subtype. \var{_subtype} defaults to \code{rfc822}.
\end{itemize}

\module{mimelib} provided some utility functions in its
\module{address} and \module{date} modules. All of these functions
have been moved to the \refmodule{email.Utils} module.

The \code{MsgReader} class/module has been removed. Its functionality
is most closely supported in the \function{body_line_iterator()}
function in the \refmodule{email.Iterators} module.

\subsection{Examples}

Coming soon...
