| \declaremodule{standard}{email.Header} |
| \modulesynopsis{Representing non-ASCII headers} |
| |
| \rfc{2822} is the base standard that describes the format of email |
| messages. It derives from the older \rfc{822} standard which came |
| into widespread use at a time when most email was composed of \ASCII{} |
| characters only. \rfc{2822} is a specification written assuming email |
| contains only 7-bit \ASCII{} characters. |
| |
| As email has been deployed worldwide, it has become |
| internationalized, so that language-specific character sets can now |
| be used in email messages. The base standard still requires email |
| messages to be transferred using only 7-bit \ASCII{} characters, so a |
| slew of RFCs have been written describing how to encode email |
| containing non-\ASCII{} characters into \rfc{2822}-compliant format. |
| These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}. |
| The \module{email} package supports these standards in its |
| \module{email.Header} and \module{email.Charset} modules. |
| |
| If you want to include non-\ASCII{} characters in your email headers, |
| say in the \mailheader{Subject} or \mailheader{To} fields, you should |
| use the \class{Header} class and assign the field in the |
| \class{Message} object to an instance of \class{Header} instead of |
| using a string for the header value. For example: |
| |
| \begin{verbatim} |
| >>> from email.Message import Message |
| >>> from email.Header import Header |
| >>> msg = Message() |
| >>> h = Header('p\xf6stal', 'iso-8859-1') |
| >>> msg['Subject'] = h |
| >>> print msg.as_string() |
| Subject: =?iso-8859-1?q?p=F6stal?= |
| |
| |
| \end{verbatim} |
| |
| Here the \mailheader{Subject} field needed to contain a |
| non-\ASCII{} character, so we created a \class{Header} |
| instance and passed in the character set that the byte string was |
| encoded in. When the \class{Message} instance was subsequently |
| flattened, the \mailheader{Subject} field was properly \rfc{2047} |
| encoded. MIME-aware mail readers will display this header using the |
| embedded ISO-8859-1 character. |
| |
| \versionadded{2.2.2} |
| |
| Here is the \class{Header} class description: |
| |
| \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{, |
| maxlinelen\optional{, header_name\optional{, continuation_ws\optional{, |
| errors}}}}}}} |
| Create a MIME-compliant header that can contain strings in different |
| character sets. |
| |
| Optional \var{s} is the initial header value. If \code{None} (the |
| default), the initial header value is not set. You can later append |
| to the header with \method{append()} method calls. \var{s} may be a |
| byte string or a Unicode string, but see the \method{append()} |
| documentation for semantics. |
| |
| Optional \var{charset} serves two purposes: it has the same meaning as |
| the \var{charset} argument to the \method{append()} method, and it |
| sets the default character set for all subsequent \method{append()} |
| calls that omit the \var{charset} argument. If \var{charset} is not |
| provided in the constructor (the default), the \code{us-ascii} |
| character set is used both as \var{s}'s initial charset and as the |
| default for subsequent \method{append()} calls. |
| |
| The maximum line length can be specified explicitly via |
| \var{maxlinelen}. To split the first line to a shorter length (to |
| account for the field name, e.g. \mailheader{Subject}, which isn't |
| included in \var{s}), pass the name of the field in |
| \var{header_name}. The default \var{maxlinelen} is 76, and the |
| default value for \var{header_name} is \code{None}, meaning it is not |
| taken into account for the first line of a long, split header. |
| |
| Optional \var{continuation_ws} must be \rfc{2822}-compliant folding |
| whitespace, and is usually either a space or a hard tab character. |
| This character will be prepended to continuation lines. |
| |
| Optional \var{errors} is passed straight through to the |
| \method{append()} method. |
| \end{classdesc} |
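
The constructor arguments described above can be sketched as follows.
This is a minimal illustration, not a definitive reference: it assumes a
Python 3 interpreter, where the module is spelled \module{email.header}
rather than \module{email.Header}, and all variable names are
hypothetical. Exact folding behaviour may differ between versions.

```python
from email.header import Header  # Python 3 spelling (assumption)

# No charset given: us-ascii is the default, so plain text passes through.
plain = Header('hello')
plain_encoded = plain.encode()

# An explicit charset makes the value RFC 2047 encoded when flattened.
accented = Header('p\xf6stal', charset='iso-8859-1')
accented_encoded = accented.encode()

# header_name shortens the first line so that "Subject: " still fits
# within maxlinelen once the field name is prepended.
long_value = ' '.join(['word'] * 30)
folded = Header(long_value, maxlinelen=40, header_name='Subject').encode()
folded_lines = folded.split('\n')

print(plain_encoded)
print(accented_encoded)
print(folded)
```

Passing \var{header_name} only affects how much room is left on the
first line; the field name itself is not added to the encoded value.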
| |
| \begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}} |
| Append the string \var{s} to the MIME header. |
| |
| Optional \var{charset}, if given, should be a \class{Charset} instance |
| (see \refmodule{email.Charset}) or the name of a character set, which |
| will be converted to a \class{Charset} instance. A value of |
| \code{None} (the default) means that the \var{charset} given in the |
| constructor is used. |
| |
| \var{s} may be a byte string or a Unicode string. If it is a byte |
| string (i.e. \code{isinstance(s, str)} is true), then |
| \var{charset} is the encoding of that byte string, and a |
| \exception{UnicodeError} will be raised if the string cannot be |
| decoded with that character set. |
| |
| If \var{s} is a Unicode string, then \var{charset} is a hint |
| specifying the character set of the characters in the string. In this |
| case, when producing an \rfc{2822}-compliant header using \rfc{2047} |
| rules, the Unicode string will be encoded using the following charsets |
| in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The |
| first character set to not provoke a \exception{UnicodeError} is used. |
| |
| Optional \var{errors} is passed through to any \function{unicode()} or |
| \method{unicode.encode()} call, and defaults to \code{'strict'}. |
| \end{methoddesc} |
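
As a rough sketch of \method{append()}, the following builds one header
from parts in different character sets and shows a decoding failure.
It assumes Python 3 and the \module{email.header} spelling, where a
``byte string'' is a \code{bytes} object and a Unicode string is a
\code{str}; the variable names are illustrative only.

```python
from email.header import Header  # Python 3 spelling (assumption)

# Start with an ASCII part, then append parts in other character sets.
h = Header('Hello', 'us-ascii')
h.append('p\xf6stal', 'iso-8859-1')   # per-call charset overrides the default
h.append('\u4e16\u754c', 'utf-8')
encoded = h.encode()
print(encoded)

# Bytes that cannot be decoded with the given charset raise UnicodeError.
try:
    Header().append(b'\xff', 'us-ascii')
except UnicodeError:
    decode_failed = True
else:
    decode_failed = False
print(decode_failed)
```

Each appended part keeps its own character set, so the flattened header
mixes plain text with separately encoded words.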
| |
| \begin{methoddesc}[Header]{encode}{\optional{splitchars}} |
| Encode a message header into an RFC-compliant format, possibly |
| wrapping long lines and encapsulating non-\ASCII{} parts in base64 or |
| quoted-printable encodings. Optional \var{splitchars} is a string |
| containing characters to split long \ASCII{} lines on, in rough support |
| of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't |
| affect \rfc{2047} encoded lines. |
| \end{methoddesc} |
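
The wrapping behaviour of \method{encode()} can be illustrated with a
long non-\ASCII{} value. This is a sketch under the assumption of
Python 3 and the \module{email.header} spelling; the exact split points
are an implementation detail and are not asserted here.

```python
from email.header import Header  # Python 3 spelling (assumption)

# Sixty accented characters cannot fit in one 40-column line, so the
# value is emitted as several RFC 2047 encoded-words, one per line.
text = '\xe9' * 60
h = Header(text, 'utf-8', maxlinelen=40)
encoded = h.encode()
lines = encoded.split('\n')

for line in lines:
    print(repr(line))
```

Every physical line holds a complete encoded-word, so each one can be
decoded on its own.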
| |
| The \class{Header} class also provides a number of methods to support |
| standard operators and built-in functions. |
| |
| \begin{methoddesc}[Header]{__str__}{} |
| A synonym for \method{Header.encode()}. Useful for |
| \code{str(aHeader)}. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Header]{__unicode__}{} |
| A helper for the built-in \function{unicode()} function. Returns the |
| header as a Unicode string. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Header]{__eq__}{other} |
| This method allows you to compare two \class{Header} instances for equality. |
| \end{methoddesc} |
| |
| \begin{methoddesc}[Header]{__ne__}{other} |
| This method allows you to compare two \class{Header} instances for inequality. |
| \end{methoddesc} |
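
A small sketch of the comparison operators follows. It assumes
Python 3 and the \module{email.header} spelling. Note that in Python 3
\code{str(aHeader)} returns the Unicode value (the role played by
\method{__unicode__()} above), and \method{encode()} must be called
explicitly to get the \rfc{2047} form.

```python
from email.header import Header  # Python 3 spelling (assumption)

a = Header('p\xf6stal', 'iso-8859-1')
b = Header('p\xf6stal', 'iso-8859-1')
c = Header('postal')

same = (a == b)        # equal header values compare equal
different = (a != c)   # different values compare unequal
as_text = str(a)       # Unicode value of the header in Python 3

print(same, different, as_text)
```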
| |
| The \module{email.Header} module also provides the following |
| convenience functions. |
| |
| \begin{funcdesc}{decode_header}{header} |
| Decode a message header value without converting the character set. |
| The header value is in \var{header}. |
| |
| This function returns a list of \code{(decoded_string, charset)} pairs |
| containing each of the decoded parts of the header. \var{charset} is |
| \code{None} for non-encoded parts of the header; otherwise it is a |
| lowercase string containing the name of the character set specified in the |
| encoded string. |
| |
| Here's an example: |
| |
| \begin{verbatim} |
| >>> from email.Header import decode_header |
| >>> decode_header('=?iso-8859-1?q?p=F6stal?=') |
| [('p\xf6stal', 'iso-8859-1')] |
| \end{verbatim} |
| \end{funcdesc} |
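
Because \function{decode_header()} does not convert the character set,
the caller has to decode each part itself. The following sketch shows
that step, assuming Python 3 and the \module{email.header} spelling,
where the decoded part of an encoded word is a \code{bytes} object.

```python
from email.header import decode_header  # Python 3 spelling (assumption)

parts = decode_header('=?iso-8859-1?q?p=F6stal?=')
print(parts)

# Convert the raw bytes to text using the charset reported alongside them.
raw, charset = parts[0]
text = raw.decode(charset)
print(text)
```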
| |
| \begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{, |
| header_name\optional{, continuation_ws}}}} |
| Create a \class{Header} instance from a sequence of pairs as returned |
| by \function{decode_header()}. |
| |
| \function{decode_header()} takes a header value string and returns a |
| sequence of pairs of the format \code{(decoded_string, charset)} where |
| \var{charset} is the name of the character set. |
| |
| This function takes such a sequence of pairs and returns a |
| \class{Header} instance. Optional \var{maxlinelen}, |
| \var{header_name}, and \var{continuation_ws} are as in the |
| \class{Header} constructor. |
| \end{funcdesc} |
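
Together, \function{decode_header()} and \function{make_header()} allow
a round trip from an encoded header value to a \class{Header} instance
and back. A minimal sketch, assuming Python 3 and the
\module{email.header} spelling:

```python
from email.header import decode_header, make_header  # Python 3 spelling (assumption)

original = '=?iso-8859-1?q?p=F6stal?='

# Rebuild a Header from the decoded (bytes, charset) pairs.
h = make_header(decode_header(original), header_name='Subject')

text = str(h)           # Unicode value in Python 3
reencoded = h.encode()  # back to the RFC 2047 form

print(text)
print(reencoded)
```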