\declaremodule{standard}{email.header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages. It derives from the older \rfc{822} standard which came
into widespread use at a time when most email was composed of \ASCII{}
characters only. \rfc{2822} is a specification written assuming email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages. The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
slew of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.header} and \module{email.charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
using a string for the header value. Import the \class{Header} class from the
\module{email.header} module. For example:

\begin{verbatim}
>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

Here we wanted the \mailheader{Subject} field to contain a
non-\ASCII{} character, so we created a \class{Header} instance and
passed in the character set that the byte string was encoded in. When
the \class{Message} instance was subsequently flattened, the
\mailheader{Subject} field was properly \rfc{2047} encoded. MIME-aware
mail readers would show this header using the embedded ISO-8859-1
character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
    maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
    errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value. If \code{None} (the
default), the initial header value is not set. You can later append
to the header with \method{append()} method calls. \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes: it has the same meaning as
the \var{charset} argument to the \method{append()} method. It also
sets the default character set for all subsequent \method{append()}
calls that omit the \var{charset} argument. If \var{charset} is not
provided in the constructor (the default), the \code{us-ascii}
character set is used both as \var{s}'s initial charset and as the
default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}. To split the first line at a shorter length (to
account for the field name, e.g. \mailheader{Subject}, which isn't
included in \var{s}), pass in the name of the field in
\var{header_name}. The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}

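As a minimal sketch of the \var{charset} default described above, the
following creates a \class{Header} with an explicit character set and
then appends a second chunk without naming one. The sample words are
purely illustrative, and Unicode literals are used on the assumption
that the interpreter accepts them; the exact encoded form (base64
versus quoted-printable) may vary between versions, so it is printed
rather than stated.

```python
from email.header import Header

# The charset given to the constructor applies to the initial value...
h = Header(u'caf\xe9', 'utf-8')
# ...and becomes the default for append() calls that omit charset.
h.append(u'na\xefve')

encoded = h.encode()
print(encoded)
```

Both chunks should come out tagged with the \code{utf-8} character
set, since the second \method{append()} call falls back on the
constructor's value.
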
\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.charset}) or the name of a character set, which
will be converted to a \class{Charset} instance. A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string. If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string. In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}. The
first character set to not provoke a \exception{UnicodeError} is used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}

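The following sketch illustrates how \method{append()} can mix
character sets within one header. It assumes Unicode literals are
accepted by the interpreter, and the strings themselves are
hypothetical sample values rather than anything prescribed by the
module.

```python
from email.header import Header

# Start with a plain ASCII chunk; us-ascii is the default charset.
h = Header(u'Hello')
# Append a chunk that needs a different character set.
h.append(u'p\xf6stal', 'iso-8859-1')

encoded = h.encode()
print(encoded)
```

Only the second chunk should need \rfc{2047} encoding; the \ASCII{}
chunk is expected to pass through unchanged.
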
\begin{methoddesc}[Header]{encode}{\optional{splitchars}}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings. Optional \var{splitchars} is a string
containing characters to split long \ASCII{} lines on, in rough support
of \rfc{2822}'s \emph{highest level syntactic breaks}. This doesn't
affect \rfc{2047} encoded lines.
\end{methoddesc}

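To illustrate the line wrapping performed by \method{encode()}, here
is a small sketch that combines it with the constructor's
\var{maxlinelen} and \var{header_name} arguments. The subject text and
the limit of 40 are arbitrary values chosen for the example, and the
exact positions of the line breaks may differ between versions of the
package, so the lines are printed rather than listed.

```python
from email.header import Header

text = u'The quick brown fox jumps over the lazy dog and keeps on running'
# header_name lets the first line leave room for "Subject: ".
h = Header(text, maxlinelen=40, header_name='Subject')

encoded = h.encode()
lines = encoded.split('\n')
for line in lines:
    print(repr(line))
```

Each continuation line is expected to begin with folding whitespace,
and no line should exceed the requested maximum length.
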
The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}. Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function. Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}

The \module{email.header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header. \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lower
case string containing the name of the character set specified in the
encoded string.

Here's an example:

\begin{verbatim}
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}

\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
    header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one of those sequences of pairs and returns a
\class{Header} instance. Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
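
The two functions are designed to work together. As a rough sketch,
the following feeds the output of \function{decode_header()} straight
into \function{make_header()} and re-encodes the result; the header
value is the same sample used earlier in this section, and the
variable names are illustrative only.

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='
# decode_header() yields (decoded_string, charset) pairs...
pairs = decode_header(raw)
# ...which make_header() turns back into a Header instance.
h = make_header(pairs)

print(h.encode())
```

For a simple value such as this one, re-encoding the resulting
\class{Header} should reproduce the original \rfc{2047} encoded
string.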