\declaremodule{standard}{email.Header}
\modulesynopsis{Representing non-ASCII headers}

\rfc{2822} is the base standard that describes the format of email
messages.  It derives from the older \rfc{822} standard, which came
into widespread use at a time when most email was composed of \ASCII{}
characters only.  \rfc{2822} is written on the assumption that email
contains only 7-bit \ASCII{} characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now
be used in email messages.  The base standard still requires email
messages to be transferred using only 7-bit \ASCII{} characters, so a
slew of RFCs have been written describing how to encode email
containing non-\ASCII{} characters into \rfc{2822}-compliant format.
These RFCs include \rfc{2045}, \rfc{2046}, \rfc{2047}, and \rfc{2231}.
The \module{email} package supports these standards in its
\module{email.Header} and \module{email.Charset} modules.

If you want to include non-\ASCII{} characters in your email headers,
say in the \mailheader{Subject} or \mailheader{To} fields, you should
use the \class{Header} class and assign the field in the
\class{Message} object to an instance of \class{Header} instead of
a plain string.  For example:

\begin{verbatim}
>>> from email.Message import Message
>>> from email.Header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print msg.as_string()
Subject: =?iso-8859-1?q?p=F6stal?=


\end{verbatim}

Here the \mailheader{Subject} field needed to contain a
non-\ASCII{} character.  We arranged this by creating a \class{Header}
instance and passing in the character set that the byte string was
encoded in.  When the \class{Message} instance was subsequently
flattened, the \mailheader{Subject} field was properly \rfc{2047}
encoded.  MIME-aware mail readers would show this header using the
embedded ISO-8859-1 character.

\versionadded{2.2.2}

Here is the \class{Header} class description:

\begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
    maxlinelen\optional{, header_name\optional{, continuation_ws\optional{,
    errors}}}}}}}
Create a MIME-compliant header that can contain strings in different
character sets.

Optional \var{s} is the initial header value.  If \code{None} (the
default), the initial header value is not set.  You can later append
to the header with \method{append()} method calls.  \var{s} may be a
byte string or a Unicode string, but see the \method{append()}
documentation for semantics.

Optional \var{charset} serves two purposes.  First, it has the same
meaning as the \var{charset} argument to the \method{append()} method.
Second, it sets the default character set for all subsequent
\method{append()} calls that omit the \var{charset} argument.  If
\var{charset} is not provided in the constructor (the default), the
\code{us-ascii} character set is used both as \var{s}'s initial
charset and as the default for subsequent \method{append()} calls.

The maximum line length can be specified explicitly via
\var{maxlinelen}.  To split the first line at a shorter length (to
account for the field name, e.g. \mailheader{Subject}, which isn't
included in \var{s}), pass in the name of the field in
\var{header_name}.  The default \var{maxlinelen} is 76, and the
default value for \var{header_name} is \code{None}, meaning it is not
taken into account for the first line of a long, split header.

Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
whitespace, and is usually either a space or a hard tab character.
This character will be prepended to continuation lines.

Optional \var{errors} is passed straight through to the
\method{append()} method.
\end{classdesc}

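As a rough illustration of how \var{maxlinelen} and \var{header_name}
interact, the sketch below folds a long \ASCII{} value.  It assumes a
current Python 3 interpreter, where this module is spelled
\module{email.header} rather than \module{email.Header}; the sample
text and the limit of 40 are arbitrary choices made for the example.

```python
# Assumes Python 3, where the module is named email.header (lowercase).
from email.header import Header

# Twenty short ASCII words; the text itself is arbitrary sample data.
text = ' '.join(['word'] * 20)

# header_name lets the first line leave room for the "Subject: " prefix.
h = Header(text, maxlinelen=40, header_name='Subject')
encoded = h.encode()

# Show each folded line together with its length.
for line in encoded.split('\n'):
    print(len(line), repr(line))
```

The first line should come out shorter than the others, because the
length of the field name plus the colon and space is reserved for it.
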
\begin{methoddesc}[Header]{append}{s\optional{, charset\optional{, errors}}}
Append the string \var{s} to the MIME header.

Optional \var{charset}, if given, should be a \class{Charset} instance
(see \refmodule{email.Charset}) or the name of a character set, which
will be converted to a \class{Charset} instance.  A value of
\code{None} (the default) means that the \var{charset} given in the
constructor is used.

\var{s} may be a byte string or a Unicode string.  If it is a byte
string (i.e. \code{isinstance(s, str)} is true), then
\var{charset} is the encoding of that byte string, and a
\exception{UnicodeError} will be raised if the string cannot be
decoded with that character set.

If \var{s} is a Unicode string, then \var{charset} is a hint
specifying the character set of the characters in the string.  In this
case, when producing an \rfc{2822}-compliant header using \rfc{2047}
rules, the Unicode string will be encoded using the following charsets
in order: \code{us-ascii}, the \var{charset} hint, \code{utf-8}.  The
first character set that does not provoke a \exception{UnicodeError}
is used.

Optional \var{errors} is passed through to any \function{unicode()} or
\function{ustr.encode()} call, and defaults to ``strict''.
\end{methoddesc}

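The following sketch shows \method{append()} mixing character sets in
one header, and the \exception{UnicodeError} raised for a byte string
that cannot be decoded.  It assumes Python 3, where the module is
spelled \module{email.header} and a plain string literal is a Unicode
string; the sample values are made up for illustration.

```python
# Assumes Python 3, where the module is named email.header (lowercase).
from email.header import Header

# Start with an ASCII chunk, then append a chunk in another charset.
h = Header('Hello', 'us-ascii')
h.append('p\xf6stal', 'iso-8859-1')
encoded = h.encode()
print(encoded)

# A byte string is decoded with the given charset; 0xFF is not ASCII,
# so this append is expected to fail with a UnicodeError subclass.
try:
    Header().append(b'\xff', 'us-ascii')
    raised = False
except UnicodeError:
    raised = True
print(raised)
```

Only the non-\ASCII{} chunk is wrapped in an \rfc{2047} encoded word;
the \ASCII{} chunk is emitted as is.
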
\begin{methoddesc}[Header]{encode}{}
Encode a message header into an RFC-compliant format, possibly
wrapping long lines and encapsulating non-\ASCII{} parts in base64 or
quoted-printable encodings.
\end{methoddesc}

The \class{Header} class also provides a number of methods to support
standard operators and built-in functions.

\begin{methoddesc}[Header]{__str__}{}
A synonym for \method{Header.encode()}.  Useful for
\code{str(aHeader)}.
\end{methoddesc}

\begin{methoddesc}[Header]{__unicode__}{}
A helper for the built-in \function{unicode()} function.  Returns the
header as a Unicode string.
\end{methoddesc}

\begin{methoddesc}[Header]{__eq__}{other}
This method allows you to compare two \class{Header} instances for equality.
\end{methoddesc}

\begin{methoddesc}[Header]{__ne__}{other}
This method allows you to compare two \class{Header} instances for inequality.
\end{methoddesc}

The \module{email.Header} module also provides the following
convenience functions.

\begin{funcdesc}{decode_header}{header}
Decode a message header value without converting the character set.
The header value is in \var{header}.

This function returns a list of \code{(decoded_string, charset)} pairs
containing each of the decoded parts of the header.  \var{charset} is
\code{None} for non-encoded parts of the header, otherwise a lowercase
string containing the name of the character set specified in the
encoded string.

Here's an example:

\begin{verbatim}
>>> from email.Header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[('p\xf6stal', 'iso-8859-1')]
\end{verbatim}
\end{funcdesc}

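Because \function{decode_header()} does not convert the character
set, turning its result into text is left to the caller.  The sketch
below does that by hand.  It assumes Python 3, where the module is
spelled \module{email.header} and the decoded parts of an encoded
header come back as \code{bytes}; treating parts without a charset as
\ASCII{} is a choice made for this example.

```python
# Assumes Python 3, where the module is named email.header (lowercase).
from email.header import decode_header

parts = decode_header('=?iso-8859-1?q?p=F6stal?=')

decoded = []
for raw, charset in parts:
    if isinstance(raw, bytes):
        # charset is None for unencoded parts; treat those as ASCII here.
        raw = raw.decode(charset or 'ascii')
    decoded.append(raw)

print(parts)
print(''.join(decoded))
```
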
\begin{funcdesc}{make_header}{decoded_seq\optional{, maxlinelen\optional{,
    header_name\optional{, continuation_ws}}}}
Create a \class{Header} instance from a sequence of pairs as returned
by \function{decode_header()}.

\function{decode_header()} takes a header value string and returns a
sequence of pairs of the format \code{(decoded_string, charset)} where
\var{charset} is the name of the character set.

This function takes one of those sequences of pairs and returns a
\class{Header} instance.  Optional \var{maxlinelen},
\var{header_name}, and \var{continuation_ws} are as in the
\class{Header} constructor.
\end{funcdesc}
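
A minimal round-trip sketch, assuming Python 3 (where the module is
spelled \module{email.header}): the pairs produced by
\function{decode_header()} are handed to \function{make_header()}, and
the resulting \class{Header} is encoded again.

```python
# Assumes Python 3, where the module is named email.header (lowercase).
from email.header import decode_header, make_header

original = '=?iso-8859-1?q?p=F6stal?='

# decode_header() yields (decoded_string, charset) pairs ...
pairs = decode_header(original)

# ... which make_header() turns back into a Header instance.
h = make_header(pairs)
reencoded = h.encode()
print(reencoded)
```

For this single encoded word the re-encoded value matches the input;
longer headers may be folded differently on the way back out.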