:mod:`email`: Internationalized headers
---------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers


:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard, which came into widespread use at
a time when most email was composed of ASCII characters only.  :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now be used in
email messages.  The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format.  These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`.  The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value.  Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> print(msg.as_string())
   Subject: =?iso-8859-1?q?p=F6stal?=



Here the :mailheader:`Subject` field needed to contain a non-ASCII character.
We did this by creating a :class:`Header` instance and passing in the character
set that the string should be encoded in.  When the
:class:`~email.message.Message` instance was subsequently flattened, the
:mailheader:`Subject` field was properly :rfc:`2047` encoded.  MIME-aware mail
readers would show this header using the embedded ISO-8859-1 character.

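A :class:`Header` can also be inspected on its own, without flattening a whole
message.  The following is a minimal sketch rather than a canonical recipe; the
variable names are illustrative only.  It shows the two views of the same
header: the :rfc:`2047` form returned by :meth:`encode` and the readable form
returned by :func:`str`.

```python
from email.header import Header
from email.message import Message

# Build the same header as in the example above.
h = Header('p\xf6stal', 'iso-8859-1')

# The wire form: 7-bit ASCII, RFC 2047 encoded.
encoded = h.encode()

# The human-readable form: a normal Unicode string.
readable = str(h)

# Assigning the Header to a message field yields the encoded form
# when the message is flattened.
msg = Message()
msg['Subject'] = h
flattened = msg.as_string()
```

Here ``encoded`` is what travels over the wire, while ``readable`` is what a
MIME-aware mail reader would display.
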
Here is the :class:`Header` class description:


.. class:: Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value.  If ``None`` (the default), the
   initial header value is not set.  You can later append to the header with
   :meth:`append` method calls.  *s* may be an instance of :class:`bytes` or
   :class:`str`, but see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method, and it also sets the default character
   set for all subsequent :meth:`append` calls that omit the *charset* argument.
   If *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

   The maximum line length can be specified explicitly via *maxlinelen*.  To
   split the first line at a shorter length (to account for the field name,
   which isn't included in *s*, e.g. :mailheader:`Subject`), pass in the name of
   the field in *header_name*.  The default *maxlinelen* is 76, and the default
   value for *header_name* is ``None``, meaning it is not taken into account for
   the first line of a long, split header.

   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding
   whitespace, and is usually either a space or a hard tab character.  This
   character will be prepended to continuation lines.  *continuation_ws*
   defaults to a single space character.

   Optional *errors* is passed straight through to the :meth:`append` method.


   .. method:: append(s, charset=None, errors='strict')

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance.  A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

      *s* may be an instance of :class:`bytes` or :class:`str`.  If it is an
      instance of :class:`bytes`, then *charset* is the encoding of that byte
      string, and a :exc:`UnicodeError` will be raised if the string cannot be
      decoded with that character set.

      If *s* is an instance of :class:`str`, then *charset* is a hint specifying
      the character set of the characters in the string.

      In either case, when producing an :rfc:`2822`\ -compliant header using
      :rfc:`2047` rules, the string will be encoded using the output codec of
      the charset.  If the string cannot be encoded using the output codec, a
      :exc:`UnicodeError` will be raised.

      Optional *errors* is passed as the errors argument to the decode call
      if *s* is a byte string.

   .. method:: encode(splitchars=';, \\t', maxlinelen=None)

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings.  Optional *splitchars* is a string containing characters to
      split long ASCII lines on, in rough support of :rfc:`2822`'s *highest
      level syntactic breaks*.  This doesn't affect :rfc:`2047` encoded lines.

      *maxlinelen*, if given, overrides the instance's value for the maximum
      line length.


   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      A helper for the built-in :func:`str` function.  Returns the header as
      a Unicode string.


   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.


   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

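The :meth:`append` and :meth:`encode` semantics described above can be sketched
as follows.  This is an illustrative example under the stated assumptions, not
the only way to use the class: the header text and variable names are made up
for the demonstration, and exactly where long lines fold may vary.

```python
from email.header import Header

# Start with plain ASCII text; no charset means us-ascii is the default.
h = Header('Hello', header_name='Subject')

# A str chunk: the charset selects the output encoding for this part.
h.append('p\xf6stal', 'iso-8859-1')

# A bytes chunk: the charset is used to decode the bytes on input.
h.append(b'\xc3\xbcber', 'utf-8')

wire = h.encode()   # RFC 2047 form, 7-bit ASCII only
text = str(h)       # readable Unicode form

# Bytes that cannot be decoded with the given charset raise a
# UnicodeError subclass under the default errors='strict'.
try:
    Header(b'\xff', 'utf-8')
    strict_failed = False
except UnicodeError:
    strict_failed = True

# maxlinelen passed to encode() overrides the instance value for this call,
# so a long ASCII header is folded onto several lines.
long_header = Header(' '.join(['word'] * 30), header_name='Subject')
folded = long_header.encode(maxlinelen=40)
```

Each non-ASCII chunk keeps its own character set in ``wire``, which is why a
single header can mix ISO-8859-1 and UTF-8 parts.
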
The :mod:`email.header` module also provides the following convenient functions.


.. function:: decode_header(header)

   Decode a message header value without converting the character set.  The header
   value is in *header*.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header.  *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lower case string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [(b'p\xf6stal', 'iso-8859-1')]


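Because :func:`decode_header` does not convert the character set, the caller
still has to turn each decoded part into text.  A minimal sketch of that step
follows; ``decode_to_text`` and its *fallback* parameter are hypothetical names
introduced here for illustration and are not part of the module.

```python
from email.header import decode_header


def decode_to_text(raw_header, fallback='ascii'):
    """Join the parts returned by decode_header() into one str.

    Hypothetical helper, not provided by email.header.
    """
    parts = []
    for value, charset in decode_header(raw_header):
        # Encoded parts come back as bytes; decode them with the
        # charset named in the header, or the fallback if there is none.
        if isinstance(value, bytes):
            value = value.decode(charset or fallback, errors='replace')
        parts.append(value)
    return ''.join(parts)


subject = decode_to_text('=?iso-8859-1?q?p=F6stal?=')
plain = decode_to_text('plain text')
```

Using ``errors='replace'`` is a design choice for the sketch: it keeps a
malformed header from raising at display time, at the cost of substituting
replacement characters.
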
.. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes one of those sequences of pairs and returns a
   :class:`Header` instance.  Optional *maxlinelen*, *header_name*, and
   *continuation_ws* are as in the :class:`Header` constructor.

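The two functions are designed to be used together.  As a short sketch (the
variable names are illustrative), a raw header value can be round-tripped
through :func:`decode_header` and :func:`make_header` to obtain a
:class:`Header` that yields both the readable text and the encoded form:

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# decode_header() yields (decoded_string, charset) pairs;
# make_header() rebuilds a Header from them.
h = make_header(decode_header(raw))

readable = str(h)        # Unicode text for display
reencoded = h.encode()   # RFC 2047 form for the wire
```
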