:mod:`email`: Internationalized headers
---------------------------------------

.. module:: email.header
   :synopsis: Representing non-ASCII headers


:rfc:`2822` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard, which came into widespread use at
a time when most email was composed of ASCII characters only.  :rfc:`2822` is a
specification written assuming email contains only 7-bit ASCII characters.

Of course, as email has been deployed worldwide, it has become
internationalized, such that language-specific character sets can now be used in
email messages.  The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
:rfc:`2822`\ -compliant format.  These RFCs include :rfc:`2045`, :rfc:`2046`,
:rfc:`2047`, and :rfc:`2231`.  The :mod:`email` package supports these standards
in its :mod:`email.header` and :mod:`email.charset` modules.

If you want to include non-ASCII characters in your email headers, say in the
:mailheader:`Subject` or :mailheader:`To` fields, you should use the
:class:`Header` class and assign the field in the :class:`~email.message.Message`
object to an instance of :class:`Header` instead of using a string for the header
value.  Import the :class:`Header` class from the :mod:`email.header` module.
For example::

   >>> from email.message import Message
   >>> from email.header import Header
   >>> msg = Message()
   >>> h = Header('p\xf6stal', 'iso-8859-1')
   >>> msg['Subject'] = h
   >>> print(msg.as_string())
   Subject: =?iso-8859-1?q?p=F6stal?=



Notice how the :mailheader:`Subject` field contains a non-ASCII character.  We
did this by creating a :class:`Header` instance and passing in the character
set in which the string should be encoded.  When the
:class:`~email.message.Message` instance was subsequently flattened, the
:mailheader:`Subject` field was properly :rfc:`2047` encoded.  MIME-aware mail
readers would show this header using the embedded ISO-8859-1 character.

Here is the :class:`Header` class description:


.. class:: Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')

   Create a MIME-compliant header that can contain strings in different character
   sets.

   Optional *s* is the initial header value.  If ``None`` (the default), the
   initial header value is not set.  You can later append to the header with
   :meth:`append` method calls.  *s* may be an instance of :class:`bytes` or
   :class:`str`, but see the :meth:`append` documentation for semantics.

   Optional *charset* serves two purposes: it has the same meaning as the *charset*
   argument to the :meth:`append` method, and it sets the default character set
   for all subsequent :meth:`append` calls that omit the *charset* argument.  If
   *charset* is not provided in the constructor (the default), the ``us-ascii``
   character set is used both as *s*'s initial charset and as the default for
   subsequent :meth:`append` calls.

   The maximum line length can be specified explicitly via *maxlinelen*.  To
   have the first line split at a shorter length (to account for the field name,
   e.g. :mailheader:`Subject`, which isn't included in *s*), pass the name of the
   field in *header_name*.  The default *maxlinelen* is 76, and the default value
   for *header_name* is ``None``, meaning it is not taken into account for the
   first line of a long, split header.

   Optional *continuation_ws* must be :rfc:`2822`\ -compliant folding
   whitespace, and is usually either a space or a hard tab character.  This
   character will be prepended to continuation lines.  *continuation_ws*
   defaults to a single space character.

   Optional *errors* is passed straight through to the :meth:`append` method.


   .. method:: append(s, charset=None, errors='strict')

      Append the string *s* to the MIME header.

      Optional *charset*, if given, should be a :class:`~email.charset.Charset`
      instance (see :mod:`email.charset`) or the name of a character set, which
      will be converted to a :class:`~email.charset.Charset` instance.  A value
      of ``None`` (the default) means that the *charset* given in the constructor
      is used.

      *s* may be an instance of :class:`bytes` or :class:`str`.  If it is an
      instance of :class:`bytes`, then *charset* is the encoding of that byte
      string, and a :exc:`UnicodeError` will be raised if the string cannot be
      decoded with that character set.

      If *s* is an instance of :class:`str`, then *charset* is a hint specifying
      the character set of the characters in the string.

      In either case, when producing an :rfc:`2822`\ -compliant header using
      :rfc:`2047` rules, the string will be encoded using the output codec of
      the charset.  If the string cannot be encoded using the output codec, a
      :exc:`UnicodeError` will be raised.

      Optional *errors* is passed as the errors argument to the decode call
      if *s* is a byte string.

   .. method:: encode(splitchars=';, \\t', maxlinelen=None)

      Encode a message header into an RFC-compliant format, possibly wrapping
      long lines and encapsulating non-ASCII parts in base64 or quoted-printable
      encodings.  Optional *splitchars* is a string containing characters to
      split long ASCII lines on, in rough support of :rfc:`2822`'s *highest
      level syntactic breaks*.  This doesn't affect :rfc:`2047` encoded lines.

      *maxlinelen*, if given, overrides the instance's value for the maximum
      line length.


   The :class:`Header` class also provides a number of methods to support
   standard operators and built-in functions.

   .. method:: __str__()

      Return the header as a Unicode string.  This is what :class:`str` uses
      when called on a :class:`Header` instance.


   .. method:: __eq__(other)

      This method allows you to compare two :class:`Header` instances for
      equality.


   .. method:: __ne__(other)

      This method allows you to compare two :class:`Header` instances for
      inequality.

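As a minimal sketch of how the constructor, :meth:`append`, :meth:`encode`, and
the operator methods described above fit together, the following builds a header
from two chunks in different character sets and folds a long ASCII value.  The
variable names and sample values are illustrative only, and the exact position
of the folds is left unspecified here.

```python
from email.header import Header

# Start with an ASCII chunk; header_name lets the first line leave room
# for the "Subject: " prefix when the value is folded.
h = Header('Hello', header_name='Subject')

# Append a second chunk that uses a different character set.
h.append('p\xf6stal', 'iso-8859-1')

encoded = h.encode()   # RFC 2047 form, 7-bit ASCII only
text = str(h)          # unencoded Unicode text

# Header instances with the same content compare equal.
same = Header('Hello') == Header('Hello')
different = Header('Hello') != Header('Goodbye')

# A long ASCII value is folded at whitespace once it exceeds the
# maximum line length, here overridden for this one encode() call.
long_value = ' '.join(['word'] * 30)
folded = Header(long_value, header_name='Subject').encode(maxlinelen=40)
lines = folded.splitlines()
```

Here ``encoded`` carries the non-ASCII chunk as an :rfc:`2047` encoded word,
while ``text`` keeps the original characters, and ``lines`` holds the folded
continuation lines of the long value.
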
The :mod:`email.header` module also provides the following convenient functions.


.. function:: decode_header(header)

   Decode a message header value without converting the character set. The header
   value is in *header*.

   This function returns a list of ``(decoded_string, charset)`` pairs containing
   each of the decoded parts of the header.  *charset* is ``None`` for non-encoded
   parts of the header, otherwise a lower case string containing the name of the
   character set specified in the encoded string.

   Here's an example::

      >>> from email.header import decode_header
      >>> decode_header('=?iso-8859-1?q?p=F6stal?=')
      [(b'p\xf6stal', 'iso-8859-1')]

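Because :func:`decode_header` does not convert the character set, an encoded
part comes back as :class:`bytes` in Python 3 and still has to be decoded by the
caller.  A minimal sketch of that step, assuming the reported charset names are
valid Python codecs and falling back to ``'ascii'`` when *charset* is ``None``
(both are assumptions of this example, not guarantees of the module):

```python
from email.header import decode_header

parts = decode_header('=?iso-8859-1?q?p=F6stal?=')

# Turn each (decoded_string, charset) pair into text.  Encoded parts are
# bytes and need decoding; parts that are already str are used as-is.
decoded = ''.join(
    chunk.decode(charset or 'ascii') if isinstance(chunk, bytes) else chunk
    for chunk, charset in parts
)
```

The result in ``decoded`` is the Unicode text ``'p\xf6stal'``.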

.. function:: make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')

   Create a :class:`Header` instance from a sequence of pairs as returned by
   :func:`decode_header`.

   :func:`decode_header` takes a header value string and returns a sequence of
   pairs of the format ``(decoded_string, charset)`` where *charset* is the name of
   the character set.

   This function takes one of those sequences of pairs and returns a
   :class:`Header` instance.  Optional *maxlinelen*, *header_name*, and
   *continuation_ws* are as in the :class:`Header` constructor.

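The two functions are designed to be used together.  As a short sketch, an
encoded header value can be passed through :func:`decode_header` and
:func:`make_header` to obtain a :class:`Header` whose string form is the
readable text (the sample value below is only an illustration):

```python
from email.header import decode_header, make_header

raw = '=?iso-8859-1?q?p=F6stal?='

# Decode into (decoded_string, charset) pairs, then rebuild a Header.
header = make_header(decode_header(raw))

readable = str(header)        # Unicode text for display
reencoded = header.encode()   # back to the RFC 2047 form
```

For this input, ``readable`` is ``'p\xf6stal'`` and ``reencoded`` equals the
original ``raw`` value.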