:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

**Source code:** :source:`Lib/email/headerregistry.py`

--------------

.. versionadded:: 3.6 [1]_

Headers are represented by customized subclasses of :class:`str`.  The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created.  This section documents the particular
``header_factory`` implemented by the email package for handling :RFC:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class.  Each header class has an additional base class that is determined by
the type of the header.  For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class.  The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`.  All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call.  The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':').  This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing.  The email package tries to
      be complete about detecting compliance issues.  See the :mod:`~email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``.  A value of ``None`` means unlimited.  The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according to
      *policy*.  A :attr:`~email.policy.Policy.cte_type` of ``8bit`` will be
      treated as if it were ``7bit``, since headers may not contain arbitrary
      binary data.  If :attr:`~email.policy.EmailPolicy.utf8` is ``False``,
      non-ASCII data will be :rfc:`2047` encoded.


   ``BaseHeader`` by itself cannot be used to create a header object.  It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object.  Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``.  This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list.  The parse method should append any detected
   defects to this list.  On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``.  ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode).  The parse method should assume that *string* may
   contain content-transfer-encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method.  The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself.  Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts in to the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
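
The ``parse``/``init`` protocol described above can be sketched roughly as
follows.  The ``X-Priority`` header name, the ``PriorityHeader`` class, and
its ``level`` attribute are hypothetical and exist only for illustration.  The
sketch subclasses :class:`.UnstructuredHeader` and delegates to its ``parse``
method so that the existing parser fills in everything ``BaseHeader`` expects
before the extra key is added; this is one possible approach, not the only
one.

```python
from email import errors
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class PriorityHeader(UnstructuredHeader):
    """Hypothetical specialized class for an 'X-Priority' header."""

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser populate 'decoded' (and whatever
        # else BaseHeader needs), then add our own extra key.
        super().parse(value, kwds)
        try:
            kwds['level'] = int(kwds['decoded'].strip())
        except ValueError:
            kwds['level'] = None
            kwds['defects'].append(
                errors.HeaderDefect('priority is not an integer'))

    def init(self, *args, **kw):
        # Remove the extra key before handing the rest to BaseHeader.
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority', PriorityHeader)

header = registry('X-Priority', '3')
bad = registry('X-Priority', 'urgent')

print(header.name, header.level)     # name keeps the case it was given
print(bad.level, bad.defects)        # the defect appended in parse() shows up
```

Because the header object is still a :class:`str` subclass, ``str(header)``
remains the decoded text, while the parsed integer is available separately
through the hypothetical ``level`` property.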


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured.  The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set.  :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value.  When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   into unicode, following the :rfc:`2047` rules for unstructured text.  The
   parser uses heuristics to attempt to decode certain non-compliant encoded
   words.  Defects are registered in such cases, as well as defects for issues
   such as invalid characters within the encoded words or the non-encoded text.

   This header type provides no additional attributes.
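
As a small illustration of the encoded-word handling described above, the
following sketch builds a :mailheader:`Subject` header directly from a
:class:`.HeaderRegistry` instance.  The subject text itself is made up for the
example.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# The value mixes an RFC 2047 encoded word with plain ASCII text.
subject = registry('Subject', '=?utf-8?q?Caf=C3=A9?= menu')

print(subject)             # the string value is the decoded text
print(subject.defects)     # any compliance problems found while parsing
print(subject.max_count)   # Subject uses the Unique variant by default
```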


.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date.  If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`.  If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance.  This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``.  Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.
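
The naive/aware distinction above can be checked with a short sketch.  It
assumes an :class:`~email.message.EmailMessage` using the default policy; the
dates and the ``-0400`` offset are arbitrary example values.

```python
from datetime import datetime, timedelta, timezone
from email.headerregistry import HeaderRegistry
from email.message import EmailMessage

msg = EmailMessage()
# Assigning an aware datetime records its offset in the header value.
msg['Date'] = datetime(2011, 7, 15, 21, tzinfo=timezone(timedelta(hours=-4)))
date_header = msg['Date']
print(date_header)                       # Fri, 15 Jul 2011 21:00:00 -0400
print(date_header.datetime.utcoffset())  # -1 day, 20:00:00 (that is, -4 hours)

# Parsing strings: -0000 yields a naive datetime, +0000 an aware one.
registry = HeaderRegistry()
naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0000')
print(naive.datetime.tzinfo)             # None
print(aware.datetime.tzinfo)
```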


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value.  Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Group`` objects whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value.  If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the addresses are "flattened" into a one-dimensional sequence).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode.  :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode.  The ``decoded`` value is set by :meth:`joining <str.join>` the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header.  ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.
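
A minimal sketch of setting an address header from a mixture of
:class:`.Address` and :class:`.Group` objects, and then reading back the
``groups`` and ``addresses`` attributes.  All names and domains are
placeholders.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

msg = EmailMessage()
msg['To'] = [
    Address('Ann Example', 'ann', 'example.com'),
    Group('Team', [Address('Bob', 'bob', 'example.org'),
                   Address('Cy', 'cy', 'example.org')]),
]
to = msg['To']

# The bare address is wrapped in a Group whose display_name is None.
print([g.display_name for g in to.groups])   # [None, 'Team']

# addresses flattens the groups into a single tuple of Address objects.
print([a.addr_spec for a in to.addresses])

# The groups tuple can be reused to copy the list with groups intact.
copy = EmailMessage()
copy['To'] = to.groups
```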


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value.  If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.


Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``).  The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.
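
The effect of ``max_count`` is easiest to see through a message object.  The
sketch below assumes the default policy, under which :mailheader:`Sender` is
mapped to ``UniqueSingleAddressHeader``; the addresses are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg['Sender'] = 'Ann Example <ann@example.com>'
sender = msg['Sender']

print(sender.address.addr_spec)   # the one address in the header
print(sender.max_count)           # 1 for the Unique variant

# Adding a second Sender header exceeds max_count and is rejected.
duplicate_error = None
try:
    msg['Sender'] = 'bob@example.org'
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error)
```

To replace the value of such a header, delete the existing one first (for
example with ``del msg['Sender']``) or use
:meth:`~email.message.EmailMessage.replace_header`.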


.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``.  For future proofing, this header class
   supports other valid version numbers.  If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.


.. class:: ParameterizedMIMEHeader

   MIME headers all start with the prefix 'Content-'.  Each specific header has
   a certain value, described under the class for that header.  Some can
   also take a list of supplemental parameters, which have a common format.
   This class serves as a base for all the MIME headers that take parameters.

   .. attribute:: params

      A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Type` header.

   .. attribute:: content_type

      The content type string, in the form ``maintype/subtype``.

   .. attribute:: maintype

   .. attribute:: subtype


.. class:: ContentDispositionHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Disposition` header.

   .. attribute:: content_disposition

      ``inline`` and ``attachment`` are the only valid values in common use.


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``base64``, and
      ``quoted-printable``.  See :rfc:`2045` for more information.
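
The MIME header attributes described above can be explored with a short
sketch.  It assumes that a default :class:`.HeaderRegistry` maps the
``MIME-Version`` and ``Content-*`` names used here to the corresponding
classes; the header values are arbitrary examples.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

version = registry('MIME-Version', '1.0')
print(version.version, version.major, version.minor)

ctype = registry('Content-Type', 'text/plain; charset="utf-8"')
print(ctype.content_type, ctype.maintype, ctype.subtype)
print(ctype.params['charset'])

disposition = registry('Content-Disposition',
                       'attachment; filename="report.pdf"')
print(disposition.content_disposition, disposition.params['filename'])

cte = registry('Content-Transfer-Encoding', 'base64')
print(cte.cte)
```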



.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds.  When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class.  When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied in to the registry during
   initialization.  *base_class* is always the last class in the generated
   class's ``__bases__`` list.

   The default mappings are:

     :subject:         UniqueUnstructuredHeader
     :date:            UniqueDateHeader
     :resent-date:     DateHeader
     :orig-date:       UniqueDateHeader
     :sender:          UniqueSingleAddressHeader
     :resent-sender:   SingleAddressHeader
     :to:              UniqueAddressHeader
     :resent-to:       AddressHeader
     :cc:              UniqueAddressHeader
     :resent-cc:       AddressHeader
     :from:            UniqueAddressHeader
     :resent-from:     AddressHeader
     :reply-to:        UniqueAddressHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped.  It will be converted to
      lower case in the registry.  *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class,
      calls the constructed class's constructor, passing it the same
      argument list, and finally returns the class instance created thereby.
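
One way an application might extend the registry is sketched below.  The
``X-Expires`` header name is hypothetical; it is mapped to the existing
``UniqueDateHeader`` class so that its value is parsed as a date, and the
customized registry is then installed as the ``header_factory`` of a policy.

```python
from email.headerregistry import (BaseHeader, HeaderRegistry,
                                  UniqueDateHeader)
from email.message import EmailMessage
from email.policy import EmailPolicy

registry = HeaderRegistry()
registry.map_to_type('X-Expires', UniqueDateHeader)   # hypothetical header

# __getitem__ builds the composed class; the lookup is case-insensitive.
header_class = registry['x-expires']
print(header_class.__bases__)

# Use the customized registry for every header created under this policy.
policy = EmailPolicy(header_factory=registry)
msg = EmailMessage(policy=policy)
msg['X-Expires'] = 'Fri, 15 Jul 2011 21:00:00 +0000'
expires = msg['X-Expires']
print(expires.datetime)

# Names that are not in the registry fall back to default_class.
other = registry('X-Other', 'free-form text')
print(type(other).__mro__)
```

Creating a separate ``HeaderRegistry`` instance, rather than modifying the one
attached to a shared policy object, keeps the extra mapping local to messages
that use the new policy.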


The following classes are the classes used to represent data parsed from
structured headers and can, in general, be used by an application program to
construct structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address.  The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*.  An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error.  Unicode characters are allowed and
   will be properly encoded when serialized.  However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed.  If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above).  This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.
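
The two ways of constructing an ``Address``, and the resulting string forms,
can be sketched as follows.  The names and domains are placeholders.

```python
from email.headerregistry import Address

# Equivalent ways of describing the same mailbox.
by_parts = Address('Ann Example', 'ann', 'example.com')
by_spec = Address(display_name='Ann Example', addr_spec='ann@example.com')
print(by_parts)                      # Ann Example <ann@example.com>
print(by_spec.username, by_spec.domain)

# Without a display name, the bare addr_spec form is used.
bare = Address(username='ann', domain='example.com')
print(bare)                          # ann@example.com

# A display name containing special characters is quoted when serialized.
quoted = Address('Example, Ann', 'ann', 'example.com')
print(quoted)                        # "Example, Ann" <ann@example.com>

# The SMTP null address special case.
null = Address()
print(null)                          # <>
```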


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group.  The general form of an
   address group is::

     display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group.  If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters.  If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
449 that single ``Address``.
R David Murray7f730cf2016-09-08 18:28:43 -0400450
451
452.. rubric:: Footnotes
453
454.. [1] Oringally added in 3.3 as a :term:`provisional module <provisional
455 package>`