:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

.. versionadded:: 3.3
   as a :term:`provisional module <provisional package>`.

**Source code:** :source:`Lib/email/headerregistry.py`

.. note::

   The headerregistry module has been included in the standard library on a
   :term:`provisional basis <provisional package>`. Backwards incompatible
   changes (up to and including removal of the module) may occur if deemed
   necessary by the core developers.

--------------

Headers are represented by customized subclasses of :class:`str`. The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created. This section documents the particular
``header_factory`` implemented by the email package for handling :RFC:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class. Each header class has an additional base class that is determined by
the type of the header. For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class. The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`. All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.

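The class composition described above can be observed on a live header
object. The following is a minimal sketch, assuming the default policy used by
:class:`~email.message.EmailMessage`; the subject text is only illustrative.

```python
from email.headerregistry import BaseHeader, UnstructuredHeader
from email.message import EmailMessage

# EmailMessage uses an EmailPolicy-derived policy by default.
msg = EmailMessage()
msg['Subject'] = 'Quarterly report'
subject = msg['Subject']

# The header object is a str subclass.
print(isinstance(subject, str))
# Its class is composed from a specialized class plus BaseHeader,
# with BaseHeader as the last base.
print([cls.__name__ for cls in type(subject).__bases__])
print(isinstance(subject, UnstructuredHeader))
```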

.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':'). This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing. The email package tries to
      be complete about detecting compliance issues. See the :mod:`~email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according
      to *policy*. A :attr:`~email.policy.Policy.cte_type` of
      ``8bit`` will be treated as if it were ``7bit``, since strings
      may not contain binary data.


   ``BaseHeader`` by itself cannot be used to create a header object. It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object. Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``. This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list. The parse method should append any detected
   defects to this list. On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``. ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode). The parse method should assume that *string* may
   contain transport encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method. The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself. Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.

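The ``parse``/``init`` protocol can be sketched with a small custom header
type. Everything named here (``PriorityHeader``, its ``level`` attribute, and
the ``X-Priority`` field) is hypothetical and chosen only for illustration.
The sketch subclasses :class:`.UnstructuredHeader` so that the existing parser
fills in the decoded value and whatever else ``BaseHeader`` expects; a header
type with its own syntax would supply a complete ``parse`` instead.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class PriorityHeader(UnstructuredHeader):
    """Hypothetical header type that exposes a numeric priority level."""

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser populate kwds (including 'decoded').
        super().parse(value, kwds)
        text = kwds['decoded'].strip()
        # Extra key: it must be removed again in init() below.
        kwds['level'] = int(text) if text.isdigit() else None

    def init(self, *args, **kw):
        # Pop the extra key, then pass the rest on to BaseHeader.init.
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority', PriorityHeader)

header = registry('X-Priority', '3')
print(header.name, str(header), header.level)
print(registry('X-Priority', 'urgent').level)
```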

.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured. The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value. When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   back into the original unicode, following the :rfc:`2047` rules for
   unstructured text. The parser uses heuristics to attempt to decode certain
   non-compliant encoded words. Defects are registered in such cases, as well
   as defects for issues such as invalid characters within the encoded words or
   the non-encoded text.

   This header type provides no additional attributes.

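As a brief illustration of the encoded-word handling described above, the
following sketch passes an :rfc:`2047` encoded value through a default
:class:`.HeaderRegistry`. The subject text is made up for the example.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# '=?utf-8?q?...?=' is an RFC 2047 encoded word (quoted-printable, UTF-8).
subject = registry('Subject', '=?utf-8?q?Caf=C3=A9?= menu')

# The str value of the header is the fully decoded text.
print(str(subject))
# Any compliance problems found while parsing are reported here.
print(subject.defects)
```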

.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date. If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`. If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance. This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``. Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.

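The naive versus aware behavior of ``datetime`` can be checked directly. This
is a small sketch that calls a default :class:`.HeaderRegistry` rather than
going through a message object; the dates are arbitrary.

```python
from datetime import datetime, timedelta
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# A naive datetime is treated as UTC with an unknown source timezone.
naive = registry('Date', datetime(2011, 7, 15, 21))
print(str(naive))

# A string with an explicit offset yields an aware datetime.
parsed = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0400')
print(parsed.datetime.utcoffset())

# An offset of -0000 yields a naive datetime.
unknown_zone = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
print(unknown_zone.datetime.tzinfo)
```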

.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value. Addresses that are
      not part of a group are represented in this list as single-address
      ``Groups`` whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value. If the header value
      contains any groups, the individual addresses from the group are included
      in the list at the point where the group occurs in the value (that is,
      the list of addresses is "flattened" into a one dimensional list).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode. :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode. The ``decoded`` value is set by :meth:`~str.join`\ ing the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header. ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.

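The relationship between ``groups`` and ``addresses`` can be sketched by
assigning a mixed list to an address header. The names and ``example.com``
addresses below are placeholders.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

msg = EmailMessage()
msg['To'] = [
    Group('Team', [Address('Ann', 'ann', 'example.com'),
                   Address('Bob', 'bob', 'example.com')]),
    Address('Carol', 'carol', 'example.com'),
]
to = msg['To']

# Two groups: the named group, and a display_name=None wrapper for Carol.
print([group.display_name for group in to.groups])
# The flattened view lists every individual address in order.
print([address.addr_spec for address in to.addresses])

# Copy the address list to another header with the groups intact.
msg['Cc'] = to.groups
print(len(msg['Cc'].groups))
```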

.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value. If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.

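A short sketch of the ``address`` attribute, using the :mailheader:`Sender`
header (mapped to a single-address class in the default registry) and
placeholder addresses:

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

sender = registry('Sender', 'Ann <ann@example.com>')
print(sender.address.display_name, sender.address.addr_spec)

# Creating the header succeeds even with two addresses; only the
# access to .address raises.
bad = registry('Sender', 'ann@example.com, bob@example.com')
try:
    bad.address
    raised = False
except ValueError:
    raised = True
print(raised)
```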

Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.

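The effect of ``max_count`` can be seen by comparing a ``Unique`` mapping with
a non-unique one. The second half of this sketch relies on
:class:`~email.message.EmailMessage` refusing to add a header beyond its
``max_count``, which is behavior of the message class rather than of this
module.

```python
from email.headerregistry import HeaderRegistry
from email.message import EmailMessage

registry = HeaderRegistry()
print(registry['Subject'].max_count)      # Unique variant
print(registry['Resent-Date'].max_count)  # non-unique variant

msg = EmailMessage()
msg['Subject'] = 'first'
error = None
try:
    msg['Subject'] = 'second'
except ValueError as exc:
    error = exc
print(type(error).__name__)
```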

.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``. For future proofing, this header class
   supports other valid version numbers. If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.

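For example, a minimal sketch using a default :class:`.HeaderRegistry`:

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()
mime_version = registry('MIME-Version', '1.0')

print(mime_version.version)  # version as a string
print(mime_version.major, mime_version.minor)  # integers
```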

.. class:: ParameterizedMIMEHeader

   MIME headers all start with the prefix 'Content-'. Each specific header has
   a certain value, described under the class for that header. Some can
   also take a list of supplemental parameters, which have a common format.
   This class serves as a base for all the MIME headers that take parameters.

   .. attribute:: params

      A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Type` header.

   .. attribute:: content_type

      The content type string, in the form ``maintype/subtype``.

   .. attribute:: maintype

   .. attribute:: subtype


.. class:: ContentDispositionHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Disposition` header.

   .. attribute:: content_disposition

      ``inline`` and ``attachment`` are the only valid values in common use.


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``base64``, and
      ``quoted-printable``. See :rfc:`2045` for more information.


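The attributes of the three ``Content-`` header classes can be sketched as
follows. The header values are illustrative.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

ctype = registry('Content-Type', 'text/plain; charset="utf-8"')
print(ctype.content_type, ctype.maintype, ctype.subtype)
print(ctype.params['charset'])

disposition = registry('Content-Disposition',
                       'attachment; filename="report.pdf"')
print(disposition.content_disposition, disposition.params['filename'])

encoding = registry('Content-Transfer-Encoding', 'base64')
print(encoding.cte)
```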

.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds. When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class. When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied into the registry during
   initialization. *base_class* is always the last class in the generated
   class's ``__bases__`` list.

   The default mappings are:

     :subject:         UniqueUnstructuredHeader
     :date:            UniqueDateHeader
     :resent-date:     DateHeader
     :orig-date:       UniqueDateHeader
     :sender:          UniqueSingleAddressHeader
     :resent-sender:   SingleAddressHeader
     :to:              UniqueAddressHeader
     :resent-to:       AddressHeader
     :cc:              UniqueAddressHeader
     :resent-cc:       AddressHeader
     :from:            UniqueAddressHeader
     :resent-from:     AddressHeader
     :reply-to:        UniqueAddressHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped. It will be converted to
      lower case in the registry. *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class,
      calls the constructed class's constructor, passing it the same
      argument list, and finally returns the class instance created thereby.

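A customized registry can be handed to a policy through its ``header_factory``
attribute. The sketch below maps a hypothetical ``X-Expires`` field to
:class:`UniqueDateHeader`; the field name is an assumption made for the
example, not a header defined by any RFC.

```python
from email.headerregistry import (
    BaseHeader, HeaderRegistry, UniqueDateHeader, UnstructuredHeader)
from email.message import EmailMessage
from email.policy import EmailPolicy

registry = HeaderRegistry()
registry.map_to_type('X-Expires', UniqueDateHeader)

# Lookups are case-insensitive; unknown names fall back to default_class.
expires_cls = registry['x-expires']
fallback_cls = registry['X-Unknown']
print(expires_cls.__bases__)
print(fallback_cls.__bases__)

# Use the customized registry as the header factory of a policy.
policy = EmailPolicy(header_factory=registry)
msg = EmailMessage(policy=policy)
msg['X-Expires'] = 'Fri, 15 Jul 2011 21:00:00 -0400'
expires = msg['X-Expires']
print(expires.datetime.isoformat())
```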

The following classes represent the data parsed from structured headers and
can, in general, be used by an application program to construct structured
values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address. The general form of an
   address is::

       [display_name] <username@domain>

   or::

       username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error. Unicode characters are allowed and
   will be properly encoded when serialized. However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed. If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above). This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.

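The two ways of constructing an ``Address`` and the SMTP special case can be
sketched as follows, with placeholder names and addresses:

```python
from email.headerregistry import Address

# Built from separate parts.
ann = Address('Ann Example', 'ann', 'example.com')
print(str(ann))
print(ann.addr_spec)

# Built from an addr_spec, which is parsed into username and domain.
bob = Address(addr_spec='bob@example.com')
print(bob.username, bob.domain)

# A display name containing special characters is quoted by str().
quoted = Address('Example, Ann', 'ann', 'example.com')
print(str(quoted))

# Empty username and domain give the SMTP null address.
print(str(Address()))
```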

.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group. The general form of an
   address group is::

       display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group. If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters. If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
456 that single ``Address``.