:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>


.. note::

   The headerregistry module has been included in the standard library on a
   :term:`provisional basis <provisional package>`. Backwards incompatible
   changes (up to and including removal of the module) may occur if deemed
   necessary by the core developers.

.. versionadded:: 3.3
   as a :term:`provisional module <provisional package>`

Headers are represented by customized subclasses of :class:`str`. The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created. This section documents the particular
``header_factory`` implemented by the email package for handling :rfc:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class. Each header class has an additional base class that is determined by
the type of the header. For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class. The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`. All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':'). This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing. The email package tries to
      be complete about detecting compliance issues. See the :mod:`email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according
      to *policy*. A :attr:`~email.policy.Policy.cte_type` of
      ``8bit`` will be treated as if it were ``7bit``, since strings
      may not contain binary data.


   ``BaseHeader`` by itself cannot be used to create a header object. It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object. Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``. This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list. The parse method should append any detected
   defects to this list. On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``. ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode). The parse method should assume that *string* may
   contain transport encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method. The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself. Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
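
The ``parse``/``init`` protocol described above can be sketched as follows.
This is an illustration only: ``PriorityHeader``, its ``level`` attribute, and
the ``X-Priority-Level`` header name are hypothetical. The sketch subclasses
``UnstructuredHeader`` so that the inherited ``parse`` fills in ``decoded``
(and whatever else ``BaseHeader`` expects) before the extra key is added.

```python
from email.errors import HeaderDefect
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class PriorityHeader(UnstructuredHeader):
    """Hypothetical header type that exposes an integer ``level``."""

    max_count = 1

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser populate kwds['decoded'] first.
        super().parse(value, kwds)
        try:
            kwds['level'] = int(kwds['decoded'].strip())
        except ValueError:
            kwds['level'] = None
            kwds['defects'].append(HeaderDefect('priority is not an integer'))

    def init(self, *args, **kw):
        # Remove the extra key before handing the rest to BaseHeader.init.
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority-Level', PriorityHeader)

header = registry('X-Priority-Level', '3')
bad = registry('X-Priority-Level', 'high')
print(header.name, header.level)
print(bad.level, bad.defects)
```

The extra ``level`` key never reaches ``BaseHeader.init``, which is the point
of the ``kw.pop`` step described above.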


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured. The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value. When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   back into the original unicode, following the :rfc:`2047` rules for
   unstructured text. The parser uses heuristics to attempt to decode certain
   non-compliant encoded words. Defects are registered in such cases, as well
   as defects for issues such as invalid characters within the encoded words or
   the non-encoded text.

   This header type provides no additional attributes.
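
As a small illustration of the decoding described above, the following sketch
passes an :rfc:`2047` encoded word through a default ``HeaderRegistry``. The
subject text is made up for the example.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader

registry = HeaderRegistry()

# An RFC 2047 "Q" encoded word; the value itself is an invented example.
subject = registry('Subject', '=?utf-8?q?Caf=C3=A9_menu?=')

print(subject)  # Café menu
print(isinstance(subject, UnstructuredHeader))
```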


.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date. If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`. If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance. This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``. Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.
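
The naive versus aware distinction can be checked directly. The sketch below
builds ``Date`` headers through a default ``HeaderRegistry`` rather than a
message object; the dates are arbitrary examples.

```python
from datetime import datetime, timedelta
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# An explicit offset yields an aware datetime.
aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0400')
# "-0000" yields a naive datetime.
naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
# A naive datetime passed as the value is formatted with "-0000".
from_dt = registry('Date', datetime(2011, 7, 15, 21))

print(aware.datetime.utcoffset())
print(naive.datetime.tzinfo)
print(from_dt)
```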


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value. Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Group`` objects whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value. If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the addresses are "flattened" into a one dimensional sequence).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode. :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode. The ``decoded`` value is set by :meth:`~str.join`\ ing the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header. ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.
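
A short sketch of ``groups`` versus ``addresses``, and of copying an address
list with its groups intact. The addresses and the group name are invented for
the example.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

to = registry(
    'To',
    'Team: alice@example.com, bob@example.com;, carol@example.com',
)

# One real group plus one single-address Group whose display_name is None.
group_names = [g.display_name for g in to.groups]
# Flattened view: group members appear where the group occurred.
flat = [a.addr_spec for a in to.addresses]

# Reusing the groups of the source header keeps the grouping.
cc = registry('Cc', to.groups)

print(group_names)
print(flat)
print(cc)
```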


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value. If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.
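
The following sketch shows both outcomes for ``address``, using the
:mailheader:`Sender` header and made-up addresses.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

sender = registry('Sender', 'Alice <alice@example.com>')
print(sender.address.addr_spec)

# Two addresses in a single-address header: the attribute access fails.
invalid = registry('Sender', 'alice@example.com, bob@example.com')
try:
    invalid.address
    raised = False
except ValueError:
    raised = True
print(raised)
```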


Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.


.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``. For future proofing, this header class
   supports other valid version numbers. If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.
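
A minimal sketch of the three attributes. The explicit ``map_to_type`` call is
a precaution, assuming that ``mime-version`` may not be in the default map of
the Python version in use.

```python
from email.headerregistry import HeaderRegistry, MIMEVersionHeader

registry = HeaderRegistry()
registry.map_to_type('MIME-Version', MIMEVersionHeader)

mime_version = registry('MIME-Version', '1.0')
print(mime_version.version, mime_version.major, mime_version.minor)
```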


.. class:: ParameterizedMIMEHeader

   MIME headers all start with the prefix 'Content-'. Each specific header has
   a certain value, described under the class for that header. Some can
   also take a list of supplemental parameters, which have a common format.
   This class serves as a base for all the MIME headers that take parameters.

   .. attribute:: params

      A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Type` header.

   .. attribute:: content_type

      The content type string, in the form ``maintype/subtype``.

   .. attribute:: maintype

   .. attribute:: subtype


.. class:: ContentDispositionHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Disposition` header.

   .. attribute:: content_disposition

      ``inline`` and ``attachment`` are the only valid values in common use.


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``base64``, and
      ``quoted-printable``. See :rfc:`2045` for more information.
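
The MIME header classes above can be exercised as in the sketch below. The
header values are invented, and the explicit ``map_to_type`` calls are a
precaution, assuming these names may not be in the default map of the Python
version in use.

```python
from email.headerregistry import (
    ContentDispositionHeader,
    ContentTransferEncodingHeader,
    ContentTypeHeader,
    HeaderRegistry,
)

registry = HeaderRegistry()
registry.map_to_type('Content-Type', ContentTypeHeader)
registry.map_to_type('Content-Disposition', ContentDispositionHeader)
registry.map_to_type('Content-Transfer-Encoding', ContentTransferEncodingHeader)

ctype = registry('Content-Type', 'text/plain; charset="utf-8"')
disposition = registry('Content-Disposition', 'attachment; filename="report.pdf"')
encoding = registry('Content-Transfer-Encoding', 'base64')

print(ctype.content_type, ctype.maintype, ctype.subtype)
print(ctype.params['charset'])
print(disposition.content_disposition, disposition.params['filename'])
print(encoding.cte)
```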



.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds. When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class. When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied into the registry during
   initialization. *base_class* is always the last class in the generated
   class's ``__bases__`` list.

   The default mappings are:

     :subject:         UniqueUnstructuredHeader
     :date:            UniqueDateHeader
     :resent-date:     DateHeader
     :orig-date:       UniqueDateHeader
     :sender:          UniqueSingleAddressHeader
     :resent-sender:   SingleAddressHeader
     :to:              UniqueAddressHeader
     :resent-to:       AddressHeader
     :cc:              UniqueAddressHeader
     :resent-cc:       AddressHeader
     :from:            UniqueAddressHeader
     :resent-from:     AddressHeader
     :reply-to:        UniqueAddressHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped. It will be converted to
      lower case in the registry. *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class,
      calls the constructed class's constructor, passing it the same
      argument list, and finally returns the class instance created thereby.
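
The three methods can be seen together in the sketch below. The ``X-Custom``
and ``X-Sent-At`` header names are hypothetical and serve only to show the
default class and a custom mapping.

```python
from datetime import timedelta
from email.headerregistry import (
    BaseHeader,
    DateHeader,
    HeaderRegistry,
    UniqueUnstructuredHeader,
    UnstructuredHeader,
)

registry = HeaderRegistry()

# __getitem__ builds a class from the mapped class plus base_class.
subject_cls = registry['Subject']
print(subject_cls.__bases__)

# A name that is not in the registry falls back to default_class.
unknown = registry('X-Custom', 'hello')
print(isinstance(unknown, UnstructuredHeader))

# map_to_type stores the name in lower case, so lookup ignores case.
registry.map_to_type('X-Sent-At', DateHeader)
sent_at = registry('x-sent-at', 'Fri, 15 Jul 2011 21:00:00 +0000')
print(sent_at.datetime)
```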


The following classes are the classes used to represent data parsed from
structured headers and can, in general, be used by an application program to
construct structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address. The general form of an
   address is::

       [display_name] <username@domain>

   or::

       username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error. Unicode characters are allowed and
   will be properly encoded when serialized. However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed. If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above). This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.
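
A brief sketch of the two ways to construct an ``Address`` and of the string
forms described above. All names and addresses are made up.

```python
from email.headerregistry import Address

# Built from separate parts.
alice = Address('Alice Smith', 'alice', 'example.com')
# Built from an addr_spec; username and domain are parsed out of it.
bob = Address(addr_spec='bob@example.com')
# A display name containing a special character is quoted by str().
quoted = Address('Smith, Alice', 'alice', 'example.com')
# The SMTP special case: no username and no domain.
empty = Address()

print(alice)   # Alice Smith <alice@example.com>
print(bob)     # bob@example.com
print(quoted)  # "Smith, Alice" <alice@example.com>
print(empty)   # <>
```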


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group. The general form of an
   address group is::

       display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group. If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters. If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
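
The string forms of a ``Group`` can be sketched as follows, again with
invented names and addresses.

```python
from email.headerregistry import Address, Group

alice = Address('Alice', 'alice', 'example.com')
bob = Address(addr_spec='bob@example.com')

team = Group('Team', [alice, bob])
# A Group with no display name and one address stands for that address.
single = Group(None, [alice])
# A group may be empty.
nobody = Group('Nobody')

print(team)    # Team: Alice <alice@example.com>, bob@example.com;
print(single)  # Alice <alice@example.com>
print(nobody)  # Nobody:;
```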