:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>


.. note::

   The headerregistry module has been included in the standard library on a
   :term:`provisional basis <provisional package>`. Backwards incompatible
   changes (up to and including removal of the module) may occur if deemed
   necessary by the core developers.

.. versionadded:: 3.3
   as a :term:`provisional module <provisional package>`.

Headers are represented by customized subclasses of :class:`str`. The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created. This section documents the particular
``header_factory`` implemented by the email package for handling :rfc:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class. Each header class has an additional base class that is determined by
the type of the header. For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class. The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`. All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':'). This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing. The email package tries to
      be complete about detecting compliance issues. See the :mod:`~email.errors`
      module for a discussion of the types of defects that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according
      to *policy*. A :attr:`~email.policy.Policy.cte_type` of
      ``8bit`` will be treated as if it were ``7bit``, since strings
      may not contain binary data.


   ``BaseHeader`` by itself cannot be used to create a header object. It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object. Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``. This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list. The parse method should append any detected
   defects to this list. On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``. ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode). The parse method should assume that *string* may
   contain transport encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method. The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself. Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           self._myattr = kw.pop('myattr')
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.

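
As a minimal sketch of the ``parse``/``init`` protocol described for
``BaseHeader``, the following defines a hypothetical ``PriorityHeader`` for a
made-up ``X-Priority-Level`` field; neither name is part of the email package.
The sketch subclasses :class:`.UnstructuredHeader` and delegates to its
``parse`` method, on the assumption that this fills in every key
``BaseHeader`` expects in ``kwds``, and then adds one extra key that its own
``init`` removes again::

   from email.headerregistry import HeaderRegistry, UnstructuredHeader

   class PriorityHeader(UnstructuredHeader):
       """Hypothetical header type that exposes an integer priority."""

       max_count = 1

       @classmethod
       def parse(cls, value, kwds):
           # Let the unstructured parser supply 'decoded' and friends.
           super().parse(value, kwds)
           text = kwds['decoded'].strip()
           # Extra key added for illustration; init() must pop it again.
           kwds['priority'] = int(text) if text.isdecimal() else None

       def init(self, *args, **kw):
           self._priority = kw.pop('priority')
           super().init(*args, **kw)

       @property
       def priority(self):
           return self._priority

   registry = HeaderRegistry()
   registry.map_to_type('X-Priority-Level', PriorityHeader)
   header = registry('X-Priority-Level', '3')
   print(header.name, header.priority, header.max_count)

Exposing the parsed value through a read-only property mirrors how the
built-in header classes present their additional attributes.
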

.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured. The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value. When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   back into the original unicode, following the :rfc:`2047` rules for
   unstructured text. The parser uses heuristics to attempt to decode certain
   non-compliant encoded words. Defects are registered in such cases, as well
   as defects for issues such as invalid characters within the encoded words or
   the non-encoded text.

   This header type provides no additional attributes.

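
The decoding of encoded words can be illustrated with a short sketch. It
builds a :mailheader:`Subject` header through a default
:class:`.HeaderRegistry`; the sample subject text is invented for the
example::

   from email.headerregistry import HeaderRegistry, UnstructuredHeader

   registry = HeaderRegistry()

   # An RFC 2047 encoded word followed by plain ASCII text.
   subject = registry('Subject', '=?utf-8?q?Caf=C3=A9?= menu')

   print(subject)                                   # Café menu
   print(isinstance(subject, UnstructuredHeader))   # True
   print(subject.defects)
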

.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date. If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`. If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance. This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``. Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.

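
The difference between aware and naive results can be seen in a small
sketch. It creates three :mailheader:`Date` headers directly through a
default :class:`.HeaderRegistry` rather than through a message object; the
date values are arbitrary sample data::

   from datetime import datetime, timedelta, timezone
   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()

   # An explicit offset produces an aware datetime.
   aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0400')
   print(aware.datetime.utcoffset())

   # -0000 produces a naive datetime.
   naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')
   print(naive.datetime.tzinfo)    # None

   # A naive datetime passed as the value is formatted with -0000.
   from_dt = registry('Date', datetime(2011, 7, 15, 21))
   print(from_dt)
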

.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value. Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Groups`` whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value. If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the sequence of addresses is "flattened" into a one dimensional tuple).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode. :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode. The ``decoded`` value is set by :meth:`~str.join`\ ing the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header. ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.

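
To illustrate how ``groups`` and ``addresses`` relate, the following sketch
parses a :mailheader:`To` value that mixes one group with one stand-alone
address. The names and addresses are invented sample data::

   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()
   to = registry(
       'To',
       'Team: Ann <ann@example.com>, bob@example.com;, '
       'Carol <carol@example.org>',
   )

   # One real group plus one single-address Group with no display name.
   for group in to.groups:
       print(group.display_name, [a.addr_spec for a in group.addresses])

   # The flattened view, in the order the addresses appear in the value.
   print([a.addr_spec for a in to.addresses])
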

.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value. If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.

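
A short sketch of the ``address`` attribute, using the
:mailheader:`Sender` header (which the default registry maps to a
single-address class) and invented sample addresses::

   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()

   sender = registry('Sender', 'Ann Example <ann@example.com>')
   print(sender.address.display_name, sender.address.addr_spec)

   # Two addresses in a single-address header: the header object is still
   # created, but reading .address raises ValueError.
   bad = registry('Sender', 'ann@example.com, bob@example.com')
   try:
       bad.address
   except ValueError as exc:
       error = exc
       print('ValueError:', exc)
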

Many of the above classes also have a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.


.. class:: MIMEVersionHeader

   There is really only one valid value for the :mailheader:`MIME-Version`
   header, and that is ``1.0``. For future proofing, this header class
   supports other valid version numbers. If a version number has a valid value
   per :rfc:`2045`, then the header object will have non-``None`` values for
   the following attributes:

   .. attribute:: version

      The version number as a string, with any whitespace and/or comments
      removed.

   .. attribute:: major

      The major version number as an integer.

   .. attribute:: minor

      The minor version number as an integer.

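
As a brief illustration, the sketch below parses a
:mailheader:`MIME-Version` value that carries a trailing comment. It assumes
the default registry maps ``mime-version`` to ``MIMEVersionHeader``, as
recent Python versions do; the comment text is invented::

   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()
   mime_version = registry('MIME-Version', '1.0 (produced by example)')

   # The comment is dropped from the parsed version attributes.
   print(mime_version.version, mime_version.major, mime_version.minor)
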

.. class:: ParameterizedMIMEHeader

   MIME headers all start with the prefix 'Content-'. Each specific header has
   a certain value, described under the class for that header. Some can
   also take a list of supplemental parameters, which have a common format.
   This class serves as a base for all the MIME headers that take parameters.

   .. attribute:: params

      A dictionary mapping parameter names to parameter values.


.. class:: ContentTypeHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Type` header.

   .. attribute:: content_type

      The content type string, in the form ``maintype/subtype``.

   .. attribute:: maintype

   .. attribute:: subtype

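
The attributes of a parameterized MIME header can be inspected as in this
sketch, which parses a typical :mailheader:`Content-Type` value through a
default :class:`.HeaderRegistry`::

   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()
   content_type = registry('Content-Type', 'text/plain; charset="utf-8"')

   print(content_type.content_type)        # text/plain
   print(content_type.maintype)            # text
   print(content_type.subtype)             # plain
   print(content_type.params['charset'])   # utf-8
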

.. class:: ContentDispositionHeader

   A :class:`ParameterizedMIMEHeader` class that handles the
   :mailheader:`Content-Disposition` header.

   .. attribute:: content_disposition

      ``inline`` and ``attachment`` are the only valid values in common use.


.. class:: ContentTransferEncodingHeader

   Handles the :mailheader:`Content-Transfer-Encoding` header.

   .. attribute:: cte

      Valid values are ``7bit``, ``8bit``, ``base64``, and
      ``quoted-printable``. See :rfc:`2045` for more information.

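
A small sketch of the two remaining MIME header types follows. The file name
is invented sample data, and both header names are assumed to be present in
the default registry, as they are in recent Python versions::

   from email.headerregistry import HeaderRegistry

   registry = HeaderRegistry()

   disposition = registry('Content-Disposition',
                          'attachment; filename="report.pdf"')
   print(disposition.content_disposition)     # attachment
   print(disposition.params['filename'])      # report.pdf

   encoding = registry('Content-Transfer-Encoding', 'base64')
   print(encoding.cte)                        # base64
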


.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds. When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class. When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied into the registry during
   initialization. *base_class* is always the last class in the generated
   class's ``__bases__`` list.

   The default mappings are:

      :subject:         UniqueUnstructuredHeader
      :date:            UniqueDateHeader
      :resent-date:     DateHeader
      :orig-date:       UniqueDateHeader
      :sender:          UniqueSingleAddressHeader
      :resent-sender:   SingleAddressHeader
      :to:              UniqueAddressHeader
      :resent-to:       AddressHeader
      :cc:              UniqueAddressHeader
      :resent-cc:       AddressHeader
      :from:            UniqueAddressHeader
      :resent-from:     AddressHeader
      :reply-to:        UniqueAddressHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped. It will be converted to
      lower case in the registry. *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class,
      calls the constructed class's constructor, passing it the same
      argument list, and finally returns the class instance created thereby.

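
The three ``HeaderRegistry`` methods can be exercised together in a short
sketch. ``X-Custom`` is a made-up header name used only to show the fallback
to *default_class* and the effect of ``map_to_type``::

   from email.headerregistry import (
       BaseHeader, HeaderRegistry, UniqueUnstructuredHeader,
       UnstructuredHeader,
   )

   registry = HeaderRegistry()

   # __getitem__ builds a class from the specialized class and base_class.
   subject_class = registry['subject']
   print(subject_class.__bases__)

   # Unregistered names fall back to default_class (UnstructuredHeader).
   before = registry('X-Custom', 'hello')
   print(before.max_count)     # None

   # After mapping, the same name produces the Unique variant.
   registry.map_to_type('X-Custom', UniqueUnstructuredHeader)
   after = registry('X-Custom', 'hello')
   print(after.max_count)      # 1
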

The following classes are the classes used to represent data parsed from
structured headers and can, in general, be used by an application program to
construct structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address. The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error. Unicode characters are allowed and
   will be properly encoded when serialized. However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed. If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above). This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.

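
The two ways of constructing an ``Address``, and the SMTP special case, can
be sketched as follows. All names and addresses are invented sample data::

   from email.headerregistry import Address

   # Built from separate parts.
   ann = Address('Ann Example', 'ann', 'example.com')
   print(ann)              # Ann Example <ann@example.com>
   print(ann.addr_spec)    # ann@example.com

   # Built from an addr_spec; username and domain are parsed out of it.
   bob = Address(addr_spec='bob@example.org')
   print(bob.username, bob.domain)

   # Empty username and domain give the SMTP null address.
   print(Address())        # <>
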

.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group. The general form of an
   address group is::

      display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group. If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters. If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
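
The two roles of ``Group`` can be compared in a short sketch, again with
invented sample addresses::

   from email.headerregistry import Address, Group

   ann = Address('Ann', 'ann', 'example.com')
   bob = Address(addr_spec='bob@example.com')

   # A named group of two addresses.
   team = Group('Team', [ann, bob])
   print(team)             # Team: Ann <ann@example.com>, bob@example.com;

   # A Group standing in for a single address that is not in a group.
   single = Group(None, [bob])
   print(single)           # bob@example.com
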