:mod:`email.headerregistry`: Custom Header Objects
--------------------------------------------------

.. module:: email.headerregistry
   :synopsis: Automatic Parsing of headers based on the field name

.. note::

   The headerregistry module has been included in the standard library on a
   :term:`provisional basis <provisional package>`. Backwards incompatible
   changes (up to and including removal of the module) may occur if deemed
   necessary by the core developers.

.. versionadded:: 3.3
   as a :term:`provisional module <provisional package>`

Headers are represented by customized subclasses of :class:`str`. The
particular class used to represent a given header is determined by the
:attr:`~email.policy.EmailPolicy.header_factory` of the :mod:`~email.policy` in
effect when the headers are created. This section documents the particular
``header_factory`` implemented by the email package for handling :rfc:`5322`
compliant email messages, which not only provides customized header objects for
various header types, but also provides an extension mechanism for applications
to add their own custom header types.

When using any of the policy objects derived from
:class:`~email.policy.EmailPolicy`, all headers are produced by
:class:`.HeaderRegistry` and have :class:`.BaseHeader` as their last base
class. Each header class has an additional base class that is determined by
the type of the header. For example, many headers have the class
:class:`.UnstructuredHeader` as their other base class. The specialized second
class for a header is determined by the name of the header, using a lookup
table stored in the :class:`.HeaderRegistry`. All of this is managed
transparently for the typical application program, but interfaces are provided
for modifying the default behavior for use by more complex applications.

The sections below first document the header base classes and their attributes,
followed by the API for modifying the behavior of :class:`.HeaderRegistry`, and
finally the support classes used to represent the data parsed from structured
headers.


.. class:: BaseHeader(name, value)

   *name* and *value* are passed to ``BaseHeader`` from the
   :attr:`~email.policy.EmailPolicy.header_factory` call. The string value of
   any header object is the *value* fully decoded to unicode.

   This base class defines the following read-only properties:


   .. attribute:: name

      The name of the header (the portion of the field before the ':'). This
      is exactly the value passed in the
      :attr:`~email.policy.EmailPolicy.header_factory` call for *name*; that
      is, case is preserved.


   .. attribute:: defects

      A tuple of :exc:`~email.errors.HeaderDefect` instances reporting any
      RFC compliance problems found during parsing. The email package tries to
      be complete about detecting compliance issues. See the
      :mod:`~email.errors` module for a discussion of the types of defects
      that may be reported.


   .. attribute:: max_count

      The maximum number of headers of this type that can have the same
      ``name``. A value of ``None`` means unlimited. The ``BaseHeader`` value
      for this attribute is ``None``; it is expected that specialized header
      classes will override this value as needed.

   ``BaseHeader`` also provides the following method, which is called by the
   email library code and should not in general be called by application
   programs:

   .. method:: fold(*, policy)

      Return a string containing :attr:`~email.policy.Policy.linesep`
      characters as required to correctly fold the header according
      to *policy*. A :attr:`~email.policy.Policy.cte_type` of
      ``8bit`` will be treated as if it were ``7bit``, since strings
      may not contain binary data.


   ``BaseHeader`` by itself cannot be used to create a header object. It
   defines a protocol that each specialized header cooperates with in order to
   produce the header object. Specifically, ``BaseHeader`` requires that
   the specialized class provide a :func:`classmethod` named ``parse``. This
   method is called as follows::

       parse(string, kwds)

   ``kwds`` is a dictionary containing one pre-initialized key, ``defects``.
   ``defects`` is an empty list. The parse method should append any detected
   defects to this list. On return, the ``kwds`` dictionary *must* contain
   values for at least the keys ``decoded`` and ``defects``. ``decoded``
   should be the string value for the header (that is, the header value fully
   decoded to unicode). The parse method should assume that *string* may
   contain transport encoded parts, but should correctly handle all valid
   unicode characters as well so that it can parse un-encoded header values.

   ``BaseHeader``'s ``__new__`` then creates the header instance, and calls its
   ``init`` method. The specialized class only needs to provide an ``init``
   method if it wishes to set additional attributes beyond those provided by
   ``BaseHeader`` itself. Such an ``init`` method should look like this::

       def init(self, *args, **kw):
           # Remove the extra key that this class's parse() added to kwds.
           self._myattr = kw.pop('myattr')
           # Pass everything else on to BaseHeader.init().
           super().init(*args, **kw)

   That is, anything extra that the specialized class puts into the ``kwds``
   dictionary should be removed and handled, and the remaining contents of
   ``kw`` (and ``args``) passed to the ``BaseHeader`` ``init`` method.
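
The ``parse``/``init`` protocol described above can be sketched as follows.
This is only an illustration under stated assumptions: the ``PriorityHeader``
class, its ``level`` attribute, and the ``X-Priority`` header name are
hypothetical and not part of the email package. The sketch subclasses
:class:`.UnstructuredHeader` so that the inherited ``parse`` supplies
``decoded`` and any other keys ``BaseHeader`` expects.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader


class PriorityHeader(UnstructuredHeader):
    """Hypothetical specialized class for an ``X-Priority`` header."""

    max_count = 1

    @classmethod
    def parse(cls, value, kwds):
        # Let the unstructured parser fill in ``decoded`` first, then add
        # one extra key of our own to the kwds dictionary.
        super().parse(value, kwds)
        words = kwds['decoded'].split()
        kwds['level'] = int(words[0]) if words and words[0].isdecimal() else None

    def init(self, *args, **kw):
        # Remove and handle the extra key before delegating to BaseHeader.
        self._level = kw.pop('level')
        super().init(*args, **kw)

    @property
    def level(self):
        return self._level


registry = HeaderRegistry()
registry.map_to_type('X-Priority', PriorityHeader)

header = registry('X-Priority', '1 (Highest)')
unparsable = registry('X-Priority', 'urgent')
```

Here ``header.level`` is the integer ``1`` while ``unparsable.level`` is
``None``; both objects are still ordinary :class:`str` instances.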


.. class:: UnstructuredHeader

   An "unstructured" header is the default type of header in :rfc:`5322`.
   Any header that does not have a specified syntax is treated as
   unstructured. The classic example of an unstructured header is the
   :mailheader:`Subject` header.

   In :rfc:`5322`, an unstructured header is a run of arbitrary text in the
   ASCII character set. :rfc:`2047`, however, has an :rfc:`5322` compatible
   mechanism for encoding non-ASCII text as ASCII characters within a header
   value. When a *value* containing encoded words is passed to the
   constructor, the ``UnstructuredHeader`` parser converts such encoded words
   back into the original unicode, following the :rfc:`2047` rules for
   unstructured text. The parser uses heuristics to attempt to decode certain
   non-compliant encoded words. Defects are registered in such cases, as well
   as defects for issues such as invalid characters within the encoded words or
   the non-encoded text.

   This header type provides no additional attributes.
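
As a small sketch of the encoded-word handling described above, the following
passes an :rfc:`2047` encoded :mailheader:`Subject` value through a default
:class:`.HeaderRegistry`. The sample subject text is made up for illustration.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# "Caf=C3=A9" is the UTF-8, quoted-printable encoding of "Café".
subject = registry('Subject', '=?utf-8?q?Caf=C3=A9?= menu')

decoded_text = str(subject)   # 'Café menu'
```

Because :mailheader:`Subject` maps to the ``Unique`` variant by default,
``subject.max_count`` is ``1``.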


.. class:: DateHeader

   :rfc:`5322` specifies a very specific format for dates within email headers.
   The ``DateHeader`` parser recognizes that date format, as well as
   recognizing a number of variant forms that are sometimes found "in the
   wild".

   This header type provides the following additional attributes:

   .. attribute:: datetime

      If the header value can be recognized as a valid date of one form or
      another, this attribute will contain a :class:`~datetime.datetime`
      instance representing that date. If the timezone of the input date is
      specified as ``-0000`` (indicating it is in UTC but contains no
      information about the source timezone), then :attr:`.datetime` will be a
      naive :class:`~datetime.datetime`. If a specific timezone offset is
      found (including ``+0000``), then :attr:`.datetime` will contain an aware
      ``datetime`` that uses :class:`datetime.timezone` to record the timezone
      offset.

   The ``decoded`` value of the header is determined by formatting the
   ``datetime`` according to the :rfc:`5322` rules; that is, it is set to::

       email.utils.format_datetime(self.datetime)

   When creating a ``DateHeader``, *value* may be a
   :class:`~datetime.datetime` instance. This means, for example, that
   the following code is valid and does what one would expect::

       msg['Date'] = datetime(2011, 7, 15, 21)

   Because this is a naive ``datetime`` it will be interpreted as a UTC
   timestamp, and the resulting value will have a timezone of ``-0000``. Much
   more useful is to use the :func:`~email.utils.localtime` function from the
   :mod:`~email.utils` module::

       msg['Date'] = utils.localtime()

   This example sets the date header to the current time and date using
   the current timezone offset.
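
The naive versus aware distinction described above can be checked with a
short sketch. The date strings are arbitrary sample values.

```python
from datetime import datetime, timedelta
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

# An explicit offset yields an aware datetime.
aware = registry('Date', 'Fri, 15 Jul 2011 21:00:00 +0200')

# "-0000" means UTC with no source timezone information: a naive datetime.
naive = registry('Date', 'Fri, 15 Jul 2011 21:00:00 -0000')

# A naive datetime passed as the value is formatted with "-0000".
from_datetime = registry('Date', datetime(2011, 7, 15, 21))
formatted = str(from_datetime)   # 'Fri, 15 Jul 2011 21:00:00 -0000'
```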


.. class:: AddressHeader

   Address headers are one of the most complex structured header types.
   The ``AddressHeader`` class provides a generic interface to any address
   header.

   This header type provides the following additional attributes:


   .. attribute:: groups

      A tuple of :class:`.Group` objects encoding the
      addresses and groups found in the header value. Addresses that are
      not part of a group are represented in this tuple as single-address
      ``Groups`` whose :attr:`~.Group.display_name` is ``None``.


   .. attribute:: addresses

      A tuple of :class:`.Address` objects encoding all
      of the individual addresses from the header value. If the header value
      contains any groups, the individual addresses from the group are included
      in the tuple at the point where the group occurs in the value (that is,
      the addresses are "flattened" into a one-dimensional tuple).

   The ``decoded`` value of the header will have all encoded words decoded to
   unicode. :mod:`~encodings.idna` encoded domain names are also decoded to
   unicode. The ``decoded`` value is set by :meth:`~str.join`\ ing the
   :class:`str` value of the elements of the ``groups`` attribute with
   ``', '``.

   A list of :class:`.Address` and :class:`.Group` objects in any combination
   may be used to set the value of an address header. ``Group`` objects whose
   ``display_name`` is ``None`` will be interpreted as single addresses, which
   allows an address list to be copied with groups intact by using the list
   obtained from the ``groups`` attribute of the source header.


.. class:: SingleAddressHeader

   A subclass of :class:`.AddressHeader` that adds one
   additional attribute:


   .. attribute:: address

      The single address encoded by the header value. If the header value
      actually contains more than one address (which would be a violation of
      the RFC under the default :mod:`~email.policy`), accessing this attribute
      will result in a :exc:`ValueError`.
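
As a rough illustration of how ``groups``, ``addresses``, and ``address``
relate, the following sketch parses a :mailheader:`To` value containing one
group and one standalone address, and a :mailheader:`Sender` value. All names
and addresses are made-up examples.

```python
from email.headerregistry import HeaderRegistry

registry = HeaderRegistry()

to = registry(
    'To',
    'Team: ann@example.com, bob@example.com;, Carol <carol@example.com>',
)

# One real group plus one single-address Group whose display_name is None.
group_names = [group.display_name for group in to.groups]

# The flattened view lists every address in order of appearance.
addr_specs = [address.addr_spec for address in to.addresses]

sender = registry('Sender', 'Ann <ann@example.com>')
sender_spec = sender.address.addr_spec
```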


Each of the above classes also has a ``Unique`` variant (for example,
``UniqueUnstructuredHeader``). The only difference is that in the ``Unique``
variant, :attr:`~.BaseHeader.max_count` is set to 1.


.. class:: HeaderRegistry(base_class=BaseHeader, \
                          default_class=UnstructuredHeader, \
                          use_default_map=True)

   This is the factory used by :class:`~email.policy.EmailPolicy` by default.
   ``HeaderRegistry`` builds the class used to create a header instance
   dynamically, using *base_class* and a specialized class retrieved from a
   registry that it holds. When a given header name does not appear in the
   registry, the class specified by *default_class* is used as the specialized
   class. When *use_default_map* is ``True`` (the default), the standard
   mapping of header names to classes is copied into the registry during
   initialization. *base_class* is always the last class in the generated
   class's ``__bases__`` list.

   The default mappings are:

   :subject:         UniqueUnstructuredHeader
   :date:            UniqueDateHeader
   :resent-date:     DateHeader
   :orig-date:       UniqueDateHeader
   :sender:          UniqueSingleAddressHeader
   :resent-sender:   SingleAddressHeader
   :to:              UniqueAddressHeader
   :resent-to:       AddressHeader
   :cc:              UniqueAddressHeader
   :resent-cc:       AddressHeader
   :from:            UniqueAddressHeader
   :resent-from:     AddressHeader
   :reply-to:        UniqueAddressHeader

   ``HeaderRegistry`` has the following methods:


   .. method:: map_to_type(name, cls)

      *name* is the name of the header to be mapped. It will be converted to
      lower case in the registry. *cls* is the specialized class to be used,
      along with *base_class*, to create the class used to instantiate headers
      that match *name*.


   .. method:: __getitem__(name)

      Construct and return a class to handle creating a *name* header.


   .. method:: __call__(name, value)

      Retrieves the specialized header class associated with *name* from the
      registry (using *default_class* if *name* does not appear in the
      registry) and composes it with *base_class* to produce a class. It then
      calls the constructed class's constructor with the same argument list
      and returns the resulting instance.
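
The following sketch shows one way the registry methods above might be used
together: a header name is mapped to :class:`.DateHeader`, the generated
classes are inspected, and the registry is installed as the
``header_factory`` of a cloned policy. The ``X-Received-At`` and
``X-Unknown`` header names are hypothetical, and the sketch assumes the
``email.message.EmailMessage`` class and ``email.policy.default`` policy of
current Python versions.

```python
from email.headerregistry import (
    BaseHeader, DateHeader, HeaderRegistry, UnstructuredHeader,
)
from email.message import EmailMessage
from email.policy import default

registry = HeaderRegistry()
registry.map_to_type('X-Received-At', DateHeader)   # hypothetical header name

# Names are lower-cased in the registry, so lookup ignores case.
date_cls = registry['x-received-at']

# Unmapped names fall back to default_class.
fallback_cls = registry['X-Unknown']

# Use the customized registry for every header created under this policy.
custom_policy = default.clone(header_factory=registry)
msg = EmailMessage(policy=custom_policy)
msg['X-Received-At'] = 'Fri, 15 Jul 2011 21:00:00 +0200'
received = msg['X-Received-At']
```

In both generated classes the last entry of ``__bases__`` is
``BaseHeader``, and ``received.datetime`` is available because the header was
built from :class:`.DateHeader`.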


The following classes are the classes used to represent data parsed from
structured headers and can, in general, be used by an application program to
construct structured values to assign to specific headers.


.. class:: Address(display_name='', username='', domain='', addr_spec=None)

   The class used to represent an email address. The general form of an
   address is::

      [display_name] <username@domain>

   or::

      username@domain

   where each part must conform to specific syntax rules spelled out in
   :rfc:`5322`.

   As a convenience *addr_spec* can be specified instead of *username* and
   *domain*, in which case *username* and *domain* will be parsed from the
   *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is
   not, ``Address`` will raise an error. Unicode characters are allowed and
   will be properly encoded when serialized. However, per the RFCs, unicode is
   *not* allowed in the username portion of the address.

   .. attribute:: display_name

      The display name portion of the address, if any, with all quoting
      removed. If the address does not have a display name, this attribute
      will be an empty string.

   .. attribute:: username

      The ``username`` portion of the address, with all quoting removed.

   .. attribute:: domain

      The ``domain`` portion of the address.

   .. attribute:: addr_spec

      The ``username@domain`` portion of the address, correctly quoted
      for use as a bare address (the second form shown above). This
      attribute is not mutable.

   .. method:: __str__()

      The ``str`` value of the object is the address quoted according to
      :rfc:`5322` rules, but with no Content Transfer Encoding of any non-ASCII
      characters.

   To support SMTP (:rfc:`5321`), ``Address`` handles one special case: if
   ``username`` and ``domain`` are both the empty string (or ``None``), then
   the string value of the ``Address`` is ``<>``.
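
A brief sketch of the two ways of constructing an ``Address`` and of the
quoting and ``<>`` behavior described above. The names and addresses are
illustrative only.

```python
from email.headerregistry import Address

# Built from separate parts.
ann = Address('Ann Example', 'ann', 'example.com')

# Built from an addr_spec; username and domain are parsed out of it.
bob = Address(addr_spec='bob@example.com')

# A display name containing a special character is quoted by str().
carol = Address('Example, Carol', 'carol', 'example.com')

# The SMTP null address special case.
null_address = Address()

rendered = [str(ann), str(bob), str(carol), str(null_address)]
```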


.. class:: Group(display_name=None, addresses=None)

   The class used to represent an address group. The general form of an
   address group is::

     display_name: [address-list];

   As a convenience for processing lists of addresses that consist of a mixture
   of groups and single addresses, a ``Group`` may also be used to represent
   single addresses that are not part of a group by setting *display_name* to
   ``None`` and providing a list of the single address as *addresses*.

   .. attribute:: display_name

      The ``display_name`` of the group. If it is ``None`` and there is
      exactly one ``Address`` in ``addresses``, then the ``Group`` represents a
      single address that is not in a group.

   .. attribute:: addresses

      A possibly empty tuple of :class:`.Address` objects representing the
      addresses in the group.

   .. method:: __str__()

      The ``str`` value of a ``Group`` is formatted according to :rfc:`5322`,
      but with no Content Transfer Encoding of any non-ASCII characters. If
      ``display_name`` is ``None`` and there is a single ``Address`` in the
      ``addresses`` list, the ``str`` value will be the same as the ``str`` of
      that single ``Address``.
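
The following sketch builds a named group and a single-address ``Group``, then
assigns a mixture of ``Group`` and ``Address`` objects to an address header.
It assumes the ``email.message.EmailMessage`` class of current Python
versions; the names and addresses are made up.

```python
from email.headerregistry import Address, Group
from email.message import EmailMessage

ann = Address('Ann', 'ann', 'example.com')
bob = Address(addr_spec='bob@example.com')
carol = Address('Carol', 'carol', 'example.com')

team = Group('Team', (ann, bob))
team_text = str(team)          # 'Team: Ann <ann@example.com>, bob@example.com;'

# A Group with no display name and one address renders as that address.
solo = Group(None, [carol])
solo_text = str(solo)          # 'Carol <carol@example.com>'

# Groups and addresses may be mixed when setting an address header.
msg = EmailMessage()
msg['To'] = [team, carol]
recipients = [address.addr_spec for address in msg['To'].addresses]
```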