:mod:`email.policy`: Policy Objects
-----------------------------------

.. module:: email.policy
   :synopsis: Controlling the parsing and generating of messages

.. moduleauthor:: R. David Murray <rdmurray@bitdance.com>
.. sectionauthor:: R. David Murray <rdmurray@bitdance.com>

.. versionadded:: 3.3


The :mod:`email` package's prime focus is the handling of email messages as
described by the various email and MIME RFCs. However, the general format of
email messages (a block of header fields each consisting of a name followed by
a colon followed by a value, the whole block followed by a blank line and an
arbitrary 'body'), is a format that has found utility outside of the realm of
email. Some of these uses conform fairly closely to the main RFCs, some do
not. And even when working with email, there are times when it is desirable to
break strict compliance with the RFCs.

Policy objects give the email package the flexibility to handle all these
disparate use cases.

A :class:`Policy` object encapsulates a set of attributes and methods that
control the behavior of various components of the email package during use.
:class:`Policy` instances can be passed to various classes and methods in the
email package to alter the default behavior. The settable values and their
defaults are described below.

There is a default policy used by all classes in the email package. This
policy is named :class:`Compat32`, with a corresponding pre-defined instance
named :const:`compat32`. It provides for complete backward compatibility (in
some cases, including bug compatibility) with the pre-Python 3.3 version of
the email package.

The first part of this documentation covers the features of :class:`Policy`, an
:term:`abstract base class` that defines the features that are common to all
policy objects, including :const:`compat32`. This includes certain hook
methods that are called internally by the email package, which a custom policy
could override to obtain different behavior.

When a :class:`~email.message.Message` object is created, it acquires a policy.
By default this will be :const:`compat32`, but a different policy can be
specified. If the ``Message`` is created by a :mod:`~email.parser`, a policy
passed to the parser will be the policy used by the ``Message`` it creates. If
the ``Message`` is created by the program, then the policy can be specified
when it is created. When a ``Message`` is passed to a :mod:`~email.generator`,
the generator uses the policy from the ``Message`` by default, but you can also
pass a specific policy to the generator that will override the one stored on
the ``Message`` object.

:class:`Policy` instances are immutable, but they can be cloned: the
:meth:`~Policy.clone` method accepts the same keyword arguments as the class
constructor and returns a new :class:`Policy` instance that is a copy of the
original but with the specified attribute values changed.

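As a minimal sketch of the cloning behavior described above (the names
``smtp_like`` and ``assignment_rejected`` are illustrative only), cloning
leaves the original policy untouched, and direct attribute assignment on a
policy is rejected:

```python
from email.policy import compat32

# clone() returns a new policy object; the original keeps its settings.
smtp_like = compat32.clone(linesep='\r\n')
print(repr(compat32.linesep), repr(smtp_like.linesep))

# Policy instances are immutable, so plain assignment raises AttributeError.
try:
    compat32.linesep = '\r\n'
except AttributeError:
    assignment_rejected = True
else:
    assignment_rejected = False
print(assignment_rejected)
```
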
As an example, the following code could be used to read an email message from a
file on disk and pass it to the system ``sendmail`` program on a Unix system::

   >>> from email import message_from_binary_file, policy
   >>> from email.generator import BytesGenerator
   >>> from subprocess import Popen, PIPE
   >>> with open('mymsg.txt', 'rb') as f:
   ...     msg = message_from_binary_file(f, policy=policy.default)
   >>> p = Popen(['sendmail', msg['To'].addresses[0].addr_spec], stdin=PIPE)
   >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n'))
   >>> g.flatten(msg)
   >>> p.stdin.close()
   >>> rc = p.wait()

Here we are telling :class:`~email.generator.BytesGenerator` to use the
RFC-correct line separator characters when creating the binary string to feed
into ``sendmail's`` ``stdin``, where the default policy would use ``\n`` line
separators.

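The effect of passing a policy to the generator can be checked without
``sendmail`` by flattening into in-memory buffers. This is only a sketch: the
message content and the names ``default_out``, ``crlf_out`` and ``crlf_policy``
are made up for illustration.

```python
from io import BytesIO
from email import message_from_bytes
from email.generator import BytesGenerator

msg = message_from_bytes(b"Subject: hello\n\nbody line\n")

# No policy argument: the generator uses the policy stored on the message.
default_out = BytesIO()
BytesGenerator(default_out).flatten(msg)

# An explicit policy overrides the one stored on the message.
crlf_out = BytesIO()
crlf_policy = msg.policy.clone(linesep='\r\n')
BytesGenerator(crlf_out, policy=crlf_policy).flatten(msg)

print(default_out.getvalue())
print(crlf_out.getvalue())
```

The two outputs should differ only in their line endings.
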
Some email package methods accept a *policy* keyword argument, allowing the
policy to be overridden for that method. For example, the following code uses
the :meth:`~email.message.Message.as_string` method of the *msg* object from
the previous example and writes the message to a file using the native line
separators for the platform on which it is running::

   >>> import os
   >>> with open('converted.txt', 'w', newline='') as f:
   ...     f.write(msg.as_string(policy=msg.policy.clone(linesep=os.linesep)))

Policy objects can also be combined using the addition operator, producing a
policy object whose settings are a combination of the non-default values of the
summed objects::

   >>> from email.policy import compat32
   >>> compat_SMTP = compat32.clone(linesep='\r\n')
   >>> compat_strict = compat32.clone(raise_on_defect=True)
   >>> compat_strict_SMTP = compat_SMTP + compat_strict

This operation is not commutative; that is, the order in which the objects are
added matters. To illustrate::

   >>> policy100 = compat32.clone(max_line_length=100)
   >>> policy80 = compat32.clone(max_line_length=80)
   >>> apolicy = policy100 + policy80
   >>> apolicy.max_line_length
   80
   >>> apolicy = policy80 + policy100
   >>> apolicy.max_line_length
   100


.. class:: Policy(**kw)

   This is the :term:`abstract base class` for all policy classes. It provides
   default implementations for a couple of trivial methods, as well as the
   implementation of the immutability property, the :meth:`clone` method, and
   the constructor semantics.

   The constructor of a policy class can be passed various keyword arguments.
   The arguments that may be specified are any non-method properties on this
   class, plus any additional non-method properties on the concrete class. A
   value specified in the constructor will override the default value for the
   corresponding attribute.

   This class defines the following properties, and thus values for the
   following may be passed in the constructor of any policy class:

   .. attribute:: max_line_length

      The maximum length of any line in the serialized output, not counting the
      end of line character(s). Default is 78, per :rfc:`5322`. A value of
      ``0`` or :const:`None` indicates that no line wrapping should be
      done at all.

   .. attribute:: linesep

      The string to be used to terminate lines in serialized output. The
      default is ``\n`` because that's the internal end-of-line discipline used
      by Python, though ``\r\n`` is required by the RFCs.

   .. attribute:: cte_type

      Controls the type of Content Transfer Encodings that may be or are
      required to be used. The possible values are:

      .. tabularcolumns:: |l|L|

      ========  ===============================================================
      ``7bit``  all data must be "7 bit clean" (ASCII-only). This means that
                where necessary data will be encoded using either
                quoted-printable or base64 encoding.

      ``8bit``  data is not constrained to be 7 bit clean. Data in headers is
                still required to be ASCII-only and so will be encoded (see
                :meth:`fold_binary` below for an exception), but body parts may
                use the ``8bit`` CTE.
      ========  ===============================================================

      A ``cte_type`` value of ``8bit`` only works with ``BytesGenerator``, not
      ``Generator``, because strings cannot contain binary data. If a
      ``Generator`` is operating under a policy that specifies
      ``cte_type=8bit``, it will act as if ``cte_type`` is ``7bit``.

   .. attribute:: raise_on_defect

      If :const:`True`, any defects encountered will be raised as errors. If
      :const:`False` (the default), defects will be passed to the
      :meth:`register_defect` method.

   The following :class:`Policy` method is intended to be called by code using
   the email library to create policy instances with custom settings:

   .. method:: clone(**kw)

      Return a new :class:`Policy` instance whose attributes have the same
      values as the current instance, except where those attributes are
      given new values by the keyword arguments.

   The remaining :class:`Policy` methods are called by the email package code,
   and are not intended to be called by an application using the email package.
   A custom policy must implement all of these methods.

   .. method:: handle_defect(obj, defect)

      Handle a *defect* found on *obj*. When the email package calls this
      method, *defect* will always be a subclass of
      :class:`~email.errors.Defect`.

      The default implementation checks the :attr:`raise_on_defect` flag. If
      it is ``True``, *defect* is raised as an exception. If it is ``False``
      (the default), *obj* and *defect* are passed to :meth:`register_defect`.

   .. method:: register_defect(obj, defect)

      Register a *defect* on *obj*. In the email package, *defect* will always
      be a subclass of :class:`~email.errors.Defect`.

      The default implementation calls the ``append`` method of the ``defects``
      attribute of *obj*. When the email package calls :meth:`handle_defect`,
      *obj* will normally have a ``defects`` attribute that has an ``append``
      method. Custom object types used with the email package (for example,
      custom ``Message`` objects) should also provide such an attribute,
      otherwise defects in parsed messages will raise unexpected errors.

   .. method:: header_max_count(name)

      Return the maximum allowed number of headers named *name*.

      Called when a header is added to a :class:`~email.message.Message`
      object. If the returned value is not ``0`` or ``None``, and there are
      already a number of headers with the name *name* equal to the value
      returned, a :exc:`ValueError` is raised.

      Because the default behavior of ``Message.__setitem__`` is to append the
      value to the list of headers, it is easy to create duplicate headers
      without realizing it. This method allows certain headers to be limited
      in the number of instances of that header that may be added to a
      ``Message`` programmatically. (The limit is not observed by the parser,
      which will faithfully produce as many headers as exist in the message
      being parsed.)

      The default implementation returns ``None`` for all header names.

   .. method:: header_source_parse(sourcelines)

      The email package calls this method with a list of strings, each string
      ending with the line separation characters found in the source being
      parsed. The first line includes the field header name and separator.
      All whitespace in the source is preserved. The method should return the
      ``(name, value)`` tuple that is to be stored in the ``Message`` to
      represent the parsed header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, *name* should be the case preserved name (all
      characters up to the '``:``' separator), while *value* should be the
      unfolded value (all line separator characters removed, but whitespace
      kept intact), stripped of leading whitespace.

      *sourcelines* may contain surrogateescaped binary data.

      There is no default implementation.

   .. method:: header_store_parse(name, value)

      The email package calls this method with the name and value provided by
      the application program when the application program is modifying a
      ``Message`` programmatically (as opposed to a ``Message`` created by a
      parser). The method should return the ``(name, value)`` tuple that is to
      be stored in the ``Message`` to represent the header.

      If an implementation wishes to retain compatibility with the existing
      email package policies, the *name* and *value* should be strings or
      string subclasses that do not change the content of the passed in
      arguments.

      There is no default implementation.

   .. method:: header_fetch_parse(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` when that header is requested by the
      application program, and whatever the method returns is what is passed
      back to the application as the value of the header being retrieved.
      Note that there may be more than one header with the same name stored in
      the ``Message``; the method is passed the specific name and value of the
      header destined to be returned to the application.

      *value* may contain surrogateescaped binary data. There should be no
      surrogateescaped binary data in the value returned by the method.

      There is no default implementation.

   .. method:: fold(name, value)

      The email package calls this method with the *name* and *value* currently
      stored in the ``Message`` for a given header. The method should return a
      string that represents that header "folded" correctly (according to the
      policy settings) by composing the *name* with the *value* and inserting
      :attr:`linesep` characters at the appropriate places. See :rfc:`5322`
      for a discussion of the rules for folding email headers.

      *value* may contain surrogateescaped binary data. There should be no
      surrogateescaped binary data in the string returned by the method.

   .. method:: fold_binary(name, value)

      The same as :meth:`fold`, except that the returned value should be a
      bytes object rather than a string.

      *value* may contain surrogateescaped binary data. These could be
      converted back into binary data in the returned bytes object.


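As a sketch of how a custom policy might override one of the hook methods
described for :class:`Policy`, the hypothetical ``OneSubjectPolicy`` below
subclasses :class:`Compat32` (so that the abstract methods are already
implemented) and overrides only ``header_max_count``. The class name and the
one-``Subject`` rule are assumptions made for illustration.

```python
from email.message import Message
from email.policy import Compat32


class OneSubjectPolicy(Compat32):
    """Hypothetical policy that allows at most one Subject header."""

    def header_max_count(self, name):
        # None means "no limit"; any other header name is left unrestricted.
        return 1 if name.lower() == 'subject' else None


msg = Message(policy=OneSubjectPolicy())
msg['Subject'] = 'first'

try:
    # Message.__setitem__ consults header_max_count before appending.
    msg['Subject'] = 'second'
except ValueError as exc:
    print('rejected:', exc)

print(msg.get_all('Subject'))
```

Unrelated headers can still be added repeatedly, because the hook returns
``None`` for them.
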
.. class:: Compat32(**kw)

   This concrete :class:`Policy` is the backward compatibility policy. It
   replicates the behavior of the email package in Python 3.2. The
   :mod:`~email.policy` module also defines an instance of this class,
   :const:`compat32`, that is used as the default policy. Thus the default
   behavior of the email package is to maintain compatibility with Python 3.2.

   The class provides the following concrete implementations of the
   abstract methods of :class:`Policy`:

   .. method:: header_source_parse(sourcelines)

      The name is parsed as everything up to the '``:``' and returned
      unmodified. The value is determined by stripping leading whitespace off
      the remainder of the first line, joining all subsequent lines together,
      and stripping any trailing carriage return or linefeed characters.

   .. method:: header_store_parse(name, value)

      The name and value are returned unmodified.

   .. method:: header_fetch_parse(name, value)

      If the value contains binary data, it is converted into a
      :class:`~email.header.Header` object using the ``unknown-8bit`` charset.
      Otherwise it is returned unmodified.

   .. method:: fold(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``. Non-ASCII binary data is
      CTE encoded using the ``unknown-8bit`` charset.

   .. method:: fold_binary(name, value)

      Headers are folded using the :class:`~email.header.Header` folding
      algorithm, which preserves existing line breaks in the value, and wraps
      each resulting line to the ``max_line_length``. If ``cte_type`` is
      ``7bit``, non-ASCII binary data is CTE encoded using the ``unknown-8bit``
      charset. Otherwise the original source header is used, with its existing
      line breaks and any (RFC invalid) binary data it may contain.


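The :class:`Compat32` parsing hooks are normally called only by the email
package, but calling them directly, purely for illustration, shows the
behavior described above. The sample header lines are made up.

```python
from email.policy import compat32

# Two physical source lines of one folded header, as a parser would supply them.
sourcelines = ['Subject: hello\r\n', ' world\r\n']
name, value = compat32.header_source_parse(sourcelines)

# The interior line break is kept; only the trailing CR/LF is stripped.
print(repr(name), repr(value))

# header_store_parse hands back exactly what it was given.
stored = compat32.header_store_parse('Subject', 'hello')
print(stored)
```
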
.. note::

   The documentation below describes new policies that are included in the
   standard library on a :term:`provisional basis <provisional package>`.
   Backwards incompatible changes (up to and including removal of the feature)
   may occur if deemed necessary by the core developers.


.. class:: EmailPolicy(**kw)

   This concrete :class:`Policy` provides behavior that is intended to be fully
   compliant with the current email RFCs. These include (but are not limited
   to) :rfc:`5322`, :rfc:`2047`, and the current MIME RFCs.

   This policy adds new header parsing and folding algorithms. Instead of
   simple strings, headers are custom objects with custom attributes depending
   on the type of the field. The parsing and folding algorithms fully implement
   :rfc:`2047` and :rfc:`5322`.

   In addition to the settable attributes listed above that apply to all
   policies, this policy adds the following additional attributes:

   .. attribute:: refold_source

      If the value for a header in the ``Message`` object originated from a
      :mod:`~email.parser` (as opposed to being set by a program), this
      attribute indicates whether or not a generator should refold that value
      when transforming the message back into stream form. The possible values
      are:

      ========  ===============================================================
      ``none``  all source values use original folding

      ``long``  source values that have any line that is longer than
                ``max_line_length`` will be refolded

      ``all``   all values are refolded.
      ========  ===============================================================

      The default is ``long``.

   .. attribute:: header_factory

      A callable that takes two arguments, ``name`` and ``value``, where
      ``name`` is a header field name and ``value`` is an unfolded header field
      value, and returns a string subclass that represents that header. A
      default ``header_factory`` (see :mod:`~email.headerregistry`) is provided
      that understands some of the :rfc:`5322` header field types. (Currently
      address fields and date fields have special treatment, while all other
      fields are treated as unstructured. This list will be completed before
      the extension is marked stable.)

   The class provides the following concrete implementations of the abstract
   methods of :class:`Policy`:

   .. method:: header_max_count(name)

      Returns the value of the
      :attr:`~email.headerregistry.BaseHeader.max_count` attribute of the
      specialized class used to represent the header with the given name.

   .. method:: header_source_parse(sourcelines)

      The implementation of this method is the same as that for the
      :class:`Compat32` policy.

   .. method:: header_store_parse(name, value)

      The name is returned unchanged. If the input value has a ``name``
      attribute and it matches *name* ignoring case, the value is returned
      unchanged. Otherwise the *name* and *value* are passed to
      ``header_factory``, and the resulting custom header object is returned as
      the value. In this case a ``ValueError`` is raised if the input value
      contains CR or LF characters.

   .. method:: header_fetch_parse(name, value)

      If the value has a ``name`` attribute, it is returned unmodified.
      Otherwise the *name*, and the *value* with any CR or LF characters
      removed, are passed to the ``header_factory``, and the resulting custom
      header object is returned. Any surrogateescaped bytes get turned into
      the unicode unknown-character glyph.

   .. method:: fold(name, value)

      Header folding is controlled by the :attr:`refold_source` policy setting.
      A value is considered to be a 'source value' if and only if it does not
      have a ``name`` attribute (having a ``name`` attribute means it is a
      header object of some sort). If a source value needs to be refolded
      according to the policy, it is converted into a custom header object by
      passing the *name* and the *value* with any CR and LF characters removed
      to the ``header_factory``. Folding of a custom header object is done by
      calling its ``fold`` method with the current policy.

      Source values are split into lines using :meth:`~str.splitlines`. If
      the value is not to be refolded, the lines are rejoined using the
      ``linesep`` from the policy and returned. The exception is lines
      containing non-ascii binary data. In that case the value is refolded
      regardless of the ``refold_source`` setting, which causes the binary data
      to be CTE encoded using the ``unknown-8bit`` charset.

   .. method:: fold_binary(name, value)

      The same as :meth:`fold` if :attr:`cte_type` is ``7bit``, except that
      the returned value is bytes.

      If :attr:`cte_type` is ``8bit``, non-ASCII binary data is converted back
      into bytes. Headers with binary data are not refolded, regardless of the
      ``refold_source`` setting, since there is no way to know whether the
      binary data consists of single byte characters or multibyte characters.

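The two checks that :class:`EmailPolicy` applies when a header is set
programmatically can be observed with a short sketch. It assumes a Python
version that provides ``email.message.EmailMessage``; the header values and
the ``problems`` list are illustrative.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage(policy=policy.default)
msg['Subject'] = 'first'

problems = []

# header_max_count: the header class used for Subject allows one instance.
try:
    msg['Subject'] = 'second'
except ValueError as exc:
    problems.append(str(exc))

# header_store_parse: a value containing a line break is rejected.
try:
    msg['Comments'] = 'first line\nsecond line'
except ValueError as exc:
    problems.append(str(exc))

for text in problems:
    print(text)
```
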
The following instances of :class:`EmailPolicy` provide defaults suitable for
specific application domains. Note that in the future the behavior of these
instances (in particular the ``HTTP`` instance) may be adjusted to conform even
more closely to the RFCs relevant to their domains.

.. data:: default

   An instance of ``EmailPolicy`` with all defaults unchanged. This policy
   uses the standard Python ``\n`` line endings rather than the RFC-correct
   ``\r\n``.

.. data:: SMTP

   Suitable for serializing messages in conformance with the email RFCs.
   Like ``default``, but with ``linesep`` set to ``\r\n``, which is RFC
   compliant.

.. data:: HTTP

   Suitable for serializing headers for use in HTTP traffic. Like
   ``SMTP`` except that ``max_line_length`` is set to ``None`` (unlimited).

.. data:: strict

   Convenience instance. The same as ``default`` except that
   ``raise_on_defect`` is set to ``True``. This allows any policy to be made
   strict by writing::

      somepolicy + policy.strict

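The following sketch checks the settings of the predefined instances and shows
the effect of adding ``policy.strict``. The defective sample message (a
``multipart`` content type with no ``boundary`` parameter) and the variable
names are assumptions chosen for illustration.

```python
from email import errors, message_from_string, policy

# Settings of the predefined instances, as described above.
print(repr(policy.default.linesep), repr(policy.SMTP.linesep))
print(policy.HTTP.max_line_length)

# Adding policy.strict turns on raise_on_defect and keeps SMTP's linesep.
strict_smtp = policy.SMTP + policy.strict

broken = "Content-Type: multipart/mixed\n\nno boundary parameter above\n"

# Without strict, defects are recorded on the message object.
lenient = message_from_string(broken, policy=policy.SMTP)
print([type(d).__name__ for d in lenient.defects])

# With strict, the first defect is raised as an exception instead.
try:
    message_from_string(broken, policy=strict_smtp)
except errors.MessageDefect as exc:
    raised = exc
else:
    raised = None
print(type(raised).__name__)
```
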
With all of these :class:`EmailPolicies <.EmailPolicy>`, the effective API of
the email package is changed from the Python 3.2 API in the following ways:

   * Setting a header on a :class:`~email.message.Message` results in that
     header being parsed and a custom header object created.

   * Fetching a header value from a :class:`~email.message.Message` results
     in that header being parsed and a custom header object created and
     returned.

   * Any custom header object, or any header that is refolded due to the
     policy settings, is folded using an algorithm that fully implements the
     RFC folding algorithms, including knowing where encoded words are required
     and allowed.

From the application view, this means that any header obtained through the
:class:`~email.message.Message` is a custom header object with custom
attributes, whose string value is the fully decoded unicode value of the
header. Likewise, a header may be assigned a new value, or a new header
created, using a unicode string, and the policy will take care of converting
the unicode string into the correct RFC encoded form.

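A brief sketch of what this looks like from application code, assuming the
``default`` policy and a Python version that provides
``email.message.EmailMessage``; the sample message and names are made up:

```python
from email import message_from_string, policy
from email.message import EmailMessage

raw = ("To: Alice Example <alice@example.com>\n"
       "Subject: =?utf-8?q?Caf=C3=A9?=\n"
       "\n"
       "body\n")
msg = message_from_string(raw, policy=policy.default)

# Fetching a header returns a header object with type-specific attributes.
to = msg['To']
print(to.addresses[0].display_name)
print(to.addresses[0].addr_spec)

# The string value of a header is the decoded unicode text.
subject = str(msg['Subject'])
print(subject == 'Caf\u00e9')

# Assigning a unicode string lets the policy produce the RFC encoded form.
out = EmailMessage(policy=policy.default)
out['Subject'] = 'Caf\u00e9'
wire = out.as_string()
print(wire)
```
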
The custom header objects and their attributes are described in
:mod:`~email.headerregistry`.