:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse RFC 2822 style mail messages.
   :deprecated:


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module.  This module is present only to maintain backward compatibility, and
   has been removed in Python 3.
Georg Brandl8ec7f652007-08-15 14:28:01 +000014
15This module defines a class, :class:`Message`, which represents an "email
16message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages
17consist of a collection of message headers, and a message body. This module
18also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
19addresses. Please refer to the RFC for information on the specific syntax of
20:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is created from an input object passed as *file*.
   :class:`Message` relies only on the input object having a :meth:`readline`
   method; in particular, ordinary file objects qualify.  Instantiation reads
   headers from the input object up to a delimiter line (normally a blank line)
   and stores them in the instance.  The message body, following the headers, is
   not consumed.

   If the input object also has seek and tell capability, the :meth:`rewindbody`
   method will work; also, illegal lines will be pushed back onto the input
   stream.  If the input object lacks seek but has an :meth:`unread` method that
   can push back a line of input, :class:`Message` will use that to push back
   illegal lines.  Thus this class can be used to parse messages coming from a
   buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :c:func:`tell` discards buffered data before discovering
   that the :c:func:`lseek` system call doesn't work.  For maximum portability,
   you should set *seekable* to zero to prevent that initial :meth:`tell` when
   passing in an unseekable object, such as a file object created from a socket
   object.

   Input lines as read from the file may be terminated either by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before
   the line is stored.

   All header matching is case-insensitive; e.g. ``m['From']``, ``m['from']`` and
   ``m['FROM']`` all yield the same result.
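
   To make the parsing rules above concrete, here is a minimal sketch of a header
   reader that follows them.  It uses only ``readline()``, normalizes CR-LF to a
   single linefeed, folds continuation lines into the preceding header, stops at
   a blank delimiter line, and stores field names in lower case so that lookups
   are case-insensitive.  It is written for Python 3, where :mod:`rfc822` no
   longer exists.  The function ``read_headers`` is hypothetical and is not part
   of this module.  Unlike :class:`Message`, it does not push back illegal lines.

   .. code-block:: python

      import io


      def read_headers(fp):
          """Hypothetical sketch of RFC 2822 header reading (not rfc822 itself).

          Only fp.readline() is used.  Returns (lines, values): the raw header
          lines in order, and a dict mapping lower-cased field names to their
          stripped values (a later duplicate header overwrites an earlier one).
          """
          lines = []
          values = {}
          current = None
          while True:
              line = fp.readline()
              if not line:                    # end of input
                  break
              if line.endswith('\r\n'):       # CR-LF becomes a single linefeed
                  line = line[:-2] + '\n'
              if line == '\n':                # blank delimiter: consumed, then stop
                  break
              if line[0] in ' \t' and current is not None:
                  # Continuation line: fold it into the previous header's value.
                  lines.append(line)
                  values[current] += '\n ' + line.strip()
                  continue
              name, sep, rest = line.partition(':')
              if not sep:                     # not a header; this sketch just stops
                  break
              current = name.strip().lower()  # lower case for case-insensitive lookup
              lines.append(line)
              values[current] = rest.strip()
          return lines, values


      fp = io.StringIO('From: Jack <jack@cwi.nl>\r\n'
                       'Subject: Hello\r\n'
                       '  world\r\n'
                       '\r\n'
                       'Body text\r\n')
      lines, values = read_headers(fp)
      print(values['FROM'.lower()])   # Jack <jack@cwi.nl>
      print(repr(fp.read()))          # 'Body text\r\n' (the body was left unread)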


.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed.  (The
   parameter ``None`` yields an empty list.)
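
   In Python 3, where this module is gone, :func:`email.utils.getaddresses` offers
   a comparable way to split a comma-separated address field into
   ``(realname, address)`` pairs.  The snippet below uses it only as a stand-in to
   show the kind of result to expect.  It is not the :class:`AddressList` API, and
   its handling of malformed input may differ.

   .. code-block:: python

      from email.utils import getaddresses

      field = 'Jack Jansen <jack@cwi.nl>, Guido van Rossum <guido@python.org>'
      pairs = getaddresses([field])   # takes a list of field values
      for realname, address in pairs:
          print(realname, '->', address)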


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*.  If *str* begins
   and ends with double quotes, they are stripped off.  Likewise, if *str* begins
   and ends with angle brackets, they are stripped off.
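
   :mod:`email.utils` in Python 3 provides functions with the same names.  The
   sketch below assumes that they behave like the functions described here for
   simple inputs.  A typical use is escaping a display name before wrapping it in
   double quotes, and stripping quotes or angle brackets from a header value:

   .. code-block:: python

      from email.utils import quote, unquote

      name = 'Jack "the Dutchman" Jansen'
      print('"%s"' % quote(name))        # "Jack \"the Dutchman\" Jansen"
      print(unquote('"Jack Jansen"'))    # Jack Jansen
      print(unquote('<jack@cwi.nl>'))    # jack@cwi.nl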


.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts.  Return a 2-tuple of that information, unless the parse
   fails, in which case the 2-tuple ``(None, None)`` is returned.
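
   The Python 3 counterpart, :func:`email.utils.parseaddr`, accepts both the
   ``Name <address>`` form and the ``address (Name)`` comment form, so it can serve
   as a quick illustration.  Note one difference: it reports a failed parse as
   ``('', '')`` rather than ``(None, None)``.

   .. code-block:: python

      from email.utils import parseaddr

      print(parseaddr('Jack Jansen <jack@cwi.nl>'))   # ('Jack Jansen', 'jack@cwi.nl')
      print(parseaddr('jack@cwi.nl (Jack Jansen)'))   # ('Jack Jansen', 'jack@cwi.nl')
      print(parseaddr(''))                            # ('', '')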


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`: take a 2-tuple of the form
   ``(realname, email_address)`` and return the string value suitable for a
   :mailheader:`To` or :mailheader:`Cc` header.  If the first element of *pair*
   is false, then the second element is returned unmodified.
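
   In Python 3 the closest equivalent is :func:`email.utils.formataddr`, used
   below as a stand-in.  It likewise returns the bare address when the name is
   empty.  However, it only adds double quotes around the name when the name
   contains special characters such as a comma, so its output need not match
   :func:`dump_address_pair` character for character.

   .. code-block:: python

      from email.utils import formataddr

      print(formataddr(('Jack Jansen', 'jack@cwi.nl')))    # Jack Jansen <jack@cwi.nl>
      print(formataddr(('Jansen, Jack', 'jack@cwi.nl')))   # "Jansen, Jack" <jack@cwi.nl>
      print(formataddr(('', 'jack@cwi.nl')))               # jack@cwi.nl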


.. function:: parsedate(date)

   Attempt to parse a date according to the rules in :rfc:`2822`.  However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases.  *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``.  If it succeeds in
   parsing the date, :func:`parsedate` returns a 9-tuple that can be passed
   directly to :func:`time.mktime`; otherwise ``None`` is returned.  Note that
   indexes 6, 7, and 8 of the result tuple are not usable.
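
   For illustration, the Python 3 function :func:`email.utils.parsedate` accepts
   the same example date.  The sketch assumes its result layout matches the one
   described here and checks only the first six fields.  It then passes the
   tuple to :func:`time.mktime`, which interprets it as local time because the
   timezone has been dropped:

   .. code-block:: python

      import time
      from email.utils import parsedate

      t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
      print(t[:6])                     # (1995, 11, 20, 19, 12, 8)
      local_seconds = time.mktime(t)   # the -0500 offset plays no part here
      print(parsedate('not a date'))   # None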


.. function:: parsedate_tz(date)

   Perform the same function as :func:`parsedate`, but return either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly
   to :func:`time.mktime`, and the tenth is the offset of the date's timezone
   from UTC (Coordinated Universal Time, the successor to Greenwich Mean Time),
   in seconds.  (Note that the sign of the timezone offset is the opposite of the
   sign of the ``time.timezone`` variable for the same timezone; the latter
   variable follows the POSIX standard while this module follows :rfc:`2822`.)
   If the input string has no timezone, the last element of the tuple returned is
   ``None``.  Note that indexes 6, 7, and 8 of the result tuple are not usable.
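
   The sign convention is easy to get backwards, so here is a small check using
   :func:`email.utils.parsedate_tz`, the Python 3 counterpart.  The check assumes
   it uses the same offset convention as this module.  For ``-0500`` the offset
   is ``-18000`` seconds, whereas ``time.timezone`` for the same zone would be
   ``18000``.  According to its documentation, the Python 3 function reports a
   missing timezone as ``0`` rather than ``None``, so the two differ on that
   point.

   .. code-block:: python

      from email.utils import parsedate_tz

      t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
      print(len(t))    # 10
      print(t[9])      # -18000, i.e. five hours west of UTC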


.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp.  If
   the timezone item in the tuple is ``None``, assume local time.  Minor
   deficiency: this first interprets the first 8 elements as a local time and
   then compensates for the timezone difference; this may yield a slight error
   around daylight saving time switch dates, though not enough to worry about for
   common use.
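
   A quick way to check the conversion is to compare against
   :func:`calendar.timegm`, which treats its argument as UTC.  This sketch uses
   the Python 3 counterparts in :mod:`email.utils`.  At UTC-5, 19:12:08 is
   00:12:08 UTC on the following day:

   .. code-block:: python

      import calendar
      from email.utils import mktime_tz, parsedate_tz

      stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))
      expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
      print(stamp == expected)   # True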


.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body.  This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Return a line's canonicalized fieldname (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise return
   ``None`` (implying that parsing should stop here and the line be pushed back
   on the input stream).  It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which :class:`Message` should
   stop.  The delimiter line is consumed, and the file object's read location is
   positioned immediately after it.  By default this method just checks that the
   line is blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be skipped entirely.  By default this
   is a stub that always returns ``False``, but you can override it in a
   subclass.


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any.  Each
   physical line, whether it is a continuation line or not, is a separate list
   item.  Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any.  Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*.  This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if there were any continuation
   lines.  Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Return the value of the last header matching *name*, with leading and
   trailing whitespace stripped; internal whitespace is not stripped.  The
   optional *default* argument can be used to specify a different value to be
   returned when there is no header matching *name*; it defaults to ``None``.
   This is the preferred way to get parsed headers.
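
   In Python 3 a comparable lookup goes through
   :class:`email.message.Message`.  Its ``get()`` returns the *first* matching
   header, whereas :meth:`getheader` as described above returns the last.  This
   sketch therefore uses ``get_all(name)[-1]`` when the last occurrence is
   wanted:

   .. code-block:: python

      from email import message_from_string

      msg = message_from_string('Received: from a.example\n'
                                'Received: from b.example\n'
                                'Subject: Hello there\n'
                                '\n'
                                'Body\n')
      print(msg.get('received'))            # from a.example (first match)
      print(msg.get_all('Received')[-1])    # from b.example (last match)
      print(msg.get('X-Missing', 'n/a'))    # n/a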


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``.  If no header matching *name* exists, return
   ``(None, None)``; otherwise both the full name and the address are (possibly
   empty) strings.

   Example: If *m*'s :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``.  If the header contained
   ``'Jack Jansen <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to ``getaddr(name)``, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of
   ``(full name, email address)`` pairs (even if there was only one address in
   the header).  If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses.  Any continuation lines
   the named headers contain are also parsed.
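
   The same behaviour, collecting addresses from every matching header, can be
   sketched in Python 3 by combining ``email.message.Message.get_all()`` with
   :func:`email.utils.getaddresses`.  This is an illustration with the
   :mod:`email` package, not a call into this module:

   .. code-block:: python

      from email import message_from_string
      from email.utils import getaddresses

      msg = message_from_string(
          'To: Jack Jansen <jack@cwi.nl>\n'
          'Cc: Guido van Rossum <guido@python.org>, Barry <barry@example.org>\n'
          'Cc: Fred <fred@example.org>\n'
          '\n')
      # get_all() returns the default (here []) when no header matches.
      cc_pairs = getaddresses(msg.get_all('Cc', []))
      for realname, address in cc_pairs:
          print(realname, address)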


.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable.  If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard.  While it has been tested and found correct on a large collection of
   email from many sources, it is still possible that this function may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset of the date's timezone from UTC in seconds.
   Note that fields 6, 7, and 8 are not usable.  Similarly to :meth:`getdate`, if
   there is no header matching *name*, or it is unparsable, return ``None``.

:class:`Message` instances also support a limited mapping interface.  In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``name in m``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the writable mapping operations
``m[name] = value`` and ``del m[name]``.  :class:`Message` objects do not
support the :meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update`
methods of the mapping interface.  (Support for :meth:`get` and
:meth:`setdefault` was only added in Python 2.2.)
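
The :class:`email.message.Message` class that takes over this role in Python 3
has a similar partial mapping interface, with one difference worth knowing when
porting code: assigning ``msg[name] = value`` *adds* another header instead of
replacing an existing one, so the sketch below deletes the header first when a
single value is wanted.

.. code-block:: python

   from email import message_from_string

   msg = message_from_string('From: jack@cwi.nl\nSubject: Hi\n\n')
   print('subject' in msg, len(msg))   # True 2 (membership ignores case)

   msg['Subject'] = 'Second'           # appends a second Subject header
   print(msg.get_all('Subject'))       # ['Hi', 'Second']

   del msg['Subject']                  # removes every Subject header
   msg['Subject'] = 'Only one'
   print(msg.get_all('Subject'))       # ['Only one']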

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that ``m[name] = value`` assignments may disturb this
   order).  Each line contains a trailing newline.  The blank line terminating
   the headers is not contained in the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time.  This can be used
   to read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string.  This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list.  Addresses
   are rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   :class:`AddressList` operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.
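
The set operations above can be pictured with plain lists of
``(realname, address)`` pairs.  The helpers ``address_union`` and
``address_difference`` below are hypothetical and written for Python 3, where
:class:`AddressList` is not available.  They assume that two addresses are
duplicates only when both the name and the address match, and they preserve the
order of the left-hand operand.

.. code-block:: python

   from email.utils import getaddresses


   def address_union(left, right):
       """Hypothetical helper: *left*, then pairs from *right* not yet seen."""
       result = list(left)
       for pair in right:
           if pair not in result:
               result.append(pair)
       return result


   def address_difference(left, right):
       """Hypothetical helper: pairs in *left* that are absent from *right*."""
       return [pair for pair in left if pair not in right]


   a = getaddresses(['Jack <jack@cwi.nl>, Guido <guido@python.org>'])
   b = getaddresses(['Guido <guido@python.org>, Fred <fred@example.org>'])
   print(address_union(a, b))        # Jack, Guido and Fred, with Guido only once
   print(address_difference(a, b))   # [('Jack', 'jack@cwi.nl')]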

Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of string pairs (2-tuples), one per address.  In each pair, the first
   element is the canonicalized name part and the second is the actual
   route-address (an ``'@'``\ -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name.  Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`.  This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.