:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse RFC 2822-style mail messages.


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module. This module is present only to maintain backward compatibility.

This module defines a class, :class:`Message`, which represents an "email
message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages
consist of a collection of message headers and a message body. This module
also defines a helper class, :class:`AddressList`, for parsing :rfc:`2822`
addresses. Please refer to the RFC for information on the specific syntax of
:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is created from an input object passed as its
   parameter. :class:`Message` relies only on the input object having a
   :meth:`readline` method; in particular, ordinary file objects qualify.
   Instantiation reads headers from the input object up to a delimiter line
   (normally a blank line) and stores them in the instance. The message body,
   which follows the headers, is not consumed.

   This class can work with any input object that supports a :meth:`readline`
   method. If the input object supports :meth:`seek` and :meth:`tell`, the
   :meth:`rewindbody` method will work, and illegal lines will be pushed back
   onto the input stream. If the input object lacks :meth:`seek` but has an
   :meth:`unread` method that can push back a line of input, :class:`Message`
   uses that to push back illegal lines. Thus this class can be used to parse
   messages coming from a buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :cfunc:`tell` discards buffered data before discovering that
   the :cfunc:`lseek` system call doesn't work. For maximum portability, set
   *seekable* to zero to prevent that initial :meth:`tell` when passing in an
   unseekable object, such as a file object created from a socket object.

   Input lines as read from the file may be terminated either by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before
   the line is stored.

   All header matching is case-insensitive; e.g. ``m['From']``, ``m['from']``
   and ``m['FROM']`` all yield the same result.
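
The header-reading behavior described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the :mod:`rfc822` implementation. The helper names ``read_header_lines``, ``header_dict`` and ``lookup`` are hypothetical, and the sketch ignores continuation lines, :meth:`unread` and the *seekable* workaround. It calls only :meth:`readline`, stores CR-LF endings as a single linefeed, and stops at the blank delimiter line without reading the body. Lowercased keys stand in for the case-insensitive lookup.

```python
import io


def read_header_lines(fp):
    """Hypothetical helper: collect header lines up to the blank delimiter.

    Only fp.readline() is called, so any object with that method works.
    Continuation lines and illegal lines are not treated specially here.
    """
    lines = []
    while True:
        line = fp.readline()
        if not line:                  # end of input, no body at all
            break
        if line.endswith('\r\n'):     # store CR-LF as a single linefeed
            line = line[:-2] + '\n'
        if line == '\n':              # blank delimiter: consumed, not stored
            break
        lines.append(line)
    return lines


def header_dict(lines):
    """Map lowercased field names to stripped values; the first one wins."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(':')
        headers.setdefault(name.strip().lower(), value.strip())
    return headers


def lookup(headers, name):
    """Case-insensitive lookup: 'From', 'from' and 'FROM' give one answer."""
    return headers.get(name.lower())


fp = io.StringIO('From: jack@cwi.nl (Jack Jansen)\r\nSubject: test\r\n\r\nBody\r\n')
headers = header_dict(read_header_lines(fp))
print(lookup(headers, 'FROM'))   # jack@cwi.nl (Jack Jansen)
print(repr(fp.read()))           # 'Body\r\n', the body was left unread
```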


.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed. (The
   parameter ``None`` yields an empty list.)
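
:mod:`rfc822` exists only in Python 2. For comparison, the :mod:`email` package recommended above provides :func:`email.utils.getaddresses`. It parses a list of field values into ``(realname, address)`` pairs, much like :attr:`AddressList.addresslist`. It is a separate function rather than a drop-in replacement: it offers no set operations. The sample addresses below are made up.

```python
from email.utils import getaddresses

# One comma-separated field value, the kind of string AddressList(field) takes.
field = 'Jack Jansen <jack@cwi.nl>, someone@example.com'
pairs = getaddresses([field])
print(pairs)  # [('Jack Jansen', 'jack@cwi.nl'), ('', 'someone@example.com')]
```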


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.
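
The :mod:`email` package documents :func:`email.utils.quote` with the same behavior. The sketch below uses it because :mod:`rfc822` is unavailable in Python 3. Backslashes are doubled and each double quote gains a backslash, so the result can be wrapped in double quotes as an :rfc:`2822` quoted string.

```python
from email.utils import quote

original = 'say "hi" to C:\\temp'   # two double quotes and one backslash
escaped = quote(original)
print(escaped)                       # say \"hi\" to C:\\temp
print('"' + escaped + '"')           # wrapped as a quoted string
```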


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* begins and
   ends with double quotes, they are stripped off. Likewise, if *str* begins and
   ends with angle brackets, they are stripped off.
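
Likewise, :func:`email.utils.unquote` is the :mod:`email` package's counterpart, and a quick check of the stripping rules follows. Only a matching pair of delimiters at both ends is removed. A string with an unbalanced quote is returned unchanged.

```python
from email.utils import unquote

print(unquote('"Jack Jansen"'))   # Jack Jansen
print(unquote('<jack@cwi.nl>'))   # jack@cwi.nl
print(unquote('"unbalanced'))     # "unbalanced  (only one quote: unchanged)
```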


.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts. Returns a tuple of that information, unless the parse
   fails, in which case a 2-tuple ``(None, None)`` is returned.
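
The :mod:`email` package's :func:`email.utils.parseaddr` accepts the same kinds of input. One difference is worth knowing: it signals a failed parse with ``('', '')`` rather than ``(None, None)``. Code ported from :mod:`rfc822` should therefore test for an empty address instead of ``None``.

```python
from email.utils import parseaddr

print(parseaddr('Jack Jansen <jack@cwi.nl>'))  # ('Jack Jansen', 'jack@cwi.nl')
print(parseaddr('jack@cwi.nl (Jack Jansen)'))  # ('Jack Jansen', 'jack@cwi.nl')
print(parseaddr(''))                           # ('', ''), the failure value
```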


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header. If the first element of *pair* is false, then the
   second element is returned unmodified.
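
In the :mod:`email` package, the counterpart is :func:`email.utils.formataddr`. It quotes the real name only when that name contains special characters, so its output is not guaranteed to match :func:`dump_address_pair` character for character. For that reason the sketch checks the round trip through :func:`email.utils.parseaddr` rather than an exact string.

```python
from email.utils import formataddr, parseaddr

print(formataddr(('Jack Jansen', 'jack@cwi.nl')))  # Jack Jansen <jack@cwi.nl>
print(formataddr(('', 'jack@cwi.nl')))             # jack@cwi.nl

# A comma in the real name forces quoting; parseaddr undoes it.
pair = ('Jansen, Jack', 'jack@cwi.nl')
value = formataddr(pair)
print(value)                     # "Jansen, Jack" <jack@cwi.nl>
print(parseaddr(value) == pair)  # True
```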


.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`. However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases. *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``. If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` is returned. Note that indexes 6,
   7, and 8 of the result tuple are not usable.
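
The following quick check uses :func:`email.utils.parsedate`, the :mod:`email` package counterpart, with the date from the paragraph above. Indexes 6 to 8 are placeholders, so the sketch overwrites them before calling :func:`time.mktime`. It passes ``-1`` as the DST flag so that the C library determines daylight saving time itself. The parsed tuple carries no timezone, so the resulting timestamp depends on the local timezone of the machine running it.

```python
import time
from email.utils import parsedate

t = parsedate('Mon, 20 Nov 1995 19:12:08 -0500')
print(t[:6])  # (1995, 11, 20, 19, 12, 8)

# Replace the unusable indexes 6-8; tm_isdst=-1 lets mktime decide on DST.
# This is a local-time interpretation, so the value varies by machine.
local_ts = time.mktime(t[:6] + (0, 0, -1))

print(parsedate('not a date'))  # None
```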


.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset (in seconds) of the date's
   timezone from UTC (which is the official term for Greenwich Mean Time). (Note
   that the sign of the timezone offset is the opposite of the sign of the
   ``time.timezone`` variable for the same timezone; the latter variable follows
   the POSIX standard while this module follows :rfc:`2822`.) If the input string
   has no timezone, the last element of the tuple returned is ``None``. Note that
   indexes 6, 7, and 8 of the result tuple are not usable.
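
Parsing the same date with :func:`email.utils.parsedate_tz` shows the sign convention. ``-0500`` becomes an offset of ``-18000`` seconds, whereas :data:`time.timezone` would be ``18000`` for a zone five hours west of UTC. Be aware that in Python 3, :func:`email.utils.parsedate_tz` is documented to report a missing timezone as ``0`` rather than ``None``, unlike the behavior described above.

```python
from email.utils import parsedate_tz

t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
print(t[:6])  # (1995, 11, 20, 19, 12, 8)
print(t[9])   # -18000 seconds, i.e. five hours west of UTC

# For comparison, time.timezone (POSIX sign) would be +18000 for a zone
# five hours west of UTC: the opposite sign.
print(parsedate_tz('Tue, 21 Nov 1995 01:12:08 +0100')[9])  # 3600
```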


.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp. If
   the timezone item in the tuple is ``None``, assume local time. Minor
   deficiency: this first interprets the first 8 elements as a local time and then
   compensates for the timezone difference; this may yield a slight error around
   daylight saving time switch dates. This is not enough to worry about for common
   use.
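
To see this conversion at work without depending on the local timezone, the sketch below uses the :mod:`email` counterpart, :func:`email.utils.mktime_tz`. It compares the result with :func:`calendar.timegm`, which interprets a time tuple as UTC. 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.

```python
import calendar
from email.utils import mktime_tz, parsedate_tz

stamp = mktime_tz(parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500'))

# 19:12:08 at UTC-5 is 00:12:08 UTC on the next day; timegm reads the
# tuple as UTC, so this comparison does not depend on the local timezone.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
print(stamp == expected)  # True
```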


.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body. This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Returns a line's canonicalized field name (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
   ``None`` (implying that parsing should stop here and the line be pushed back on
   the input stream). It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which :class:`Message` should
   stop. The delimiter line is consumed, and the file object's read location is
   positioned immediately after it. By default this method just checks that the
   line is blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just skipped. By
   default this is a stub that always returns ``False``, but you can override it in
   a subclass.
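
The three hooks above cooperate while headers are read. The class below is a hypothetical sketch, not :class:`rfc822.Message`. Its defaults follow the descriptions above, but the order in which the hooks are consulted is an assumption. Continuation lines are not handled, and an illegal line is merely recorded rather than pushed back. The :meth:`isheader` test accepts a field name made of printable ASCII characters other than the colon, as :rfc:`2822` specifies. The subclass shows the kind of override the text suggests, using an invented ``#`` comment convention.

```python
import io


class HeaderReader:
    """Hypothetical sketch of how isheader/islast/iscomment steer reading.

    Not rfc822.Message: continuation lines are ignored, and an illegal line
    is only recorded in stopped_at instead of being pushed back.
    """

    def __init__(self, fp):
        self.headers = []
        self.stopped_at = None
        while True:
            line = fp.readline()
            if not line:                     # end of input
                break
            if self.iscomment(line):         # skipped entirely
                continue
            if self.islast(line):            # delimiter: consumed, stop
                break
            if self.isheader(line) is None:  # not a legal header: stop
                self.stopped_at = line
                break
            self.headers.append(line)

    def isheader(self, line):
        """Return the lowercased field name, or None if line is not a header."""
        name, sep, _ = line.partition(':')
        # RFC 2822 field names: printable ASCII (33-126) except the colon.
        if sep and name and all(33 <= ord(c) <= 126 for c in name):
            return name.lower()
        return None

    def islast(self, line):
        """Default delimiter test: a blank line."""
        return line in ('\n', '\r\n')

    def iscomment(self, line):
        """Default stub: nothing is a comment."""
        return False


class HashCommentReader(HeaderReader):
    """Example override: treat lines starting with '#' as comments (invented)."""

    def iscomment(self, line):
        return line.startswith('#')


reader = HashCommentReader(io.StringIO('# note\nFrom: jack@cwi.nl\nnot a header\n\nbody\n'))
print(reader.headers)            # ['From: jack@cwi.nl\n']
print(repr(reader.stopped_at))   # 'not a header\n'
```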


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any. Each
   physical line, whether it is a continuation line or not, is a separate list
   item. Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any. Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*. This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if any continuation lines
   were present. Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Like ``getrawheader(name)``, but strip leading and trailing whitespace.
   Internal whitespace is not stripped. The optional *default* argument can be
   used to specify a value other than ``None`` to be returned when there is no
   header matching *name*.
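
The four lookup methods above can be modelled as plain functions over a list of raw header lines such as :attr:`Message.headers`. This is a hypothetical sketch: the function names are invented for illustration. A line beginning with a space or tab is taken to be a continuation line, and names are compared case-insensitively as described for :class:`Message`.

```python
def _matches(line, name):
    """True if line starts a header whose field name equals name, ignoring case."""
    field, sep, _ = line.partition(':')
    return bool(sep) and field.lower() == name.lower()


def _is_continuation(line):
    return line[:1] in (' ', '\t')


def all_matching(lines, name):
    """Like getallmatchingheaders(): every physical line of every match."""
    result, inside = [], False
    for line in lines:
        if _is_continuation(line):
            if inside:
                result.append(line)
        else:
            inside = _matches(line, name)
            if inside:
                result.append(line)
    return result


def first_matching(lines, name):
    """Like getfirstmatchingheader(): first match plus continuations, or None."""
    result = None
    for line in lines:
        if result is None:
            if not _is_continuation(line) and _matches(line, name):
                result = [line]
        elif _is_continuation(line):
            result.append(line)
        else:
            break
    return result


def raw_header(lines, name):
    """Like getrawheader(): text after the colon, whitespace and linefeeds kept."""
    found = first_matching(lines, name)
    if found is None:
        return None
    return found[0].partition(':')[2] + ''.join(found[1:])


def header(lines, name, default=None):
    """Like getheader(): raw_header() minus leading and trailing whitespace."""
    raw = raw_header(lines, name)
    return default if raw is None else raw.strip()


headers = [
    'Received: from a\n',
    'Subject: a long\n',
    '  subject line\n',
    'Received: from b\n',
]
print(all_matching(headers, 'received'))     # ['Received: from a\n', 'Received: from b\n']
print(repr(raw_header(headers, 'SUBJECT')))  # ' a long\n  subject line\n'
print(repr(header(headers, 'Subject')))      # 'a long\n  subject line'
print(header(headers, 'X-Missing', 'n/a'))   # n/a
```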


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``. If no header matching *name* exists, return ``(None,
   None)``; otherwise both the full name and the address are (possibly empty)
   strings.

   Example: If *m*'s first :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
   <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to ``getaddr(name)``, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
   name, email address)`` pairs (even if there was only one address in the header).
   If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
   the named headers contain are also parsed.
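
With the :mod:`email` package, the same result can be sketched in two steps. :meth:`email.message.Message.get_all` collects every matching header, and :func:`email.utils.getaddresses` parses them together. The message below is made up, and its first :mailheader:`Cc` header is folded onto a continuation line.

```python
from email import message_from_string
from email.utils import getaddresses

raw = (
    'From: jack@cwi.nl (Jack Jansen)\n'
    'Cc: Alice <alice@example.com>,\n'
    ' bob@example.com\n'
    'Cc: carol@example.com\n'
    '\n'
    'Body\n'
)
msg = message_from_string(raw)

# get_all() returns every Cc value (the folded line stays with the first
# one); getaddresses() parses them all into (realname, address) pairs.
cc_pairs = getaddresses(msg.get_all('Cc', []))
print(cc_pairs)
# [('Alice', 'alice@example.com'), ('', 'bob@example.com'), ('', 'carol@example.com')]
```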


.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable. If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard. While it has been tested and found correct on a large collection of
   email from many sources, it is still possible that this function may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset, in seconds, of the date's timezone from UTC.
   Note that fields 6, 7, and 8 are not usable. Similarly to :meth:`getdate`, if
   there is no header matching *name*, or it is unparsable, return ``None``.

:class:`Message` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``m.has_key(name)``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the writable mapping interface
``m[name] = value`` and ``del m[name]``. :class:`Message` objects do not support
the :meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of
the mapping interface. (Support for :meth:`get` and :meth:`setdefault` was only
added in Python 2.2.)

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that ``m[name] = value`` assignments may disturb this order).
   Each line contains a trailing newline. The blank line terminating the headers
   is not contained in the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time. This can be used to
   read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string. This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list. Addresses are
   rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.
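
The set operations above can be sketched over plain lists of ``(name, address)`` pairs, the format of :attr:`AddressList.addresslist`. The functions are hypothetical. The sketch assumes two entries are duplicates only when both the name and the address match, and it keeps the original order of the left-hand operand. The in-place variants correspond to rebinding, e.g. ``a = union(a, b)``.

```python
def union(left, right):
    """Sketch of AddressList.__add__: left's pairs, then new pairs from right."""
    result = list(left)
    for pair in right:
        if pair not in result:      # whole (name, address) pair compared
            result.append(pair)
    return result


def difference(left, right):
    """Sketch of AddressList.__sub__: pairs of left that are not in right."""
    return [pair for pair in left if pair not in right]


a = [('Jack Jansen', 'jack@cwi.nl'), ('', 'alice@example.com')]
b = [('', 'alice@example.com'), ('', 'bob@example.com')]
print(union(a, b))
# [('Jack Jansen', 'jack@cwi.nl'), ('', 'alice@example.com'), ('', 'bob@example.com')]
print(difference(a, b))
# [('Jack Jansen', 'jack@cwi.nl')]
```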

Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of pairs of strings, one per address. In each pair, the first element is
   the canonicalized name part and the second is the actual route-address
   (``'@'``\ -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name. Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`. This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.
351