:mod:`rfc822` --- Parse RFC 2822 mail headers
=============================================

.. module:: rfc822
   :synopsis: Parse 2822 style mail messages.
   :deprecated:


.. deprecated:: 2.3
   The :mod:`email` package should be used in preference to the :mod:`rfc822`
   module. This module is present only to maintain backward compatibility.

This module defines a class, :class:`Message`, which represents an "email
message" as defined by the Internet standard :rfc:`2822`. [#]_ Such messages
consist of a collection of message headers, and a message body. This module
also defines a helper class :class:`AddressList` for parsing :rfc:`2822`
addresses. Please refer to the RFC for information on the specific syntax of
:rfc:`2822` messages.

.. index:: module: mailbox

The :mod:`mailbox` module provides classes to read mailboxes produced by
various end-user mail programs.


.. class:: Message(file[, seekable])

   A :class:`Message` instance is instantiated with an input object as parameter.
   :class:`Message` relies only on the input object having a :meth:`readline`
   method; in particular, ordinary file objects qualify. Instantiation reads
   headers from the input object up to a delimiter line (normally a blank line)
   and stores them in the instance. The message body, following the headers, is
   not consumed.

   This class can work with any input object that supports a :meth:`readline`
   method. If the input object has seek and tell capability, the
   :meth:`rewindbody` method will work; also, illegal lines will be pushed back
   onto the input stream. If the input object lacks seek but has an :meth:`unread`
   method that can push back a line of input, :class:`Message` will use that to
   push back illegal lines. Thus this class can be used to parse messages coming
   from a buffered stream.

   The optional *seekable* argument is provided as a workaround for certain stdio
   libraries in which :cfunc:`tell` discards buffered data before discovering that
   the :cfunc:`lseek` system call doesn't work. For maximum portability, you
   should set the *seekable* argument to zero to prevent that initial :meth:`tell`
   when passing in an unseekable object such as a file object created from a socket
   object.

   Input lines as read from the file may be terminated either by CR-LF or by a
   single linefeed; a terminating CR-LF is replaced by a single linefeed before the
   line is stored.

   All header matching is done independent of upper or lower case; e.g.
   ``m['From']``, ``m['from']`` and ``m['FROM']`` all yield the same result.

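The header-reading behaviour described for :class:`Message` can be sketched in a
few lines of plain Python. This is not the module's implementation:
``read_headers`` and ``header_dict`` are hypothetical helpers written only for
this illustration, and they ignore details such as comment lines, the Unix
``From`` line, and pushing back illegal lines.

```python
import io


def read_headers(fp):
    """Read header lines from fp up to the blank delimiter line."""
    lines = []
    while True:
        line = fp.readline()          # only readline() is required of fp
        if not line:
            break                     # end of input before any delimiter
        if line.endswith("\r\n"):
            line = line[:-2] + "\n"   # store CR-LF as a single linefeed
        if line == "\n":
            break                     # delimiter consumed, body left unread
        lines.append(line)
    return lines


def header_dict(lines):
    """Map lowercased field names to values, joining continuation lines."""
    headers = {}
    name = None
    for line in lines:
        if line[0] in " \t" and name is not None:
            headers[name] += "\n " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            name = name.strip().lower()
            headers[name] = value.strip()
    return headers


fp = io.StringIO(
    "From: jack@cwi.nl\r\nSubject: hello\r\n world\r\n\r\nbody text\r\n"
)
lines = read_headers(fp)
headers = header_dict(lines)
sender = headers["FROM".lower()]      # lookups ignore the case of the name
body = fp.read()                      # the body is still available on fp
```

Storing the keys in lower case is one simple way to obtain the case-insensitive
matching described above; the sketch makes no claim about how the module stores
its headers.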

.. class:: AddressList(field)

   You may instantiate the :class:`AddressList` helper class using a single string
   parameter, a comma-separated list of :rfc:`2822` addresses to be parsed. (The
   parameter ``None`` yields an empty list.)


.. function:: quote(str)

   Return a new string with backslashes in *str* replaced by two backslashes and
   double quotes replaced by backslash-double quote.


.. function:: unquote(str)

   Return a new string which is an *unquoted* version of *str*. If *str* ends and
   begins with double quotes, they are stripped off. Likewise if *str* ends and
   begins with angle brackets, they are stripped off.

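The :mod:`rfc822` module is not available in Python 3. As a rough sketch, the
functions of the same names in :mod:`email.utils` (part of the :mod:`email`
package recommended above) can illustrate the quoting rules. Treating them as
stand-ins for the functions documented here is an assumption of this example.

```python
from email.utils import quote, unquote

raw = 'say "hi" \\ bye'      # two double quotes and one backslash
quoted = quote(raw)          # backslash doubled, quotes backslash-escaped

stripped_quotes = unquote('"Jack Jansen"')   # surrounding quotes removed
stripped_angles = unquote('<jack@cwi.nl>')   # surrounding brackets removed
unchanged = unquote('jack@cwi.nl')           # nothing to strip
```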

.. function:: parseaddr(address)

   Parse *address*, which should be the value of some address-containing field such
   as :mailheader:`To` or :mailheader:`Cc`, into its constituent "realname" and
   "email address" parts. Returns a 2-tuple of that information, unless the parse
   fails, in which case the 2-tuple ``(None, None)`` is returned.


.. function:: dump_address_pair(pair)

   The inverse of :func:`parseaddr`, this takes a 2-tuple of the form ``(realname,
   email_address)`` and returns the string value suitable for a :mailheader:`To` or
   :mailheader:`Cc` header. If the first element of *pair* is false, then the
   second element is returned unmodified.

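A minimal sketch of the round trip between a header value and a ``(realname,
email_address)`` pair. Because :mod:`rfc822` is not available in Python 3, the
example uses ``parseaddr`` and ``formataddr`` from :mod:`email.utils`; using
``formataddr`` as the counterpart of :func:`dump_address_pair` is an assumption
made for illustration.

```python
from email.utils import formataddr, parseaddr

pair_a = parseaddr('Jack Jansen <jack@cwi.nl>')
pair_b = parseaddr('jack@cwi.nl (Jack Jansen)')    # older "comment" style

rebuilt = formataddr(('Jack Jansen', 'jack@cwi.nl'))
bare = formataddr(('', 'jack@cwi.nl'))             # false realname: address only
```

Note that the :mod:`email.utils` version signals a failed parse with a pair of
empty strings rather than ``(None, None)``.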

.. function:: parsedate(date)

   Attempts to parse a date according to the rules in :rfc:`2822`. However, some
   mailers don't follow that format as specified, so :func:`parsedate` tries to
   guess correctly in such cases. *date* is a string containing an :rfc:`2822`
   date, such as ``'Mon, 20 Nov 1995 19:12:08 -0500'``. If it succeeds in parsing
   the date, :func:`parsedate` returns a 9-tuple that can be passed directly to
   :func:`time.mktime`; otherwise ``None`` will be returned. Note that indexes 6,
   7, and 8 of the result tuple are not usable.


.. function:: parsedate_tz(date)

   Performs the same function as :func:`parsedate`, but returns either ``None`` or
   a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
   :func:`time.mktime`, and the tenth is the offset of the date's timezone from UTC
   (which is the official term for Greenwich Mean Time). (Note that the sign of
   the timezone offset is the opposite of the sign of the ``time.timezone``
   variable for the same timezone; the latter variable follows the POSIX standard
   while this module follows :rfc:`2822`.) If the input string has no timezone,
   the last element of the tuple returned is ``None``. Note that indexes 6, 7, and
   8 of the result tuple are not usable.


.. function:: mktime_tz(tuple)

   Turn a 10-tuple as returned by :func:`parsedate_tz` into a UTC timestamp. If
   the timezone item in the tuple is ``None``, assume local time. Minor
   deficiency: this first interprets the first 8 elements as a local time and then
   compensates for the timezone difference; this may yield a slight error around
   daylight saving time switch dates. The error is not enough to worry about for
   common use.

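The three date helpers can be tried out with the example date given above. This
sketch assumes the functions of the same names in :mod:`email.utils`, since
:mod:`rfc822` is not available in Python 3; the behaviour of the two
implementations may differ in corner cases.

```python
import calendar
from email.utils import mktime_tz, parsedate, parsedate_tz

date = 'Mon, 20 Nov 1995 19:12:08 -0500'

nine = parsedate(date)       # 9-tuple; only the first six fields are meaningful
ten = parsedate_tz(date)     # same fields plus the UTC offset in seconds
offset = ten[9]              # -0500 is five hours behind UTC

stamp = mktime_tz(ten)       # seconds since the epoch, in UTC
# 19:12:08 at UTC-5 is 00:12:08 UTC on the following day.
expected = calendar.timegm((1995, 11, 21, 0, 12, 8, 0, 0, 0))
```

The offset for ``-0500`` is negative here, whereas ``time.timezone`` counts
seconds west of UTC and is therefore positive for the same zone.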

.. seealso::

   Module :mod:`email`
      Comprehensive email handling package; supersedes the :mod:`rfc822` module.

   Module :mod:`mailbox`
      Classes to read various mailbox formats produced by end-user mail programs.

   Module :mod:`mimetools`
      Subclass of :class:`rfc822.Message` that handles MIME encoded messages.


.. _message-objects:

Message Objects
---------------

A :class:`Message` instance has the following methods:


.. method:: Message.rewindbody()

   Seek to the start of the message body. This only works if the file object is
   seekable.


.. method:: Message.isheader(line)

   Returns a line's canonicalized fieldname (the dictionary key that will be used
   to index it) if the line is a legal :rfc:`2822` header; otherwise returns
   ``None`` (implying that parsing should stop here and the line be pushed back on
   the input stream). It is sometimes useful to override this method in a
   subclass.


.. method:: Message.islast(line)

   Return true if the given line is a delimiter on which :class:`Message` should
   stop. The delimiter line is consumed, and the file object's read location
   positioned immediately after it. By default this method just checks that the
   line is blank, but you can override it in a subclass.


.. method:: Message.iscomment(line)

   Return ``True`` if the given line should be ignored entirely, just skipped. By
   default this is a stub that always returns ``False``, but you can override it in
   a subclass.


.. method:: Message.getallmatchingheaders(name)

   Return a list of lines consisting of all headers matching *name*, if any. Each
   physical line, whether it is a continuation line or not, is a separate list
   item. Return the empty list if no header matches *name*.


.. method:: Message.getfirstmatchingheader(name)

   Return a list of lines comprising the first header matching *name*, and its
   continuation line(s), if any. Return ``None`` if there is no header matching
   *name*.


.. method:: Message.getrawheader(name)

   Return a single string consisting of the text after the colon in the first
   header matching *name*. This includes leading whitespace, the trailing
   linefeed, and internal linefeeds and whitespace if any continuation line(s)
   were present. Return ``None`` if there is no header matching *name*.


.. method:: Message.getheader(name[, default])

   Return a single string consisting of the last header matching *name*,
   with leading and trailing whitespace stripped.
   Internal whitespace is not stripped. The optional *default* argument can be
   used to specify a different default to be returned when there is no header
   matching *name*; it defaults to ``None``.
   This is the preferred way to get parsed headers.


.. method:: Message.get(name[, default])

   An alias for :meth:`getheader`, to make the interface more compatible with
   regular dictionaries.


.. method:: Message.getaddr(name)

   Return a pair ``(full name, email address)`` parsed from the string returned by
   ``getheader(name)``. If no header matching *name* exists, return ``(None,
   None)``; otherwise both the full name and the address are (possibly empty)
   strings.

   Example: If *m*'s first :mailheader:`From` header contains the string
   ``'jack@cwi.nl (Jack Jansen)'``, then ``m.getaddr('From')`` will yield the pair
   ``('Jack Jansen', 'jack@cwi.nl')``. If the header contained ``'Jack Jansen
   <jack@cwi.nl>'`` instead, it would yield the exact same result.


.. method:: Message.getaddrlist(name)

   This is similar to :meth:`getaddr`, but parses a header containing a list of
   email addresses (e.g. a :mailheader:`To` header) and returns a list of ``(full
   name, email address)`` pairs (even if there was only one address in the header).
   If there is no header matching *name*, return an empty list.

   If multiple headers exist that match the named header (e.g. if there are several
   :mailheader:`Cc` headers), all are parsed for addresses. Any continuation lines
   the named headers contain are also parsed.

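The way several headers and their continuation lines are combined into one
address list can be sketched with the :mod:`email` package, which the
deprecation note recommends. ``message_from_string``, ``Message.get_all`` and
``email.utils.getaddresses`` are used here as an assumed equivalent of
:meth:`getaddrlist`, and the ``example.org`` addresses are made up for
illustration.

```python
from email import message_from_string
from email.utils import getaddresses

text = (
    "To: Jack Jansen <jack@cwi.nl>\n"
    "Cc: guido@example.org,\n"
    " Barry <barry@example.org>\n"      # continuation of the first Cc header
    "Cc: tim@example.org\n"
    "\n"
    "body\n"
)
msg = message_from_string(text)

# Collect every Cc header, then parse all of them in one call.
cc_pairs = getaddresses(msg.get_all('Cc', []))
missing = getaddresses(msg.get_all('Bcc', []))   # no such header: empty list
```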

.. method:: Message.getdate(name)

   Retrieve a header using :meth:`getheader` and parse it into a 9-tuple compatible
   with :func:`time.mktime`; note that fields 6, 7, and 8 are not usable. If
   there is no header matching *name*, or it is unparsable, return ``None``.

   Date parsing appears to be a black art, and not all mailers adhere to the
   standard. While it has been tested and found correct on a large collection of
   email from many sources, it is still possible that this function may
   occasionally yield an incorrect result.


.. method:: Message.getdate_tz(name)

   Retrieve a header using :meth:`getheader` and parse it into a 10-tuple; the
   first 9 elements will make a tuple compatible with :func:`time.mktime`, and the
   10th is a number giving the offset of the date's timezone from UTC. Note that
   fields 6, 7, and 8 are not usable. Similarly to :meth:`getdate`, if there is
   no header matching *name*, or it is unparsable, return ``None``.

:class:`Message` instances also support a limited mapping interface. In
particular: ``m[name]`` is like ``m.getheader(name)`` but raises :exc:`KeyError`
if there is no matching header; and ``len(m)``, ``m.get(name[, default])``,
``name in m``, ``m.keys()``, ``m.values()``, ``m.items()``, and
``m.setdefault(name[, default])`` act as expected, with the one difference
that :meth:`setdefault` uses an empty string as the default value.
:class:`Message` instances also support the mapping writable interface ``m[name]
= value`` and ``del m[name]``. :class:`Message` objects do not support the
:meth:`clear`, :meth:`copy`, :meth:`popitem`, or :meth:`update` methods of the
mapping interface. (Support for :meth:`get` and :meth:`setdefault` was only
added in Python 2.2.)

Finally, :class:`Message` instances have some public instance variables:


.. attribute:: Message.headers

   A list containing the entire set of header lines, in the order in which they
   were read (except that setitem calls may disturb this order). Each line contains
   a trailing newline. The blank line terminating the headers is not contained in
   the list.


.. attribute:: Message.fp

   The file or file-like object passed at instantiation time. This can be used to
   read the message content.


.. attribute:: Message.unixfrom

   The Unix ``From`` line, if the message had one, or an empty string. This is
   needed to regenerate the message in some contexts, such as an ``mbox``\ -style
   mailbox file.


.. _addresslist-objects:

AddressList Objects
-------------------

An :class:`AddressList` instance has the following methods:


.. method:: AddressList.__len__()

   Return the number of addresses in the address list.


.. method:: AddressList.__str__()

   Return a canonicalized string representation of the address list. Addresses are
   rendered in ``"name" <host@domain>`` form, comma-separated.


.. method:: AddressList.__add__(alist)

   Return a new :class:`AddressList` instance that contains all addresses in both
   :class:`AddressList` operands, with duplicates removed (set union).


.. method:: AddressList.__iadd__(alist)

   In-place version of :meth:`__add__`; turns this :class:`AddressList` instance
   into the union of itself and the right-hand instance, *alist*.


.. method:: AddressList.__sub__(alist)

   Return a new :class:`AddressList` instance that contains every address in the
   left-hand :class:`AddressList` operand that is not present in the right-hand
   address operand (set difference).


.. method:: AddressList.__isub__(alist)

   In-place version of :meth:`__sub__`, removing addresses in this list which are
   also in *alist*.

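The set-like semantics of the ``+`` and ``-`` operators can be illustrated on
plain lists of ``(name, address)`` pairs. ``union`` and ``difference`` are
hypothetical helpers written for this sketch, not part of the module, and the
pairs are produced with ``email.utils.getaddresses`` because :mod:`rfc822` is
not available in Python 3.

```python
from email.utils import getaddresses


def union(left, right):
    """Pairs from both lists, without duplicates, in first-seen order."""
    result = list(left)
    for pair in right:
        if pair not in result:
            result.append(pair)
    return result


def difference(left, right):
    """Pairs in left that do not appear in right."""
    return [pair for pair in left if pair not in right]


a = getaddresses(['jack@cwi.nl, guido@example.org'])
b = getaddresses(['guido@example.org, tim@example.org'])

both = union(a, b)          # corresponds to the "+" operator described above
only_a = difference(a, b)   # corresponds to the "-" operator described above
```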
Finally, :class:`AddressList` instances have one public instance variable:


.. attribute:: AddressList.addresslist

   A list of 2-tuples of strings, one per address. In each tuple, the first
   element is the canonicalized name part and the second is the actual
   route-address (``'@'``\ -separated username-host.domain pair).

.. rubric:: Footnotes

.. [#] This module originally conformed to :rfc:`822`, hence the name. Since then,
   :rfc:`2822` has been released as an update to :rfc:`822`. This module should be
   considered :rfc:`2822`\ -conformant, especially in cases where the syntax or
   semantics have changed since :rfc:`822`.
