#4661: add bytes parsing and generation to email (email version bump to 5.1.0)

The work on this is not 100% complete, but everything is present to
allow real-world testing of the code.  The only remaining major todo
item is to (hopefully!) enhance the handling of non-ASCII bytes in headers
converted to unicode by RFC2047 encoding them rather than replacing them with
'?'s.
diff --git a/Doc/library/email.generator.rst b/Doc/library/email.generator.rst
index 930905a..954f175 100644
--- a/Doc/library/email.generator.rst
+++ b/Doc/library/email.generator.rst
@@ -22,6 +22,12 @@
 result in changes to the :class:`~email.message.Message` object as defaults are
 filled in.
 
+:class:`bytes` output can be generated using the :class:`BytesGenerator` class.
+If the message object structure contains non-ASCII bytes, this generator's
+:meth:`~BytesGenerator.flatten` method will emit the original bytes.  Parsing a
+binary message and then flattening it with :class:`BytesGenerator` should be
+idempotent for standards compliant messages.
+
 Here are the public methods of the :class:`Generator` class, imported from the
 :mod:`email.generator` module:
 
@@ -65,6 +71,13 @@
 
       Note that for subparts, no envelope header is ever printed.
 
+      Messages parsed with a Bytes parser that have a
+      :mailheader:`Content-Transfer-Encoding` of 8bit will be converted to a
+      use a 7bit Content-Transfer-Encoding.  Any other non-ASCII bytes in the
+      message structure will be converted to '?' characters.
+
+      .. versionchanged:: 3.2 added support for re-encoding 8bit message bodies.
+
    .. method:: clone(fp)
 
       Return an independent clone of this :class:`Generator` instance with the
@@ -76,11 +89,27 @@
       :class:`Generator`'s constructor.  This provides just enough file-like API
       for :class:`Generator` instances to be used in the :func:`print` function.
 
-As a convenience, see the methods :meth:`Message.as_string` and
-``str(aMessage)``, a.k.a. :meth:`Message.__str__`, which simplify the generation
-of a formatted string representation of a message object.  For more detail, see
+As a convenience, see the :class:`~email.message.Message` methods
+:meth:`~email.message.Message.as_string` and ``str(aMessage)``, a.k.a.
+:meth:`~email.message.Message.__str__`, which simplify the generation of a
+formatted string representation of a message object.  For more detail, see
 :mod:`email.message`.
 
+.. class:: BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78, fmt=None)
+
+   This class has the same API as the :class:`Generator` class, except that
+   *outfp* must be a file like object that will accept :class`bytes` input to
+   its `write` method.  If the message object structure contains non-ASCII
+   bytes, this generator's :meth:`~BytesGenerator.flatten` method will produce
+   them as-is, including preserving parts with a
+   :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+
+   Note that even the :meth:`write` method API is identical:  it expects
+   strings as input, and converts them to bytes by encoding them using
+   the ASCII codec.
+
+   .. versionadded:: 3.2
+
 The :mod:`email.generator` module also provides a derived class, called
 :class:`DecodedGenerator` which is like the :class:`Generator` base class,
 except that non-\ :mimetype:`text` parts are substituted with a format string
diff --git a/Doc/library/email.message.rst b/Doc/library/email.message.rst
index 9dcb2b4..dc305a7 100644
--- a/Doc/library/email.message.rst
+++ b/Doc/library/email.message.rst
@@ -111,9 +111,17 @@
       be decoded if this header's value is ``quoted-printable`` or ``base64``.
       If some other encoding is used, or :mailheader:`Content-Transfer-Encoding`
       header is missing, or if the payload has bogus base64 data, the payload is
-      returned as-is (undecoded).  If the message is a multipart and the
-      *decode* flag is ``True``, then ``None`` is returned.  The default for
-      *decode* is ``False``.
+      returned as-is (undecoded).  In all cases the returned value is binary
+      data.  If the message is a multipart and the *decode* flag is ``True``,
+      then ``None`` is returned.
+
+      When *decode* is ``False`` (the default) the body is returned as a string
+      without decoding the :mailheader:`Content-Transfer-Encoding`.  However,
+      for a :mailheader:`Content-Transfer-Encoding` of 8bit, an attempt is made
+      to decode the original bytes using the `charset` specified by the
+      :mailheader:`Content-Type` header, using the `replace` error handler.  If
+      no `charset` is specified, or if the `charset` given is not recognized by
+      the email package, the body is decoded using the default ASCII charset.
 
 
    .. method:: set_payload(payload, charset=None)
@@ -160,6 +168,10 @@
    Note that in all cases, any envelope header present in the message is not
    included in the mapping interface.
 
+   In a model generated from bytes, any header values that (in contravention
+   of the RFCs) contain non-ASCII bytes will have those bytes transformed
+   into '?' characters when the values are retrieved through this interface.
+
 
    .. method:: __len__()
 
diff --git a/Doc/library/email.parser.rst b/Doc/library/email.parser.rst
index 32f4ff1..77a0b69 100644
--- a/Doc/library/email.parser.rst
+++ b/Doc/library/email.parser.rst
@@ -80,6 +80,14 @@
       if you feed more data to a closed :class:`FeedParser`.
 
 
+.. class:: BytesFeedParser(_factory=email.message.Message)
+
+   Works exactly like :class:`FeedParser` except that the input to the
+   :meth:`~FeedParser.feed` method must be bytes and not string.
+
+   .. versionadded:: 3.2
+
+
 Parser class API
 ^^^^^^^^^^^^^^^^
 
@@ -131,7 +139,7 @@
 
       Similar to the :meth:`parse` method, except it takes a string object
       instead of a file-like object.  Calling this method on a string is exactly
-      equivalent to wrapping *text* in a :class:`StringIO` instance first and
+      equivalent to wrapping *text* in a :class:`~io.StringIO` instance first and
       calling :meth:`parse`.
 
       Optional *headersonly* is a flag specifying whether to stop parsing after
@@ -139,25 +147,78 @@
       the entire contents of the file.
 
 
+.. class:: BytesParser(_class=email.message.Message, strict=None)
+
+   This class is exactly parallel to :class:`Parser`, but handles bytes input.
+   The *_class* and *strict* arguments are interpreted in the same way as for
+   the :class:`Parser` constructor.  *strict* is supported only to make porting
+   code easier; it is deprecated.
+
+   .. method:: parse(fp, headeronly=False)
+
+      Read all the data from the binary file-like object *fp*, parse the
+      resulting bytes, and return the message object.  *fp* must support
+      both the :meth:`readline` and the :meth:`read` methods on file-like
+      objects.
+
+      The bytes contained in *fp* must be formatted as a block of :rfc:`2822`
+      style headers and header continuation lines, optionally preceded by a
+      envelope header.  The header block is terminated either by the end of the
+      data or by a blank line.  Following the header block is the body of the
+      message (which may contain MIME-encoded subparts, including subparts
+      with a :mailheader:`Content-Transfer-Encoding` of ``8bit``.
+
+      Optional *headersonly* is a flag specifying whether to stop parsing after
+      reading the headers or not.  The default is ``False``, meaning it parses
+      the entire contents of the file.
+
+   .. method:: parsebytes(bytes, headersonly=False)
+
+      Similar to the :meth:`parse` method, except it takes a byte string object
+      instead of a file-like object.  Calling this method on a byte string is
+      exactly equivalent to wrapping *text* in a :class:`~io.BytesIO` instance
+      first and calling :meth:`parse`.
+
+      Optional *headersonly* is as with the :meth:`parse` method.
+
+   .. versionadded:: 3.2
+
+
 Since creating a message object structure from a string or a file object is such
-a common task, two functions are provided as a convenience.  They are available
+a common task, four functions are provided as a convenience.  They are available
 in the top-level :mod:`email` package namespace.
 
 .. currentmodule:: email
 
-.. function:: message_from_string(s[, _class][, strict])
+.. function:: message_from_string(s, _class=email.message.Message, strict=None)
 
    Return a message object structure from a string.  This is exactly equivalent to
    ``Parser().parsestr(s)``.  Optional *_class* and *strict* are interpreted as
    with the :class:`Parser` class constructor.
 
+.. function:: message_from_bytes(s, _class=email.message.Message, strict=None)
 
-.. function:: message_from_file(fp[, _class][, strict])
+   Return a message object structure from a byte string.  This is exactly
+   equivalent to ``BytesParser().parsebytes(s)``.  Optional *_class* and
+   *strict* are interpreted as with the :class:`Parser` class constructor.
+
+   .. versionadded:: 3.2
+
+.. function:: message_from_file(fp, _class=email.message.Message, strict=None)
 
    Return a message object structure tree from an open :term:`file object`.
    This is exactly equivalent to ``Parser().parse(fp)``.  Optional *_class*
    and *strict* are interpreted as with the :class:`Parser` class constructor.
 
+.. function:: message_from_binary_file(fp, _class=email.message.Message, strict=None)
+
+   Return a message object structure tree from an open binary :term:`file
+   object`.  This is exactly equivalent to ``BytesParser().parse(fp)``.
+   Optional *_class* and *strict* are interpreted as with the :class:`Parser`
+   class constructor.
+
+   .. versionadded:: 3.2
+
 Here's an example of how you might use this at an interactive Python prompt::
 
    >>> import email
diff --git a/Doc/library/email.rst b/Doc/library/email.rst
index d3f1908..8926ae4 100644
--- a/Doc/library/email.rst
+++ b/Doc/library/email.rst
@@ -6,7 +6,7 @@
               email messages, including MIME documents.
 .. moduleauthor:: Barry A. Warsaw <barry@python.org>
 .. sectionauthor:: Barry A. Warsaw <barry@python.org>
-.. Copyright (C) 2001-2007 Python Software Foundation
+.. Copyright (C) 2001-2010 Python Software Foundation
 
 
 The :mod:`email` package is a library for managing email messages, including
@@ -92,6 +92,44 @@
 +---------------+------------------------------+-----------------------+
 | :const:`4.0`  | Python 2.5                   | Python 2.3 to 2.5     |
 +---------------+------------------------------+-----------------------+
+| :const:`5.0`  | Python 3.0 and Python 3.1    | Python 3.0 to 3.2     |
++---------------+------------------------------+-----------------------+
+| :const:`5.1`  | Python 3.2                   | Python 3.0 to 3.2     |
++---------------+------------------------------+-----------------------+
+
+Here are the major differences between :mod:`email` version 5.1 and
+version 5.0:
+
+* It is once again possible to parse messages containing non-ASCII bytes,
+  and to reproduce such messages if the data containing the non-ASCII
+  bytes is not modified.
+
+* New functions :func:`message_from_bytes` and :func:`message_from_binary_file`,
+  and new classes :class:`~email.parser.BytesFeedParser` and
+  :class:`~email.parser.BytesParser` allow binary message data to be parsed
+  into model objects.
+
+* Given bytes input to the model, :meth:`~email.message.Message.get_payload`
+  will by default decode a message body that has a
+  :mailheader:`Content-Transfer-Encoding` of `8bit` using the charset specified
+  in the MIME headers and return the resulting string.
+
+* Given bytes input to the model, :class:`~email.generator.Generator` will
+  convert message bodies that have a :mailheader:`Content-Transfer-Encoding` of
+  8bit to instead have a 7bit Content-Transfer-Encoding.
+
+* New function :class:`~email.generator.BytesGenerator` produces bytes
+  as output, preserving any unchanged non-ASCII data that was
+  present in the input used to build the model, including message bodies
+  with a :mailheader:`Content-Transfer-Encoding` of 8bit.
+
+Here are the major differences between :mod:`email` version 5.0 and version 4:
+
+* All operations are on unicode strings.  Text inputs must be strings,
+  text outputs are strings.  Outputs are limited to the ASCII character
+  set and so can be encoded to ASCII for transmission.  Inputs are also
+  limited to ASCII; this is an acknowledged limitation of email 5.0 and
+  means it can only be used to parse email that is 7bit clean.
 
 Here are the major differences between :mod:`email` version 4 and version 3: