Proofread and spell checked, all except the Examples section (which I'll do next).

commit: 5db478fa29299416f8475445f2584b20d8e534ed
author: Barry Warsaw <barry@python.org> Tue Oct 01 04:33:16 2002 +0000
committer: Barry Warsaw <barry@python.org> Tue Oct 01 04:33:16 2002 +0000
tree: f01a56123be3884f6466ba3898bcf965771b2e87
parent: cc3a6df506db57d614225b3657b4e97efc078970
diff --git a/Doc/lib/email.tex b/Doc/lib/email.tex
index aa9f3e5..cbbcf87 100644
--- a/Doc/lib/email.tex
+++ b/Doc/lib/email.tex

@@ -39,14 +39,13 @@
 The following sections describe the functionality of the
 \module{email} package.  The ordering follows a progression that
 should be common in applications: an email message is read as flat
-text from a file or other source, the text is parsed to produce an
-object model representation of the email message, this model is
-manipulated, and finally the model is rendered back into
-flat text.
+text from a file or other source, the text is parsed to produce the
+object structure of the email message, this structure is manipulated,
+and finally rendered back into flat text.
 
-It is perfectly feasible to create the object model out of whole cloth
---- i.e. completely from scratch.  From there, a similar progression
-can be taken as above.  
+It is perfectly feasible to create the object structure out of whole
+cloth --- i.e. completely from scratch.  From there, a similar
+progression can be taken as above.
 
 Also included are detailed specifications of all the classes and
 modules that the \module{email} package provides, the exception
@@ -71,9 +70,12 @@
 \subsection{Creating email and MIME objects from scratch}
 \input{emailmimebase}
 
-\subsection{Headers, Character sets, and Internationalization}
+\subsection{Internationalized headers}
 \input{emailheaders}
 
+\subsection{Representing character sets}
+\input{emailcharsets}
+
 \subsection{Encoders}
 \input{emailencoders}
 
@@ -92,7 +94,7 @@
 releases up to Python 2.2.1.  Version 2 was developed for the Python
 2.3 release, and backported to Python 2.2.2.  It was also available as
 a separate distutils based package.  \module{email} version 2 is
-almost entirely backwards compatible with version 1, with the
+almost entirely backward compatible with version 1, with the
 following differences:
 
 \begin{itemize}
@@ -100,31 +102,31 @@
       have been added.
 \item The pickle format for \class{Message} instances has changed.
       Since this was never (and still isn't) formally defined, this
-      isn't considered a backwards incompatibility.  However if your
+      isn't considered a backward incompatibility.  However, if your
       application pickles and unpickles \class{Message} instances, be
       aware that in \module{email} version 2, \class{Message}
       instances now have private variables \var{_charset} and
       \var{_default_type}.
 \item Several methods in the \class{Message} class have been
-      deprecated, or their signatures changes.  Also, many new methods
+      deprecated, or their signatures changed.  Also, many new methods
       have been added.  See the documentation for the \class{Message}
-      class for deatils.  The changes should be completely backwards
+      class for details.  The changes should be completely backward
       compatible.
 \item The object structure has changed in the face of
       \mimetype{message/rfc822} content types.  In \module{email}
       version 1, such a type would be represented by a scalar payload,
       i.e. the container message's \method{is_multipart()} returned
-      false, \method{get_payload()} was not a list object, and was
-      actually a \class{Message} instance.
+      false, \method{get_payload()} was not a list object, but a single
+      \class{Message} instance.
 
       This structure was inconsistent with the rest of the package, so
       the object representation for \mimetype{message/rfc822} content
-      types was changed.  In module{email} version 2, the container
+      types was changed.  In \module{email} version 2, the container
       \emph{does} return \code{True} from \method{is_multipart()}, and
       \method{get_payload()} returns a list containing a single
       \class{Message} item.
 
-      Note that this is one place that backwards compatibility could
+      Note that this is one place that backward compatibility could
       not be completely maintained.  However, if you're already
       testing the return type of \method{get_payload()}, you should be
       fine.  You just need to make sure your code doesn't do a
@@ -142,7 +144,7 @@
       \module{email.Generator} module was added.
 \item The intermediate base classes \class{MIMENonMultipart} and
       \class{MIMEMultipart} have been added, and interposed in the
-      class heirarchy for most of the other MIME-related derived
+      class hierarchy for most of the other MIME-related derived
       classes.
 \item The \var{_encoder} argument to the \class{MIMEText} constructor
       has been deprecated.  Encoding  now happens implicitly based
@@ -167,7 +169,9 @@
 either been added or removed.  The semantics of some of the methods
 have also changed.  For the most part, any functionality available in
 \module{mimelib} is still available in the \refmodule{email} package,
-albeit often in a different way.
+albeit often in a different way.  Backward compatibility between
+the \module{mimelib} package and the \module{email} package was not a
+priority.
 
 Here is a brief description of the differences between the
 \module{mimelib} and the \refmodule{email} packages, along with hints on

diff --git a/Doc/lib/emailcharsets.tex b/Doc/lib/emailcharsets.tex
new file mode 100644
index 0000000..d1ae728
--- /dev/null
+++ b/Doc/lib/emailcharsets.tex

@@ -0,0 +1,240 @@
+\declaremodule{standard}{email.Charset}
+\modulesynopsis{Character Sets}
+
+This module provides a class \class{Charset} for representing
+character sets and character set conversions in email messages, as
+well as a character set registry and several convenience methods for
+manipulating this registry.  Instances of \class{Charset} are used in
+several other modules within the \module{email} package.
+
+\versionadded{2.2.2}
+
+\begin{classdesc}{Charset}{\optional{input_charset}}
+Map character sets to their email properties.
+
+This class provides information about the requirements imposed on
+email for a specific character set.  It also provides convenience
+routines for converting between character sets, given the availability
+of the applicable codecs.  Given a character set, it will do its best
+to provide information on how to use that character set in an email
+message in an RFC-compliant way.
+
+Certain character sets must be encoded with quoted-printable or base64
+when used in email headers or bodies.  Certain character sets must be
+converted outright, and are not allowed in email.
+
+Optional \var{input_charset} is as described below.  After being alias
+normalized it is also used as a lookup into the registry of character
+sets to find out the header encoding, body encoding, and output
+conversion codec to be used for the character set.  For example, if
+\var{input_charset} is \code{iso-8859-1}, then headers and bodies will
+be encoded using quoted-printable and no output conversion codec is
+necessary.  If \var{input_charset} is \code{euc-jp}, then headers will
+be encoded with base64, bodies will not be encoded, but output text
+will be converted from the \code{euc-jp} character set to the
+\code{iso-2022-jp} character set.
+\end{classdesc}
+
+\class{Charset} instances have the following data attributes:
+
+\begin{datadesc}{input_charset}
+The initial character set specified.  Common aliases are converted to
+their \emph{official} email names (e.g. \code{latin_1} is converted to
+\code{iso-8859-1}).  Defaults to 7-bit \code{us-ascii}.
+\end{datadesc}
+
+\begin{datadesc}{header_encoding}
+If the character set must be encoded before it can be used in an
+email header, this attribute will be set to \code{Charset.QP} (for
+quoted-printable), \code{Charset.BASE64} (for base64 encoding), or
+\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding.
+Otherwise, it will be \code{None}.
+\end{datadesc}
+
+\begin{datadesc}{body_encoding}
+Same as \var{header_encoding}, but describes the encoding for the
+mail message's body, which indeed may be different than the header
+encoding.  \code{Charset.SHORTEST} is not allowed for
+\var{body_encoding}.
+\end{datadesc}
+
+\begin{datadesc}{output_charset}
+Some character sets must be converted before they can be used in
+email headers or bodies.  If the \var{input_charset} is one of
+them, this attribute will contain the name of the character set
+output will be converted to.  Otherwise, it will be \code{None}.
+\end{datadesc}
+
+\begin{datadesc}{input_codec}
+The name of the Python codec used to convert the \var{input_charset} to
+Unicode.  If no conversion codec is necessary, this attribute will be
+\code{None}.
+\end{datadesc}
+
+\begin{datadesc}{output_codec}
+The name of the Python codec used to convert Unicode to the
+\var{output_charset}.  If no conversion codec is necessary, this
+attribute will have the same value as the \var{input_codec}.
+\end{datadesc}
+
+\class{Charset} instances also have the following methods:
+
+\begin{methoddesc}[Charset]{get_body_encoding}{}
+Return the content transfer encoding used for body encoding.
+
+This is either the string \samp{quoted-printable} or \samp{base64}
+depending on the encoding used, or it is a function, in which case you
+should call the function with a single argument, the Message object
+being encoded.  The function should then set the
+\mailheader{Content-Transfer-Encoding} header itself to whatever is
+appropriate.
+
+Returns the string \samp{quoted-printable} if
+\var{body_encoding} is \code{QP}, returns the string
+\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the
+string \samp{7bit} otherwise.
+\end{methoddesc}
+
+\begin{methoddesc}{convert}{s}
+Convert the string \var{s} from the \var{input_codec} to the
+\var{output_codec}.
+\end{methoddesc}
+
+\begin{methoddesc}{to_splittable}{s}
+Convert a possibly multibyte string to a safely splittable format.
+\var{s} is the string to split.
+
+Uses the \var{input_codec} to try and convert the string to Unicode,
+so it can be safely split on character boundaries (even for multibyte
+characters).
+
+Returns the string as-is if it isn't known how to convert \var{s} to
+Unicode with the \var{input_charset}.
+
+Characters that could not be converted to Unicode will be replaced
+with the Unicode replacement character \character{U+FFFD}.
+\end{methoddesc}
+
+\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}}
+Convert a splittable string back into an encoded string.  \var{ustr}
+is a Unicode string to ``unsplit''.
+
+This method uses the proper codec to try and convert the string from
+Unicode back into an encoded format.  Return the string as-is if it is
+not Unicode, or if it could not be converted from Unicode.
+
+Characters that could not be converted from Unicode will be replaced
+with an appropriate character (usually \character{?}).
+
+If \var{to_output} is \code{True} (the default), uses
+\var{output_codec} to convert to an 
+encoded format.  If \var{to_output} is \code{False}, it uses
+\var{input_codec}.
+\end{methoddesc}
+
+\begin{methoddesc}{get_output_charset}{}
+Return the output character set.
+
+This is the \var{output_charset} attribute if that is not \code{None},
+otherwise it is \var{input_charset}.
+\end{methoddesc}
+
+\begin{methoddesc}{encoded_header_len}{s}
+Return the length of the encoded header string, properly calculating
+for quoted-printable or base64 encoding.
+\end{methoddesc}
+
+\begin{methoddesc}{header_encode}{s\optional{, convert}}
+Header-encode the string \var{s}.
+
+If \var{convert} is \code{True}, the string will be converted from the
+input charset to the output charset automatically.  This is not useful
+for multibyte character sets, which have line length issues (multibyte
+characters must be split on a character, not a byte boundary); use the
+higher-level \class{Header} class to deal with these issues (see
+\refmodule{email.Header}).  \var{convert} defaults to \code{False}.
+
+The type of encoding (base64 or quoted-printable) will be based on
+the \var{header_encoding} attribute.
+\end{methoddesc}
+
+\begin{methoddesc}{body_encode}{s\optional{, convert}}
+Body-encode the string \var{s}.
+
+If \var{convert} is \code{True} (the default), the string will be
+converted from the input charset to output charset automatically.
+Unlike \method{header_encode()}, there are no issues with byte
+boundaries and multibyte charsets in email bodies, so this is usually
+pretty safe.
+
+The type of encoding (base64 or quoted-printable) will be based on
+the \var{body_encoding} attribute.
+\end{methoddesc}
+
+The \class{Charset} class also provides a number of methods to support
+standard operations and built-in functions.
+
+\begin{methoddesc}[Charset]{__str__}{}
+Returns \var{input_charset} as a string coerced to lower case.
+\end{methoddesc}
+
+\begin{methoddesc}[Charset]{__eq__}{other}
+This method allows you to compare two \class{Charset} instances for equality.
+\end{methoddesc}
+
+\begin{methoddesc}[Charset]{__ne__}{other}
+This method allows you to compare two \class{Charset} instances for inequality.
+\end{methoddesc}
+
+The \module{email.Charset} module also provides the following
+functions for adding new entries to the global character set, alias,
+and codec registries:
+
+\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{,
+    body_enc\optional{, output_charset}}}}
+Add character properties to the global registry.
+
+\var{charset} is the input character set, and must be the canonical
+name of a character set.
+
+Optional \var{header_enc} and \var{body_enc} are each either
+\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for
+base64 encoding, \code{Charset.SHORTEST} for the shortest of
+quoted-printable or base64 encoding, or \code{None} for no encoding.
+\code{SHORTEST} is only valid for \var{header_enc}.  The default is
+\code{None} for no encoding.
+
+Optional \var{output_charset} is the character set that the output
+should be in.  Conversions will proceed from input charset, to
+Unicode, to the output charset when the method
+\method{Charset.convert()} is called.  The default is to output in the
+same character set as the input.
+
+Both \var{input_charset} and \var{output_charset} must have Unicode
+codec entries in the module's character set-to-codec mapping; use
+\function{add_codec()} to add codecs the module does
+not know about.  See the \refmodule{codecs} module's documentation for
+more information.
+
+The global character set registry is kept in the module global
+dictionary \code{CHARSETS}.
+\end{funcdesc}
+
+\begin{funcdesc}{add_alias}{alias, canonical}
+Add a character set alias.  \var{alias} is the alias name,
+e.g. \code{latin-1}.  \var{canonical} is the character set's canonical
+name, e.g. \code{iso-8859-1}.
+
+The global charset alias registry is kept in the module global
+dictionary \code{ALIASES}.
+\end{funcdesc}
+
+\begin{funcdesc}{add_codec}{charset, codecname}
+Add a codec that maps characters in the given character set to and from
+Unicode.
+
+\var{charset} is the canonical name of a character set.
+\var{codecname} is the name of a Python codec, as appropriate for the
+second argument to the \function{unicode()} built-in, or to the
+\method{encode()} method of a Unicode string.
+\end{funcdesc}

diff --git a/Doc/lib/emailencoders.tex b/Doc/lib/emailencoders.tex
index 4b4e637..cd54d68 100644
--- a/Doc/lib/emailencoders.tex
+++ b/Doc/lib/emailencoders.tex

@@ -17,7 +17,7 @@
 Here are the encoding functions provided:
 
 \begin{funcdesc}{encode_quopri}{msg}
-Encodes the payload into quoted-Printable form and sets the
+Encodes the payload into quoted-printable form and sets the
 \mailheader{Content-Transfer-Encoding} header to
 \code{quoted-printable}\footnote{Note that encoding with
 \method{encode_quopri()} also encodes all tabs and space characters in

diff --git a/Doc/lib/emailgenerator.tex b/Doc/lib/emailgenerator.tex
index 03fee9f..01c12d0 100644
--- a/Doc/lib/emailgenerator.tex
+++ b/Doc/lib/emailgenerator.tex

@@ -24,12 +24,12 @@
 The constructor for the \class{Generator} class takes a file-like
 object called \var{outfp} for an argument.  \var{outfp} must support
 the \method{write()} method and be usable as the output file in a
-Python 2.0 extended print statement.
+Python extended print statement.
 
 Optional \var{mangle_from_} is a flag that, when \code{True}, puts a
 \samp{>} character in front of any line in the body that starts exactly as
-\samp{From } (i.e. \code{From} followed by a space at the front of the
-line).  This is the only guaranteed portable way to avoid having such
+\samp{From }, i.e. \code{From} followed by a space at the beginning of the
+line.  This is the only guaranteed portable way to avoid having such
 lines be mistaken for a Unix mailbox format envelope header separator (see
 \ulink{WHY THE CONTENT-LENGTH FORMAT IS BAD}
 {http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html}
@@ -48,10 +48,10 @@
 
 The other public \class{Generator} methods are:
 
-\begin{methoddesc}[Generator]{flatten()}{msg\optional{, unixfrom}}
+\begin{methoddesc}[Generator]{flatten}{msg\optional{, unixfrom}}
 Print the textual representation of the message object structure rooted at
 \var{msg} to the output file specified when the \class{Generator}
-instance was created.  Sub-objects are visited depth-first and the
+instance was created.  Subparts are visited depth-first and the
 resulting text will be properly MIME encoded.
 
 Optional \var{unixfrom} is a flag that forces the printing of the
@@ -60,7 +60,7 @@
 standard one is crafted.  By default, this is set to \code{False} to
 inhibit the printing of the envelope delimiter.
 
-Note that for sub-objects, no envelope header is ever printed.
+Note that for subparts, no envelope header is ever printed.
 
 \versionadded{2.2.2}
 \end{methoddesc}
@@ -99,16 +99,20 @@
 \class{Generator} base class.
 
 If the subpart is not of main type \mimetype{text}, optional \var{fmt}
-is a format string that is used instead of the message
-payload.  \var{fmt} is expanded with the following keywords (in
-\samp{\%(keyword)s} format):
+is a format string that is used instead of the message payload.
+\var{fmt} is expanded with the following keywords, in
+\samp{\%(keyword)s} format:
 
-type       : Full MIME type of the non-\mimetype{text} part
-maintype   : Main MIME type of the non-\mimetype{text} part
-subtype    : Sub-MIME type of the non-\mimetype{text} part
-filename   : Filename of the non-\mimetype{text} part
-description: Description associated with the non-\mimetype{text} part
-encoding   : Content transfer encoding of the non-\mimetype{text} part
+\begin{itemize}
+\item \code{type} -- Full MIME type of the non-\mimetype{text} part
+\item \code{maintype} -- Main MIME type of the non-\mimetype{text} part
+\item \code{subtype} -- Sub-MIME type of the non-\mimetype{text} part
+\item \code{filename} -- Filename of the non-\mimetype{text} part
+\item \code{description} -- Description associated with the
+      non-\mimetype{text} part
+\item \code{encoding} -- Content transfer encoding of the
+      non-\mimetype{text} part
+\end{itemize}
 
 The default value for \var{fmt} is \code{None}, meaning
 

diff --git a/Doc/lib/emailheaders.tex b/Doc/lib/emailheaders.tex
index 172e5d6..66eb716 100644
--- a/Doc/lib/emailheaders.tex
+++ b/Doc/lib/emailheaders.tex

@@ -3,7 +3,7 @@
 
 \rfc{2822} is the base standard that describes the format of email
 messages.  It derives from the older \rfc{822} standard which came
-into widespread at a time when most email was composed of \ASCII{}
+into widespread use at a time when most email was composed of \ASCII{}
 characters only.  \rfc{2822} is a specification written assuming email
 contains only 7-bit \ASCII{} characters.
 
@@ -19,10 +19,9 @@
 
 If you want to include non-\ASCII{} characters in your email headers,
 say in the \mailheader{Subject} or \mailheader{To} fields, you should
-use the \class{Header} class (in module \module{email.Header} and
-assign the field in the \class{Message} object to an instance of
-\class{Header} instead of using a string for the header value.  For
-example:
+use the \class{Header} class and assign the field in the
+\class{Message} object to an instance of \class{Header} instead of
+using a string for the header value.  For example:
 
 \begin{verbatim}
 >>> from email.Message import Message
@@ -50,7 +49,8 @@
 
 \begin{classdesc}{Header}{\optional{s\optional{, charset\optional{,
     maxlinelen\optional{, header_name\optional{, continuation_ws}}}}}}
-Create a MIME-compliant header that can contain many character sets.
+Create a MIME-compliant header that can contain strings in different
+character sets.
 
 Optional \var{s} is the initial header value.  If \code{None} (the
 default), the initial header value is not set.  You can later append
@@ -74,7 +74,7 @@
 default value for \var{header_name} is \code{None}, meaning it is not
 taken into account for the first line of a long, split header.
 
-Optional \var{continuation_ws} must be RFC 2822 compliant folding
+Optional \var{continuation_ws} must be \rfc{2822}-compliant folding
 whitespace, and is usually either a space or a hard tab character.
 This character will be prepended to continuation lines.
 \end{classdesc}
@@ -89,7 +89,7 @@
 constructor is used.
 
 \var{s} may be a byte string or a Unicode string.  If it is a byte
-string (i.e. \code{isinstance(s, StringType)} is true), then
+string (i.e. \code{isinstance(s, str)} is true), then
 \var{charset} is the encoding of that byte string, and a
 \exception{UnicodeError} will be raised if the string cannot be
 decoded with that character set.
@@ -113,7 +113,7 @@
 
 \begin{methoddesc}[Header]{__str__}{}
 A synonym for \method{Header.encode()}.  Useful for
-\code{str(aHeader)} calls.
+\code{str(aHeader)}.
 \end{methoddesc}
 
 \begin{methoddesc}[Header]{__unicode__}{}
@@ -165,245 +165,3 @@
 \var{header_name}, and \var{continuation_ws} are as in the
 \class{Header} constructor.
 \end{funcdesc}
-
-\declaremodule{standard}{email.Charset}
-\modulesynopsis{Character Sets}
-
-This module provides a class \class{Charset} for representing
-character sets and character set conversions in email messages, as
-well as a character set registry and several convenience methods for
-manipulating this registry.  Instances of \class{Charset} are used in
-several other modules within the \module{email} package.
-
-\versionadded{2.2.2}
-
-\begin{classdesc}{Charset}{\optional{input_charset}}
-Map character sets to their email properties.
-
-This class provides information about the requirements imposed on
-email for a specific character set.  It also provides convenience
-routines for converting between character sets, given the availability
-of the applicable codecs.  Given a character set, it will do its best
-to provide information on how to use that character set in an email
-message in an RFC-compliant way.
-
-Certain character sets must be encoded with quoted-printable or base64
-when used in email headers or bodies.  Certain character sets must be
-converted outright, and are not allowed in email.
-
-Optional \var{input_charset} is as described below.  After being alias
-normalized it is also used as a lookup into the registry of character
-sets to find out the header encoding, body encoding, and output
-conversion codec to be used for the character set.  For example, if
-\var{input_charset} is \code{iso-8859-1}, then headers and bodies will
-be encoded using quoted-printable and no output conversion codec is
-necessary.  If \var{input_charset} is \code{euc-jp}, then headers will
-be encoded with base64, bodies will not be encoded, but output text
-will be converted from the \code{euc-jp} character set to the
-\code{iso-2022-jp} character set.
-\end{classdesc}
-
-\class{Charset} instances have the following data attributes:
-
-\begin{datadesc}{input_charset}
-The initial character set specified.  Common aliases are converted to
-their \emph{official} email names (e.g. \code{latin_1} is converted to
-\code{iso-8859-1}).  Defaults to 7-bit \code{us-ascii}.
-\end{datadesc}
-
-\begin{datadesc}{header_encoding}
-If the character set must be encoded before it can be used in an
-email header, this attribute will be set to \code{Charset.QP} (for
-quoted-printable), \code{Charset.BASE64} (for base64 encoding), or
-\code{Charset.SHORTEST} for the shortest of QP or BASE64 encoding.
-Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{body_encoding}
-Same as \var{header_encoding}, but describes the encoding for the
-mail message's body, which indeed may be different than the header
-encoding.  \code{Charset.SHORTEST} is not allowed for
-\var{body_encoding}.
-\end{datadesc}
-
-\begin{datadesc}{output_charset}
-Some character sets must be converted before the can be used in
-email headers or bodies.  If the \var{input_charset} is one of
-them, this attribute will contain the name of the character set
-output will be converted to.  Otherwise, it will be \code{None}.
-\end{datadesc}
-
-\begin{datadesc}{input_codec}
-The name of the Python codec used to convert the \var{input_charset} to
-Unicode.  If no conversion codec is necessary, this attribute will be
-\code{None}.
-\end{datadesc}
-
-\begin{datadesc}{output_codec}
-The name of the Python codec used to convert Unicode to the
-\var{output_charset}.  If no conversion codec is necessary, this
-attribute will have the same value as the \var{input_codec}.
-\end{datadesc}
-
-\class{Charset} instances also have the following methods:
-
-\begin{methoddesc}[Charset]{get_body_encoding}{}
-Return the content transfer encoding used for body encoding.
-
-This is either the string \samp{quoted-printable} or \samp{base64}
-depending on the encoding used, or it is a function, in which case you
-should call the function with a single argument, the Message object
-being encoded.  The function should then set the
-\mailheader{Content-Transfer-Encoding} header itself to whatever is
-appropriate.
-
-Returns the string \samp{quoted-printable} if
-\var{body_encoding} is \code{QP}, returns the string
-\samp{base64} if \var{body_encoding} is \code{BASE64}, and returns the
-string \samp{7bit} otherwise.
-\end{methoddesc}
-
-\begin{methoddesc}{convert}{s}
-Convert the string \var{s} from the \var{input_codec} to the
-\var{output_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{to_splittable}{s}
-Convert a possibly multibyte string to a safely splittable format.
-\var{s} is the string to split.
-
-Uses the \var{input_codec} to try and convert the string to Unicode,
-so it can be safely split on character boundaries (even for multibyte
-characters).
-
-Returns the string as-is if it isn't known how to convert \var{s} to
-Unicode with the \var{input_charset}.
-
-Characters that could not be converted to Unicode will be replaced
-with the Unicode replacement character \character{U+FFFD}.
-\end{methoddesc}
-
-\begin{methoddesc}{from_splittable}{ustr\optional{, to_output}}
-Convert a splittable string back into an encoded string.  \var{ustr}
-is a Unicode string to ``unsplit''.
-
-This method uses the proper codec to try and convert the string from
-Unicode back into an encoded format.  Return the string as-is if it is
-not Unicode, or if it could not be converted from Unicode.
-
-Characters that could not be converted from Unicode will be replaced
-with an appropriate character (usually \character{?}).
-
-If \var{to_output} is \code{True} (the default), uses
-\var{output_codec} to convert to an 
-encoded format.  If \var{to_output} is \code{False}, it uses
-\var{input_codec}.
-\end{methoddesc}
-
-\begin{methoddesc}{get_output_charset}{}
-Return the output character set.
-
-This is the \var{output_charset} attribute if that is not \code{None},
-otherwise it is \var{input_charset}.
-\end{methoddesc}
-
-\begin{methoddesc}{encoded_header_len}{}
-Return the length of the encoded header string, properly calculating
-for quoted-printable or base64 encoding.
-\end{methoddesc}
-
-\begin{methoddesc}{header_encode}{s\optional{, convert}}
-Header-encode the string \var{s}.
-
-If \var{convert} is \code{True}, the string will be converted from the
-input charset to the output charset automatically.  This is not useful
-for multibyte character sets, which have line length issues (multibyte
-characters must be split on a character, not a byte boundary); use the
-higher-level \class{Header} class to deal with these issues (see
-\refmodule{email.Header}).  \var{convert} defaults to \code{False}.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{header_encoding} attribute.
-\end{methoddesc}
-
-\begin{methoddesc}{body_encode}{s\optional{, convert}}
-Body-encode the string \var{s}.
-
-If \var{convert} is \code{True} (the default), the string will be
-converted from the input charset to output charset automatically.
-Unlike \method{header_encode()}, there are no issues with byte
-boundaries and multibyte charsets in email bodies, so this is usually
-pretty safe.
-
-The type of encoding (base64 or quoted-printable) will be based on
-the \var{body_encoding} attribute.
-\end{methoddesc}
-
-The \class{Charset} class also provides a number of methods to support
-standard operations and built-in functions.
-
-\begin{methoddesc}[Charset]{__str__}{}
-Returns \var{input_charset} as a string coerced to lower case.
-\end{methoddesc}
-
-\begin{methoddesc}[Charset]{__eq__}{other}
-This method allows you to compare two \class{Charset} instances for equality.
-\end{methoddesc}
-
-\begin{methoddesc}[Header]{__ne__}{other}
-This method allows you to compare two \class{Charset} instances for inequality.
-\end{methoddesc}
-
-The \module{email.Charset} module also provides the following
-functions for adding new entries to the global character set, alias,
-and codec registries:
-
-\begin{funcdesc}{add_charset}{charset\optional{, header_enc\optional{,
-    body_enc\optional{, output_charset}}}}
-Add character properties to the global registry.
-
-\var{charset} is the input character set, and must be the canonical
-name of a character set.
-
-Optional \var{header_enc} and \var{body_enc} is either
-\code{Charset.QP} for quoted-printable, \code{Charset.BASE64} for
-base64 encoding, \code{Charset.SHORTEST} for the shortest of qp or
-base64 encoding, or \code{None} for no encoding.  \code{SHORTEST} is
-only valid for \var{header_enc}.  It describes how message headers and
-message bodies in the input charset are to be encoded.  Default is no
-encoding.
-
-Optional \var{output_charset} is the character set that the output
-should be in.  Conversions will proceed from input charset, to
-Unicode, to the output charset when the method
-\method{Charset.convert()} is called.  The default is to output in the
-same character set as the input.
-
-Both \var{input_charset} and \var{output_charset} must have Unicode
-codec entries in the module's character set-to-codec mapping; use
-\function{add_codec(charset, codecname)} to add codecs the module does
-not know about.  See the \refmodule{codecs} module's documentation for
-more information.
-
-The global character set registry is kept in the module global
-dictionary \code{CHARSETS}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_alias}{alias, canonical}
-Add a character set alias.  \var{alias} is the alias name,
-e.g. \code{latin-1}.  \var{canonical} is the character set's canonical
-name, e.g. \code{iso-8859-1}.
-
-The global charset alias registry is kept in the module global
-dictionary \code{ALIASES}.
-\end{funcdesc}
-
-\begin{funcdesc}{add_codec}{charset, codecname}
-Add a codec that map characters in the given character set to and from
-Unicode.
-
-\var{charset} is the canonical name of a character set.
-\var{codecname} is the name of a Python codec, as appropriate for the
-second argument to the \function{unicode()} built-in, or to the
-\method{encode()} method of a Unicode string.
-\end{funcdesc}

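The registry functions documented in the text moved above can be sketched as follows. This is a minimal illustration, assuming a current Python 3 interpreter, where the module is spelled `email.charset` rather than `email.Charset`. The names `x-demo-latin1` and `x-demo-charset` are made up for the example and are not real character sets.

```python
from email import charset as charset_mod  # email.Charset in the 2.2.2 docs

# Register a hypothetical alias for a canonical character set name.
charset_mod.add_alias("x-demo-latin1", "iso-8859-1")
canonical_name = charset_mod.Charset("x-demo-latin1").input_charset

# Register a hypothetical charset: quoted-printable headers, base64 bodies,
# converted to utf-8 on output.
charset_mod.add_charset(
    "x-demo-charset", charset_mod.QP, charset_mod.BASE64, "utf-8"
)
registered = charset_mod.CHARSETS["x-demo-charset"]

# SHORTEST is only valid for header_enc, so using it for body_enc fails.
try:
    charset_mod.add_charset("x-demo-bad", charset_mod.QP, charset_mod.SHORTEST)
    shortest_rejected = False
except ValueError:
    shortest_rejected = True

print(canonical_name, registered, shortest_rejected)
```

Both registries are module globals, so entries added this way stay visible for the rest of the process.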
diff --git a/Doc/lib/emailmessage.tex b/Doc/lib/emailmessage.tex
index 271619d..d76e7fd 100644
--- a/Doc/lib/emailmessage.tex
+++ b/Doc/lib/emailmessage.tex

@@ -33,9 +33,9 @@
 \end{classdesc}
 
 \begin{methoddesc}[Message]{as_string}{\optional{unixfrom}}
-Return the entire formatted message as a string.  Optional
-\var{unixfrom}, when true, specifies to include the \emph{Unix-From}
-envelope header; it defaults to \code{False}.
+Return the entire message flattened as a string.  When optional
+\var{unixfrom} is \code{True}, the envelope header is included in the
+returned string.  \var{unixfrom} defaults to \code{False}.
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{__str__}{}
@@ -59,7 +59,7 @@
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{attach}{payload}
-Add the given payload to the current payload, which must be
+Add the given \var{payload} to the current payload, which must be
 \code{None} or a list of \class{Message} objects before the call.
 After the call, the payload will always be a list of \class{Message}
 objects.  If you want to set the payload to a scalar object (e.g. a
@@ -95,7 +95,7 @@
 \begin{methoddesc}[Message]{set_payload}{payload\optional{, charset}}
 Set the entire message object's payload to \var{payload}.  It is the
 client's responsibility to ensure the payload invariants.  Optional
-\var{charset} sets the message's default character set (see
+\var{charset} sets the message's default character set; see
 \method{set_charset()} for details.
 
 \versionchanged[\var{charset} argument added]{2.2.2}
@@ -103,7 +103,7 @@
 
 \begin{methoddesc}[Message]{set_charset}{charset}
 Set the character set of the payload to \var{charset}, which can
-either be a \class{Charset} instance (see \refmodule{email.Charset}, a
+either be a \class{Charset} instance (see \refmodule{email.Charset}), a
 string naming a character set,
 or \code{None}.  If it is a string, it will be converted to a
 \class{Charset} instance.  If \var{charset} is \code{None}, the
@@ -128,14 +128,18 @@
 \end{methoddesc}
 
 The following methods implement a mapping-like interface for accessing
-the message object's \rfc{2822} headers.  Note that there are some
+the message's \rfc{2822} headers.  Note that there are some
 semantic differences between these methods and a normal mapping
 (i.e. dictionary) interface.  For example, in a dictionary there are
 no duplicate keys, but here there may be duplicate message headers.  Also,
 in dictionaries there is no guaranteed order to the keys returned by
-\method{keys()}, but in a \class{Message} object, there is an explicit
-order.  These semantic differences are intentional and are biased
-toward maximal convenience.
+\method{keys()}, but in a \class{Message} object, headers are always
+returned in the order they appeared in the original message, or were
+added to the message later.  Any header deleted and then re-added is
+always appended to the end of the header list.
+
+These semantic differences are intentional and are biased toward
+maximal convenience.
 
 Note that in all cases, any envelope header present in the message is
 not included in the mapping interface.
@@ -175,8 +179,7 @@
 Note that this does \emph{not} overwrite or delete any existing header
 with the same name.  If you want to ensure that the new header is the
 only one present in the message with field name
-\var{name}, first use \method{__delitem__()} to delete all named
-fields, e.g.:
+\var{name}, delete the field first, e.g.:
 
 \begin{verbatim}
 del msg['subject']
@@ -196,27 +199,16 @@
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{keys}{}
-Return a list of all the message's header field names.  These keys
-will be sorted in the order in which they appeared in the original
-message, or were added to the message and may contain
-duplicates.  Any fields deleted and then subsequently re-added are
-always appended to the end of the header list.
+Return a list of all the message's header field names.
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{values}{}
-Return a list of all the message's field values.  These will be sorted
-in the order in which they appeared in the original message, or were
-added to the message, and may contain
-duplicates.  Any fields deleted and then subsequently re-added are
-always appended to the end of the header list.
+Return a list of all the message's field values.
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{items}{}
 Return a list of 2-tuples containing all the message's field headers
-and values.  These will be sorted in the order in which they appeared
-in the original message, or were added to the message, and may contain
-duplicates.  Any fields deleted and then subsequently re-added are
-always appended to the end of the header list.
+and values.
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{get}{name\optional{, failobj}}
@@ -228,11 +220,7 @@
 Here are some additional useful methods:
 
 \begin{methoddesc}[Message]{get_all}{name\optional{, failobj}}
-Return a list of all the values for the field named \var{name}.  These
-will be sorted in the order in which they appeared in the original
-message, or were added to the message.  Any fields deleted and then
-subsequently re-added are always appended to the end of the list.
-
+Return a list of all the values for the field named \var{name}.
 If there are no such named headers in the message, \var{failobj} is
 returned (defaults to \code{None}).
 \end{methoddesc}
@@ -351,10 +339,10 @@
 Parameter keys are always compared case insensitively.  The return
 value can either be a string, or a 3-tuple if the parameter was
 \rfc{2231} encoded.  When it's a 3-tuple, the elements of the value are of
-the form \samp{(CHARSET, LANGUAGE, VALUE)}, where \var{LANGUAGE} may
+the form \code{(CHARSET, LANGUAGE, VALUE)}, where \code{LANGUAGE} may
 be the empty string.  Your application should be prepared to deal with
-3-tuple return values, which it can convert the parameter to a Unicode
-string like so:
+3-tuple return values, which it can convert to a Unicode string like
+so:
 
 \begin{verbatim}
 param = msg.get_param('foo')
@@ -363,7 +351,7 @@
 \end{verbatim}
 
 In any case, the parameter value (either the returned string, or the
-\var{VALUE} item in the 3-tuple) is always unquoted, unless
+\code{VALUE} item in the 3-tuple) is always unquoted, unless
 \var{unquote} is set to \code{False}.
 
 \versionchanged[\var{unquote} argument added, and 3-tuple return value
@@ -398,7 +386,7 @@
 \mailheader{Content-Type} header.  The header will be re-written in
 place without the parameter or its value.  All values will be quoted
 as necessary unless \var{requote} is \code{False} (the default is
-\code{True}).  Optional \var{header} specifies an alterative to
+\code{True}).  Optional \var{header} specifies an alternative to
 \mailheader{Content-Type}.
 
 \versionadded{2.2.2}
@@ -417,8 +405,8 @@
 will be quoted (the default).
 
 An alternative header can be specified in the \var{header} argument.
-When the \mailheader{Content-Type} header is set, we'll always also
-add a \mailheader{MIME-Version} header.
+When the \mailheader{Content-Type} header is set, a
+\mailheader{MIME-Version} header is also added.
 
 \versionadded{2.2.2}
 \end{methoddesc}
@@ -440,11 +428,10 @@
 \end{methoddesc}
 
 \begin{methoddesc}[Message]{set_boundary}{boundary}
-Set the \code{boundary} parameter of the \mailheader{Content-Type} header
-to \var{boundary}.  \method{set_boundary()} will always quote
-\var{boundary} so you should not quote it yourself.  A
-\exception{HeaderParseError} is raised if the message object has no
-\mailheader{Content-Type} header.
+Set the \code{boundary} parameter of the \mailheader{Content-Type}
+header to \var{boundary}.  \method{set_boundary()} will always quote
+\var{boundary} if necessary.  A \exception{HeaderParseError} is raised
+if the message object has no \mailheader{Content-Type} header.
 
 Note that using this method is subtly different than deleting the old
 \mailheader{Content-Type} header and adding a new one with the new boundary
@@ -459,9 +446,9 @@
 header.  If there is no \mailheader{Content-Type} header, or if that
 header has no \code{charset} parameter, \var{failobj} is returned.
 
-Note that this method differs from \method{get_charset} which returns
-the \class{Charset} instance for the default encoding of the message
-body.
+Note that this method differs from \method{get_charset()}, which
+returns the \class{Charset} instance for the default encoding of the
+message body.
 
 \versionadded{2.2.2}
 \end{methoddesc}
@@ -484,15 +471,15 @@
 The \method{walk()} method is an all-purpose generator which can be
 used to iterate over all the parts and subparts of a message object
 tree, in depth-first traversal order.  You will typically use
-\method{walk()} as the iterator in a \code{for ... in} loop; each
+\method{walk()} as the iterator in a \code{for} loop; each
 iteration returns the next subpart.
 
-Here's an example that prints the MIME type of every part of a message
-object tree:
+Here's an example that prints the MIME type of every part of a
+multipart message structure:
 
 \begin{verbatim}
 >>> for part in msg.walk():
->>>     print part.get_type('text/plain')
+...     print part.get_content_type()
 multipart/report
 text/plain
 message/delivery-status

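The header ordering rules described in the emailmessage.tex changes above can be checked with a short script. This is a sketch under the assumption of a current Python 3 interpreter with the default parser settings. The addresses and header values are invented for the example.

```python
import email

raw = (
    "From: alice@example.com\n"
    "Received: by relay-one\n"
    "Subject: first\n"
    "Received: by relay-two\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# Unlike a dictionary, duplicate field names are kept, in original order.
keys_before = msg.keys()
received = msg.get_all("Received")

# Assigning a header appends instead of replacing, so delete the field
# first when the new header must be the only one with that name.
del msg["subject"]  # field names are matched case-insensitively
msg["Subject"] = "second"
keys_after = msg.keys()

print(keys_before)
print(received)
print(keys_after)
```

After the delete and re-add, `Subject` moves to the end of the header list, which matches the ordering paragraph in the diff.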
diff --git a/Doc/lib/emailmimebase.tex b/Doc/lib/emailmimebase.tex
index 97c3eda..6bbd5dd 100644
--- a/Doc/lib/emailmimebase.tex
+++ b/Doc/lib/emailmimebase.tex

@@ -1,10 +1,10 @@
 Ordinarily, you get a message object structure by passing a file or
-some text to a parser, which parses the text and returns the root of
-the message object structure.  However you can also build a complete
-object structure from scratch, or even individual \class{Message}
-objects by hand.  In fact, you can also take an existing structure and
-add new \class{Message} objects, move them around, etc.  This makes a
-very convenient interface for slicing-and-dicing MIME messages.
+some text to a parser, which parses the text and returns the root
+message object.  However, you can also build a complete message
+structure from scratch, or even individual \class{Message} objects by
+hand.  In fact, you can also take an existing structure and add new
+\class{Message} objects, move them around, etc.  This makes a very
+convenient interface for slicing-and-dicing MIME messages.
 
 You can create a new object structure by creating \class{Message}
 instances, adding attachments and all the appropriate headers manually.
@@ -99,7 +99,7 @@
 It should use \method{get_payload()} and \method{set_payload()} to
 change the payload to encoded form.  It should also add any
 \mailheader{Content-Transfer-Encoding} or other headers to the message
-object as necessary.  The default encoding is \emph{Base64}.  See the
+object as necessary.  The default encoding is base64.  See the
 \refmodule{email.Encoders} module for a list of the built-in encoders.
 
 \var{_params} are passed straight through to the base class constructor.
@@ -124,7 +124,7 @@
 It should use \method{get_payload()} and \method{set_payload()} to
 change the payload to encoded form.  It should also add any
 \mailheader{Content-Transfer-Encoding} or other headers to the message
-object as necessary.  The default encoding is \emph{Base64}.  See the
+object as necessary.  The default encoding is base64.  See the
 \refmodule{email.Encoders} module for a list of the built-in encoders.
 
 \var{_params} are passed straight through to the \class{MIMEBase}

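Building a message structure from scratch, as the emailmimebase.tex text describes, might look like the sketch below. It assumes a current Python 3 interpreter, where the classes live in `email.mime.multipart` and `email.mime.text` (the 2.2.2 docs use the older `email.MIMEMultipart` style module names). The subject and body strings are placeholders.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The container starts with an empty payload; attach() adds subparts.
outer = MIMEMultipart()
outer["Subject"] = "Report"
outer.attach(MIMEText("plain body"))
outer.attach(MIMEText("<p>html body</p>", "html"))

# walk() visits the container first, then each subpart, depth first.
content_types = [part.get_content_type() for part in outer.walk()]

# Flatten the structure back to text, with and without the envelope header.
flat = outer.as_string()
flat_with_envelope = outer.as_string(unixfrom=True)

print(content_types)
```

The same `walk()` loop works on a parsed message, so code that inspects parts does not need to know how the structure was created.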
diff --git a/Doc/lib/emailparser.tex b/Doc/lib/emailparser.tex
index b5d9900..62a5a6f 100644
--- a/Doc/lib/emailparser.tex
+++ b/Doc/lib/emailparser.tex

@@ -54,7 +54,7 @@
 boundaries are missing, or when messages contain other formatting
 problems, the \class{Parser} will raise a
 \exception{MessageParseError}.  However, when lax parsing is enabled,
-the \class{Parser} will attempt to workaround such broken formatting
+the \class{Parser} will attempt to work around such broken formatting
 to produce a usable message structure (this doesn't mean
 \exception{MessageParseError}s are never raised; some ill-formatted
 messages just can't be parsed).  The \var{strict} flag defaults to
@@ -73,14 +73,12 @@
 on file-like objects.
 
 The text contained in \var{fp} must be formatted as a block of \rfc{2822}
-style headers and header continuation lines, optionally preceeded by a
+style headers and header continuation lines, optionally preceded by an
 envelope header.  The header block is terminated either by the
 end of the data or by a blank line.  Following the header block is the
 body of the message (which may contain MIME-encoded subparts).
 
-Optional \var{headersonly} is a flag specifying whether to stop
-parsing after reading the headers or not.  The default is \code{False},
-meaning it parses the entire contents of the file.
+Optional \var{headersonly} is as with the \method{parse()} method.
 
 \versionchanged[The \var{headersonly} flag was added]{2.2.2}
 \end{methoddesc}
@@ -104,7 +102,7 @@
 package namespace.
 
 \begin{funcdesc}{message_from_string}{s\optional{, _class\optional{, strict}}}
-Return a message object tree from a string.  This is exactly
+Return a message object structure from a string.  This is exactly
 equivalent to \code{Parser().parsestr(s)}.  Optional \var{_class} and
 \var{strict} are interpreted as with the \class{Parser} class constructor.
 
@@ -112,9 +110,10 @@
 \end{funcdesc}
 
 \begin{funcdesc}{message_from_file}{fp\optional{, _class\optional{, strict}}}
-Return a message object tree from an open file object.  This is exactly
-equivalent to \code{Parser().parse(fp)}.  Optional \var{_class} and
-\var{strict} are interpreted as with the \class{Parser} class constructor.
+Return a message object structure from an open file object.  This
+is exactly equivalent to \code{Parser().parse(fp)}.  Optional
+\var{_class} and \var{strict} are interpreted as with the
+\class{Parser} class constructor.
 
 \versionchanged[The \var{strict} flag was added]{2.2.2}
 \end{funcdesc}
@@ -138,9 +137,10 @@
       \method{get_payload()} method will return a string object.
 \item All \mimetype{multipart} type messages will be parsed as a
       container message object with a list of sub-message objects for
-      their payload.  These messages will return \code{True} for
-      \method{is_multipart()} and their \method{get_payload()} method
-      will return a list of \class{Message} instances.
+      their payload.  The outer container message will return
+      \code{True} for \method{is_multipart()} and its
+      \method{get_payload()} method will return the list of
+      \class{Message} subparts.
 \item Most messages with a content type of \mimetype{message/*}
       (e.g. \mimetype{message/delivery-status} and
       \mimetype{message/rfc822}) will also be parsed as container

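The parsing behaviour described in the emailparser.tex changes above can be illustrated as follows. This is a minimal sketch, assuming a current Python 3 interpreter, where `Parser` is imported from `email.parser`. The boundary string and part bodies are invented for the example.

```python
import email
from email.message import Message
from email.parser import Parser

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--BOUNDARY--\n"
)

# A multipart message parses into a container whose payload is a list
# of Message subparts.
msg = email.message_from_string(raw)
parts = msg.get_payload()

# With headersonly set, parsing stops after the header block and the
# rest of the text is kept as an unparsed string payload.
headers_only = Parser().parsestr(raw, headersonly=True)

print(msg.is_multipart(), len(parts))
print(headers_only.is_multipart(), type(headers_only.get_payload()).__name__)
```

The headers are available in both cases; only the treatment of the body differs.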
diff --git a/Doc/lib/emailutil.tex b/Doc/lib/emailutil.tex
index e2ff752..80f0acf 100644
--- a/Doc/lib/emailutil.tex
+++ b/Doc/lib/emailutil.tex

@@ -6,7 +6,7 @@
 
 \begin{funcdesc}{quote}{str}
 Return a new string with backslashes in \var{str} replaced by two
-backslashes and double quotes replaced by backslash-double quote.
+backslashes, and double quotes replaced by backslash-double quote.
 \end{funcdesc}
 
 \begin{funcdesc}{unquote}{str}
@@ -85,7 +85,7 @@
 \end{funcdesc}
 
 \begin{funcdesc}{formatdate}{\optional{timeval\optional{, localtime}}}
-Returns a date string as per Internet standard \rfc{2822}, e.g.:
+Returns a date string as per \rfc{2822}, e.g.:
 
 \begin{verbatim}
 Fri, 09 Nov 2001 01:08:47 -0000
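The utility functions touched in the emailutil.tex hunks can be tried out with the sketch below. It assumes a current Python 3 interpreter, where the module is `email.utils` rather than `email.Utils`. The sample string is made up for the example.

```python
from email.utils import formatdate, quote, unquote

original = 'path\\to "file"'  # the characters: path\to "file"

# quote() doubles backslashes and escapes double quotes; it does not add
# the surrounding quotes itself.
quoted = quote(original)

# unquote() strips the surrounding double quotes and undoes the escaping.
round_trip = unquote('"' + quoted + '"')

# formatdate() renders a timestamp in RFC 2822 form; with the default
# localtime=False the zone is written as -0000.
epoch = formatdate(0)

print(quoted)
print(round_trip)
print(epoch)
```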