Merge p3yk branch with the trunk up to revision 45595. This breaks a fair
number of tests, all because of the codecs/_multibytecodecs issue described
here (it's not a Py3K issue, just something Py3K discovers):
http://mail.python.org/pipermail/python-dev/2006-April/064051.html
Hye-Shik Chang promised to look for a fix, so no need to fix it here. The
tests that are expected to break are:
test_codecencodings_cn
test_codecencodings_hk
test_codecencodings_jp
test_codecencodings_kr
test_codecencodings_tw
test_codecs
test_multibytecodec
This merge fixes an actual test failure (test_weakref) in this branch,
though, so I believe merging is the right thing to do anyway.
diff --git a/Doc/lib/libcodecs.tex b/Doc/lib/libcodecs.tex
index 1806ef0..8a2417e 100644
--- a/Doc/lib/libcodecs.tex
+++ b/Doc/lib/libcodecs.tex
@@ -112,6 +112,7 @@
Raises a \exception{LookupError} in case the encoding cannot be found or the
codec doesn't support an incremental encoder.
+\versionadded{2.5}
\end{funcdesc}
\begin{funcdesc}{getincrementaldecoder}{encoding}
@@ -120,6 +121,7 @@
Raises a \exception{LookupError} in case the encoding cannot be found or the
codec doesn't support an incremental decoder.
+\versionadded{2.5}
\end{funcdesc}
\begin{funcdesc}{getreader}{encoding}
@@ -150,7 +152,7 @@
continue. The encoder will encode the replacement and continue encoding
the original input at the specified position. Negative position values
will be treated as being relative to the end of the input string. If the
-resulting position is out of bound an IndexError will be raised.
+resulting position is out of bound an \exception{IndexError} will be raised.
Decoding and translating works similar, except \exception{UnicodeDecodeError}
or \exception{UnicodeTranslateError} will be passed to the handler and
@@ -229,12 +231,14 @@
Uses an incremental encoder to iteratively encode the input provided by
\var{iterable}. This function is a generator. \var{errors} (as well as
any other keyword argument) is passed through to the incremental encoder.
+\versionadded{2.5}
\end{funcdesc}
\begin{funcdesc}{iterdecode}{iterable, encoding\optional{, errors}}
Uses an incremental decoder to iteratively decode the input provided by
\var{iterable}. This function is a generator. \var{errors} (as well as
any other keyword argument) is passed through to the incremental encoder.
+\versionadded{2.5}
\end{funcdesc}
The module also provides the following constants which are useful
@@ -355,6 +359,8 @@
\subsubsection{IncrementalEncoder Objects \label{incremental-encoder-objects}}
+\versionadded{2.5}
+
The \class{IncrementalEncoder} class is used for encoding an input in multiple
steps. It defines the following methods which every incremental encoder must
define in order to be compatible to the Python codec registry.
@@ -437,6 +443,10 @@
Decodes \var{object} (taking the current state of the decoder into account)
and returns the resulting decoded object. If this is the last call to
\method{decode} \var{final} must be true (the default is false).
+ If \var{final} is true the decoder must decode the input completely and must
+ flush all buffers. If this isn't possible (e.g. because of incomplete byte
+ sequences at the end of the input) it must initiate error handling just like
+ in the stateless case (which might raise an exception).
\end{methoddesc}
\begin{methoddesc}{reset}{}
@@ -690,10 +700,10 @@
The simplest method is to map the codepoints 0-255 to the bytes
\code{0x0}-\code{0xff}. This means that a unicode object that contains
codepoints above \code{U+00FF} can't be encoded with this method (which
-is called \code{'latin-1'} or \code{'iso-8859-1'}). unicode.encode() will
-raise a UnicodeEncodeError that looks like this: \samp{UnicodeEncodeError:
-'latin-1' codec can't encode character u'\e u1234' in position 3: ordinal
-not in range(256)}.
+is called \code{'latin-1'} or \code{'iso-8859-1'}).
+\function{unicode.encode()} will raise a \exception{UnicodeEncodeError}
+that looks like this: \samp{UnicodeEncodeError: 'latin-1' codec can't
+encode character u'\e u1234' in position 3: ordinal not in range(256)}.
There's another group of encodings (the so called charmap encodings)
that choose a different subset of all unicode code points and how
@@ -1220,7 +1230,7 @@
\lineiv{rot_13}
{rot13}
- {byte string}
+ {Unicode string}
{Returns the Caesar-cypher encryption of the operand}
\lineiv{string_escape}