
 \documentclass{howto}
 % $Id$

 % TODO:
 %   Go through and get the contributor's name for all the various changes

 \title{What's New in Python 2.3}
 \release{0.01}
 \author{A.M. Kuchling}
 \authoraddress{\email{akuchlin@mems-exchange.org}}

 \begin{document}
 \maketitle
 \tableofcontents

 %\section{Introduction \label{intro}}

 {\large This article is a draft, and is currently up to date for some
 random version of the CVS tree around May 26 2002.  Please send any
 additions, comments or errata to the author.}

 This article explains the new features in Python 2.3.  The tentative
 release date of Python 2.3 is currently scheduled for August 30 2002.

 This article doesn't attempt to provide a complete specification of
 the new features, but instead provides a convenient overview.  For
 full details, you should refer to the documentation for Python 2.3,
 such as the
 \citetitle[http://www.python.org/doc/2.3/lib/lib.html]{Python Library
 Reference} and the
 \citetitle[http://www.python.org/doc/2.3/ref/ref.html]{Python
 Reference Manual}.  If you want to understand the complete
 implementation and design rationale for a change, refer to the PEP for
 a particular new feature.


 %======================================================================
 \section{PEP 255: Simple Generators\label{section-generators}}

 In Python 2.2, generators were added as an optional feature, to be
 enabled by a \code{from __future__ import generators} directive.  In
 2.3 generators no longer need to be specially enabled, and are now
 always present; this means that \keyword{yield} is now always a
 keyword.  The rest of this section is a copy of the description of
 generators from the ``What's New in Python 2.2'' document; if you read
 it when 2.2 came out, you can skip the rest of this section.

 You're doubtless familiar with how function calls work in Python or C.
 When you call a function, it gets a private namespace where its local
 variables are created.  When the function reaches a \keyword{return}
 statement, the local variables are destroyed and the resulting value
 is returned to the caller.  A later call to the same function will get
 a fresh new set of local variables. But, what if the local variables
 weren't thrown away on exiting a function?  What if you could later
 resume the function where it left off?  This is what generators
 provide; they can be thought of as resumable functions.

 Here's the simplest example of a generator function:

 \begin{verbatim}
 def generate_ints(N):
     for i in range(N):
         yield i
 \end{verbatim}

 A new keyword, \keyword{yield}, was introduced for generators.  Any
 function containing a \keyword{yield} statement is a generator
 function; this is detected by Python's bytecode compiler which
 compiles the function specially as a result.

 When you call a generator function, it doesn't return a single value;
 instead it returns a generator object that supports the iterator
 protocol.  On executing the \keyword{yield} statement, the generator
 outputs the value of \code{i}, similar to a \keyword{return}
 statement.  The big difference between \keyword{yield} and a
 \keyword{return} statement is that on reaching a \keyword{yield} the
 generator's state of execution is suspended and local variables are
 preserved.  On the next call to the generator's \code{.next()} method,
 the function will resume executing immediately after the
 \keyword{yield} statement.  (For complicated reasons, the
 \keyword{yield} statement isn't allowed inside the \keyword{try} block
 of a \code{try...finally} statement; read \pep{255} for a full
 explanation of the interaction between \keyword{yield} and
 exceptions.)

 Here's a sample usage of the \function{generate_ints} generator:

 \begin{verbatim}
 >>> gen = generate_ints(3)
 >>> gen
 <generator object at 0x8117f90>
 >>> gen.next()
 0
 >>> gen.next()
 1
 >>> gen.next()
 2
 >>> gen.next()
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 2, in generate_ints
 StopIteration
 \end{verbatim}

 You could equally write \code{for i in generate_ints(5)}, or
 \code{a,b,c = generate_ints(3)}.
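
Both forms can be tried directly.  The following is a minimal sketch
that repeats the \function{generate_ints} definition from above so that
it is self-contained; the names \code{collected}, \code{a}, \code{b}
and \code{c} are only illustrative.

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop drives the generator behind the scenes and stops
# cleanly once StopIteration is raised.
collected = []
for i in generate_ints(5):
    collected.append(i)
# collected is now [0, 1, 2, 3, 4]

# Tuple unpacking consumes the generator too; the number of names
# on the left must match the number of values produced.
a, b, c = generate_ints(3)
# a, b, c are now 0, 1, 2
\end{verbatim}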

 Inside a generator function, the \keyword{return} statement can only
 be used without a value, and signals the end of the procession of
 values; afterwards the generator cannot return any further values.
 \keyword{return} with a value, such as \code{return 5}, is a syntax
 error inside a generator function.  The end of the generator's results
 can also be indicated by raising \exception{StopIteration} manually,
 or by just letting the flow of execution fall off the bottom of the
 function.
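
As a small sketch of the first and last of these cases, the
hypothetical generator \function{ints_until} below (a name invented for
this example) ends early with a bare \keyword{return}, and otherwise
simply runs off the end of its loop:

\begin{verbatim}
def ints_until(limit, stop_at):
    for i in range(limit):
        if i == stop_at:
            return      # bare return: no further values are produced
        yield i
    # Falling off the bottom of the function ends the generator too.

stopped_early = list(ints_until(10, 3))
# stopped_early is now [0, 1, 2]

ran_to_end = list(ints_until(2, 5))
# ran_to_end is now [0, 1]
\end{verbatim}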

 You could achieve the effect of generators manually by writing your
 own class and storing all the local variables of the generator as
 instance variables.  For example, returning a list of integers could
 be done by setting \code{self.count} to 0, and having the
 \method{next()} method increment \code{self.count} and return it.
 However, for a moderately complicated generator, writing a
 corresponding class would be much messier.
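
A minimal sketch of such a class, equivalent to
\function{generate_ints}, might look like the following.  The class
name \class{IntsIterator} is made up for this example, and the
\code{__next__} alias is an assumption added only so that the sketch
also runs on later Python versions that spell the method differently;
Python 2.3 does not need it.

\begin{verbatim}
class IntsIterator:
    def __init__(self, N):
        self.N = N
        self.count = 0      # takes the place of the local variable i

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    __next__ = next         # assumption: alias for later versions

values = list(IntsIterator(3))
# values is now [0, 1, 2]
\end{verbatim}

Even in this tiny case the state that a generator keeps implicitly has
to be stored and updated by hand.
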
 \file{Lib/test/test_generators.py} contains a number of more
 interesting examples.  The simplest one implements an in-order
 traversal of a tree using generators recursively.

 \begin{verbatim}
 # A recursive generator that generates Tree leaves in in-order.
 def inorder(t):
     if t:
         for x in inorder(t.left):
             yield x
         yield t.label
         for x in inorder(t.right):
             yield x
 \end{verbatim}
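
To try \function{inorder} out, some tree type is needed.  The
\class{Tree} class below is a hypothetical stand-in written for this
example, not the class used in the test suite; it provides only the
\code{label}, \code{left} and \code{right} attributes that the
generator relies on.

\begin{verbatim}
class Tree:
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#       2
#      / \
#     1   3
tree = Tree(2, Tree(1), Tree(3))
labels = list(inorder(tree))
# labels is now [1, 2, 3]
\end{verbatim}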

 Two other examples in \file{Lib/test/test_generators.py} produce
 solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
 chess board so that no queen threatens another) and the Knight's Tour
 (a route that takes a knight to every square of an $N\times N$ chessboard
 without visiting any square twice).

 The idea of generators comes from other programming languages,
 especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
 idea of generators is central.  In Icon, every
 expression and function call behaves like a generator.  One example
 from ``An Overview of the Icon Programming Language'' at
 \url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
 what this looks like:

 \begin{verbatim}
 sentence := "Store it in the neighboring harbor"
 if (i := find("or", sentence)) > 5 then write(i)
 \end{verbatim}

 In Icon the \function{find()} function returns the indexes at which the
 substring ``or'' is found: 3, 23, 33.  In the \keyword{if} statement,
 \code{i} is first assigned a value of 3, but 3 is less than 5, so the
 comparison fails, and Icon retries it with the second value of 23.  23
 is greater than 5, so the comparison now succeeds, and the code prints
 the value 23 to the screen.
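
For comparison, a rough Python analogue can be written with a
generator.  This is only a sketch: the \function{find} function below
is a hypothetical helper defined for the example (not a built-in), it
reports 1-based positions to match the numbering used above, and the
retrying that Icon performs implicitly has to be spelled out as an
explicit loop.

\begin{verbatim}
def find(sub, s):
    # Yield the 1-based position of every occurrence of sub in s.
    pos = s.find(sub)
    while pos != -1:
        yield pos + 1
        pos = s.find(sub, pos + 1)

sentence = "Store it in the neighboring harbor"
positions = list(find("or", sentence))
# positions is now [3, 23, 33]

# Keep asking the generator for candidates until one of them
# passes the comparison.
first_match = None
for i in find("or", sentence):
    if i > 5:
        first_match = i
        break
# first_match is now 23
\end{verbatim}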

 Python doesn't go nearly as far as Icon in adopting generators as a
 central concept.  Generators are considered a new part of the core
 Python language, but learning or using them isn't compulsory; if they
 don't solve any problems that you have, feel free to ignore them.
 One novel feature of Python's interface as compared to
 Icon's is that a generator's state is represented as a concrete object
 (the iterator) that can be passed around to other functions or stored
 in a data structure.
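
As an illustration of this last point, the sketch below hands a
partly consumed generator to another function.  The helper
\function{take} is a hypothetical name invented for the example, and
\function{generate_ints} is repeated from earlier in this section so
that the code is self-contained.

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i

def take(iterator, n):
    # Collect at most n values, leaving the iterator suspended.
    result = []
    if n <= 0:
        return result
    for item in iterator:
        result.append(item)
        if len(result) == n:
            break
    return result

gen = generate_ints(5)
first = take(gen, 2)
# first is now [0, 1]

# The same object resumes where take() left off.
rest = list(gen)
# rest is now [2, 3, 4]
\end{verbatim}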

 \begin{seealso}

 \seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
 Peters, Magnus Lie Hetland.  Implemented mostly by Neil Schemenauer
 and Tim Peters, with other fixes from the Python Labs crew.}

 \end{seealso}


 %======================================================================
 \section{PEP 278: Universal Newline Support}

 The three major operating systems used today are Microsoft Windows,
 Apple's Macintosh OS, and the various \UNIX\ derivatives.  A minor
 irritation is that these three platforms all use different characters
 to mark the ends of lines in text files.  \UNIX\ uses character 10,
 the ASCII linefeed, while MacOS uses character 13, the ASCII carriage
 return, and Windows uses a two-character sequence of a carriage return
 plus a newline.

 Python's file objects can now support end of line conventions other
 than the one followed by the platform on which Python is running.
 Opening a file with the mode \samp{U} or \samp{rU} will open a file
 for reading in universal newline mode.  All three line ending
 conventions will be translated to a \samp{\e n} in the strings
 returned by the various file methods such as \method{read()} and
 \method{readline()}.

 Universal newline support is also used when importing modules and when
 executing a file with the \function{execfile()} function.  This means
 that Python modules can be shared between all three operating systems
 without needing to convert the line-endings.

 This feature can be disabled at compile-time by specifying
 \longprogramopt{without-universal-newlines} when running Python's
 \file{configure} script.

 \begin{seealso}

 \seepep{278}{Universal Newline Support}{Written
 and implemented by Jack Jansen.}

 \end{seealso}


 %======================================================================
 \section{PEP 279: The \function{enumerate()} Built-in Function}

 A new built-in function, \function{enumerate()}, will make
 certain loops a bit clearer.  \code{enumerate(thing)}, where
 \var{thing} is either an iterator or a sequence, returns an iterator
 that will return \code{(0, \var{thing[0]})}, \code{(1,
 \var{thing[1]})}, \code{(2, \var{thing[2]})}, and so forth.  Fairly
 often you'll see code to change every element of a list that looks
 like this:

 \begin{verbatim}
 for i in range(len(L)):
     item = L[i]
     # ... compute some result based on item ...
     L[i] = result
 \end{verbatim}

 This can be rewritten using \function{enumerate()} as:

 \begin{verbatim}
 for i, item in enumerate(L):
     # ... compute some result based on item ...
     L[i] = result
 \end{verbatim}
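
A concrete, runnable variant of the same pattern is sketched below;
the list contents and the choice of \method{upper()} as the ``result''
are arbitrary and serve only as illustration.

\begin{verbatim}
L = ['a', 'b', 'c']

pairs = list(enumerate(L))
# pairs is now [(0, 'a'), (1, 'b'), (2, 'c')]

# Replace every element in place using its index.
for i, item in enumerate(L):
    L[i] = item.upper()
# L is now ['A', 'B', 'C']
\end{verbatim}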


 \begin{seealso}

 \seepep{279}{The enumerate() built-in function}{Written
 by Raymond D. Hettinger.}

 \end{seealso}


 %======================================================================
 \section{PEP 285: The \class{bool} Type\label{section-bool}}

 A Boolean type was added to Python 2.3.  Two new constants were added
 to the \module{__builtin__} module, \constant{True} and
 \constant{False}.  The type object for this new type is named
 \class{bool}; the constructor for it takes any Python value and
 converts it to \constant{True} or \constant{False}.

 \begin{verbatim}
 >>> bool(1)
 True
 >>> bool(0)
 False
 >>> bool([])
 False
 >>> bool( (1,) )
 True
 \end{verbatim}

 Most of the standard library modules and built-in functions have been
 changed to return Booleans.

 \begin{verbatim}
 >>> obj = []
 >>> hasattr(obj, 'append')
 True
 >>> isinstance(obj, list)
 True
 >>> isinstance(obj, tuple)
 False
 \end{verbatim}

 Python's Booleans were added with the primary goal of making code
 clearer.  For example, if you're reading a function and encounter the
 statement \code{return 1}, you might wonder whether the \samp{1}
 represents a truth value, or whether it's an index, or whether it's a
 coefficient that multiplies some other quantity.  If the statement is
 \code{return True}, however, the meaning of the return value is quite
 clearly a truth value.

 Python's Booleans were not added for the sake of strict type-checking.
 A very strict language such as Pascal would also prevent you
 performing arithmetic with Booleans, and would require that the
 expression in an \keyword{if} statement always evaluate to a Boolean.
 Python is not this strict, and it never will be.  (\pep{285}
 explicitly says so.)  So you can still use any expression in an
 \keyword{if}, even ones that evaluate to a list or tuple or some
 random object, and the Boolean type is a subclass of the
 \class{int} class, so arithmetic using a Boolean still works.

 \begin{verbatim}
 >>> True + 1
 2
 >>> False + 1
 1
 >>> False * 75
 0
 >>> True * 75
 75
 \end{verbatim}

 To sum up \constant{True} and \constant{False} in a sentence: they're
 alternative ways to spell the integer values 1 and 0, with the single
 difference that \function{str()} and \function{repr()} return the
 strings \samp{True} and \samp{False} instead of \samp{1} and \samp{0}.
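
That summary can be checked with a few expressions; the variable names
in this short sketch are purely illustrative.

\begin{verbatim}
# bool is a subclass of int, and its two values compare equal
# to the integers 1 and 0.
is_int = isinstance(True, int)              # True
same_values = (True == 1 and False == 0)    # True

# Only the string forms differ from those of the integers.
bool_text = (str(True), repr(False))        # ('True', 'False')
int_text = (str(1), repr(0))                # ('1', '0')
\end{verbatim}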

 \begin{seealso}

 \seepep{285}{Adding a bool type}{Written and implemented by GvR.}

 \end{seealso}


 %======================================================================
 %\section{Other Language Changes}

 %Here are the changes that Python 2.3 makes to the core language.

 %\begin{itemize}
 %\item The \keyword{yield} statement is now always a keyword, as
 %described in section~\ref{section-generators}.

 %\item Two new constants, \constant{True} and \constant{False} were
 %added along with the built-in \class{bool} type, as described in
 %section~\ref{section-bool}.

 %\item
 %\end{itemize}


 %======================================================================
 \section{Specialized Object Allocator (pymalloc)\label{section-pymalloc}}

 An experimental feature added to Python 2.1 was a specialized object
 allocator called pymalloc, written by Vladimir Marangozov.  Pymalloc
 was intended to be faster than the system \function{malloc()} and have
 less memory overhead.  The allocator uses C's \function{malloc()}
 function to get large pools of memory, and then fulfills smaller
 memory requests from these pools.

 In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
 enabled by default; you had to explicitly turn it on by providing the
 \longprogramopt{with-pymalloc} option to the \program{configure}
 script.  In 2.3, pymalloc has had further enhancements and is now
 enabled by default; you'll have to supply
 \longprogramopt{without-pymalloc} to disable it.

 This change is transparent to code written in Python; however,
 pymalloc may expose bugs in C extensions.  Authors of C extension
 modules should test their code with the object allocator enabled,
 because some incorrect code may cause core dumps at runtime.  There
 are a bunch of memory allocation functions in Python's C API that have
 previously been just aliases for the C library's \function{malloc()}
 and \function{free()}, meaning that if you accidentally called
 mismatched functions, the error wouldn't be noticeable.  When the
 object allocator is enabled, these functions aren't aliases of
 \function{malloc()} and \function{free()} any more, and calling the
 wrong function to free memory will get you a core dump.  For example,
 if memory was allocated using \function{PyMem_New()}, it has to be
 freed using \function{PyMem_Del()}, not \function{free()}.  A few
 modules included with Python fell afoul of this and had to be fixed;
 doubtless there are more third-party modules that will have the same
 problem.

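 As part of this change, the confusing multiple interfaces for
 allocating memory have been consolidated down into two API families:
 one for allocating chunks of memory, and one specifically for
 allocating Python objects.  Memory allocated with functions from one
 family must not be freed with functions from a different one.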

 \begin{itemize}
   \item To allocate and free an undistinguished chunk of memory, use
   the ``raw memory'' family:
   \cfunction{PyMem_Malloc()}, \cfunction{PyMem_Realloc()}, and
   \cfunction{PyMem_Free()}.

   \item The ``object memory'' family is the interface to the pymalloc
   facility described above, and is biased towards a large number of
   small allocations:
   \cfunction{PyObject_Malloc()}, \cfunction{PyObject_Realloc()},
   and \cfunction{PyObject_Free()}.

   \item To allocate and free Python objects,
   use \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
   \cfunction{PyObject_Del()}.

 \end{itemize}

 Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
 debugging features to catch memory overwrites and doubled frees in
 both extension modules and in the interpreter itself.  To enable this
 support, turn on the Python interpreter's debugging code by running
 \program{configure} with \longprogramopt{with-pydebug}.

 \begin{seealso}

 \seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
 {For the full details of the pymalloc implementation, see
 the comments at the top of the file \file{Objects/obmalloc.c} in the
 Python source code.  The above link points to the file within the
 SourceForge CVS browser.}

 \end{seealso}

 %======================================================================
 \section{New and Improved Modules}

 As usual, Python's standard modules had a number of enhancements and
 bug fixes.  Here's a partial list; consult the \file{Misc/NEWS} file
 in the source tree, or the CVS logs, for a more complete list.

 \begin{itemize}

 \item One minor but far-reaching change is that the names of extension
 types defined by the modules included with Python now contain the
 module and a \samp{.} in front of the type name.  For example, in
 Python 2.2, if you created a socket and printed its
 \member{__class__}, you'd get this output:

 \begin{verbatim}
 >>> s = socket.socket()
 >>> s.__class__
 <type 'socket'>
 \end{verbatim}

 In 2.3, you get this:
 \begin{verbatim}
 >>> s.__class__
 <type '_socket.socket'>
 \end{verbatim}

 \item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
 string methods now have an optional argument for specifying the
 characters to strip.  The default is still to remove all whitespace
 characters:

 \begin{verbatim}
 >>> '   abc '.strip()
 'abc'
 >>> '><><abc<><><>'.strip('<>')
 'abc'
 >>> '><><abc<><><>\n'.strip('<>')
 'abc<><><>\n'
 >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
 u'\u4001abc'
 >>>
 \end{verbatim}

 \item Another new string method is \method{zfill()}, originally a
 function in the \module{string} module.  \method{zfill()} pads a
 numeric string with zeros on the left until it's the specified width.
 Note that the \code{\%} operator is still more flexible and powerful
 than \method{zfill()}.

 \begin{verbatim}
 >>> '45'.zfill(4)
 '0045'
 >>> '12345'.zfill(4)
 '12345'
 >>> 'goofy'.zfill(6)
 '0goofy'
 \end{verbatim}

 \item Dictionaries have a new method, \method{pop(\var{key})}, that
 returns the value corresponding to \var{key} and removes that
 key/value pair from the dictionary.  \method{pop()} will raise a
 \exception{KeyError} if the requested key isn't present in the
 dictionary:

 \begin{verbatim}
 >>> d = {1:2}
 >>> d
 {1: 2}
 >>> d.pop(4)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 KeyError: 4
 >>> d.pop(1)
 2
 >>> d.pop(1)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 KeyError: pop(): dictionary is empty
 >>> d
 {}
 >>>
 \end{verbatim}

 \item Two new functions in the \module{math} module,
 \function{degrees(\var{rads})} and \function{radians(\var{degs})},
 convert between radians and degrees.  Other functions in the
 \module{math} module such as
 \function{math.sin()} and \function{math.cos()} have always required
 input values measured in radians. (Contributed by Raymond Hettinger.)

 \item Two new functions, \function{killpg()} and \function{mknod()},
 were added to the \module{posix} module that underlies the \module{os}
 module.

 \item Two new binary packagers were added to the Distutils.
 \code{bdist_pkgtool} builds \file{.pkg} files to use with Solaris
 \program{pkgtool}, and \code{bdist_sdux} builds \program{swinstall}
 packages for use on HP-UX.  (Contributed by Mark Alexander.)

 \item The \module{array} module now supports arrays of Unicode
 characters using the \samp{u} format character.  Arrays also
 now support using the \code{+=} assignment operator to add another array's
 contents, and the \code{*=} assignment operator to repeat an array.
 (Contributed by Jason Orendorff.)

 \item The \module{grp} module now returns enhanced tuples whose
 fields can also be accessed by attribute name:

 \begin{verbatim}
 >>> import grp
 >>> g = grp.getgrnam('amk')
 >>> g.gr_name, g.gr_gid
 ('amk', 500)
 \end{verbatim}

 \item The \module{readline} module gained a number of new
 functions: \function{get_history_item()},
 \function{get_current_history_length()}, and \function{redisplay()}.

 \end{itemize}
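
Several of the items above come without an example.  As a brief sketch
of two of them, the following exercises the new \module{math}
conversions and the in-place array operators; the values and the
\samp{i} (signed integer) format character are arbitrary choices made
for illustration.

\begin{verbatim}
import math
from array import array

half_turn = math.degrees(math.pi)     # approximately 180.0
right_angle = math.radians(90)        # approximately math.pi / 2

a = array('i', [1, 2, 3])
a += array('i', [4, 5])     # append another array's contents
# a.tolist() is now [1, 2, 3, 4, 5]
a *= 2                      # repeat the contents in place
# a.tolist() is now [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
\end{verbatim}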


 % ======================================================================
 \section{Build and C API Changes}

 Changes to Python's build process, and to the C API, include:

 \begin{itemize}

 \item Python can now optionally be built as a shared library
 (\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
 when running Python's \file{configure} script.  (Contributed by Ondrej
 Palkovsky.)

 \item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
 that uses it should be changed to use \code{PyArg_ParseTuple(args, "")}
 instead.

 \item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
 char *\var{key})}, was added as shorthand for
 \code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.

 \item The source code for the Expat XML parser is now included with
 the Python source, so the \module{pyexpat} module is no longer
 dependent on having a system library containing Expat.

 \item File objects now manage their internal string buffer
 differently by increasing it exponentially when needed.
 This results in the benchmark tests in \file{Lib/test/test_bufio.py}
 speeding up from 57 seconds to 1.7 seconds, according to one
 measurement.

 \item XXX Introduce two new flag bits that can be set in a PyMethodDef method
 descriptor, as used for the tp_methods slot of a type.  These new flag
 bits are both optional, and mutually exclusive.  Most methods will not
 use either.  These flags are used to create special method types which
 exist in the same namespace as normal methods without having to use
 tedious construction code to insert the new special method objects in
 the type's tp_dict after PyType_Ready() has been called.

 If METH_CLASS is specified, the method will represent a class method
 like that returned by the classmethod() built-in.

 If METH_STATIC is specified, the method will represent a static method
 like that returned by the staticmethod() built-in.

 These flags may not be used in the PyMethodDef table for modules since
 these special method types are not meaningful in that case; a
 ValueError will be raised if these flags are found in that context.

 \end{itemize}

 \subsection{Port-Specific Changes}

 XXX write this

 XXX OS/2 EMX port

 XXX MacOS: Weaklink most toolbox modules, improving backward
 compatibility. Modules will no longer fail to load if a single routine
 is missing on the current OS version; instead, calling the missing
 routine will raise an exception.  Should finally fix 531398. 2.2.1
 candidate.  Also blacklisted some constants with definitions that
 were not Python-compatible.

 XXX Checked in Sean Reifschneider's RPM spec file and patches.


 %======================================================================
 \section{Other Changes and Fixes}

 Finally, there are various miscellaneous fixes:

 \begin{itemize}

 \item The tools used to build the documentation now work under Cygwin
 as well as \UNIX.

 \end{itemize}

 %======================================================================
 \section{Acknowledgements \label{acks}}

 The author would like to thank the following people for offering
 suggestions, corrections and assistance with various drafts of this
 article: Fred~L. Drake, Jr., Detlef Lannert.

 \end{document}