| \documentclass{howto} |
| \usepackage{distutils} |
| % $Id$ |
| |
| \title{What's New in Python 2.4} |
| \release{0.0} |
| \author{A.M.\ Kuchling} |
| \authoraddress{ |
| \strong{Python Software Foundation}\\ |
| Email: \email{amk@amk.ca} |
| } |
| |
| \begin{document} |
| \maketitle |
| \tableofcontents |
| |
| This article explains the new features in Python 2.4. No release date |
| for Python 2.4 has been set; expect that this will happen mid-2004. |
| |
| While Python 2.3 was primarily a library development release, Python |
| 2.4 may extend the core language and interpreter in |
| as-yet-undetermined ways. |
| |
| This article doesn't attempt to provide a complete specification of |
| the new features, but instead provides a convenient overview. For |
| full details, you should refer to the documentation for Python 2.4, |
| such as the \citetitle[../lib/lib.html]{Python Library Reference} and |
| the \citetitle[../ref/ref.html]{Python Reference Manual}. |
| If you want to understand the complete implementation and design |
| rationale, refer to the PEP for a particular new feature. |
| |
| |
| %====================================================================== |
| \section{PEP 218: Built-In Set Objects} |
| |
Two new built-in types, \function{set(iterable)} and
\function{frozenset(iterable)}, provide high-speed data types for
| membership testing, for eliminating duplicates from sequences, and |
| for mathematical operations like unions, intersections, differences, |
| and symmetric differences. |
| |
| \begin{verbatim} |
| >>> a = set('abracadabra') # form a set from a string |
| >>> 'z' in a # fast membership testing |
| False |
| >>> a # unique letters in a |
| set(['a', 'r', 'b', 'c', 'd']) |
| >>> ''.join(a) # convert back into a string |
| 'arbcd' |
| |
| >>> b = set('alacazam') # form a second set |
| >>> a - b # letters in a but not in b |
| set(['r', 'd', 'b']) |
| >>> a | b # letters in either a or b |
| set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l']) |
| >>> a & b # letters in both a and b |
| set(['a', 'c']) |
| >>> a ^ b # letters in a or b but not both |
| set(['r', 'd', 'b', 'm', 'z', 'l']) |
| |
| >>> a.add('z') # add a new element |
| >>> a.update('wxy') # add multiple new elements |
| >>> a |
| set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z']) |
| >>> a.remove('x') # take one element out |
| >>> a |
| set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z']) |
| \end{verbatim} |
| |
| The type \function{frozenset()} is an immutable version of \function{set()}. |
| Since it is immutable and hashable, it may be used as a dictionary key or |
| as a member of another set. Accordingly, it does not have methods |
| like \method{add()} and \method{remove()} which could alter its contents. |
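The following fragment is a minimal sketch of what hashability makes possible. It uses \function{frozenset()} objects as dictionary keys and as members of a set, and checks that \function{hash()} rejects a mutable \function{set()}. The \code{edges} mapping is a hypothetical illustration and is not part of any library.

\begin{verbatim}
# Hypothetical example: weights of undirected graph edges, keyed by
# the unordered pair of endpoints.
edges = {}
edges[frozenset(['a', 'b'])] = 3
edges[frozenset(['b', 'c'])] = 5

# The key does not depend on the order in which endpoints are given.
weight = edges[frozenset(['b', 'a'])]

# frozensets can be members of another set; the two equal frozensets
# below collapse into a single member.
groups = set([frozenset([1, 2]), frozenset([2, 1]), frozenset([3])])

# A mutable set is unhashable, so hash() raises TypeError.
try:
    hash(set([1, 2]))
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# frozenset offers no mutating methods such as add().
has_add = hasattr(frozenset(), 'add')
\end{verbatim}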
| |
| % XXX what happens to the sets module? |
| % The current thinking is that the sets module will be left alone. |
| % That way, existing code will continue to run without alteration. |
| % Also, the module provides an autoconversion feature not supported by set() |
| % and frozenset(). |
| |
| \begin{seealso} |
| \seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by |
| Greg Wilson and ultimately implemented by Raymond Hettinger.} |
| \end{seealso} |
| |
| %====================================================================== |
| \section{PEP 237: Unifying Long Integers and Integers} |
| |
| XXX write this. |
| |
| %====================================================================== |
| \section{PEP 322: Reverse Iteration} |
| |
| A new built-in function, \function{reversed(seq)}, takes a sequence |
| and returns an iterator that returns the elements of the sequence |
| in reverse order. |
| |
| \begin{verbatim} |
| >>> for i in reversed(xrange(1,4)): |
| ... print i |
| ... |
| 3 |
| 2 |
| 1 |
| \end{verbatim} |
| |
| Compared to extended slicing, \code{range(1,4)[::-1]}, \function{reversed()} |
| is easier to read, runs faster, and uses substantially less memory. |
| |
| Note that \function{reversed()} only accepts sequences, not arbitrary |
| iterators. If you want to reverse an iterator, first convert it to |
| a list with \function{list()}. |
| |
| \begin{verbatim} |
>>> input = open('/etc/passwd', 'r')
>>> for line in reversed(list(input)):
...     print line,
| ... |
| root:*:0:0:System Administrator:/var/root:/bin/tcsh |
| ... |
| \end{verbatim} |
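The sketch below makes the ``sequences, not arbitrary iterators'' rule concrete. It uses a hypothetical class, \class{Countdown}, that supplies only \method{__len__()} and \method{__getitem__()}. That is enough for \function{reversed()}. A plain iterator lacks those methods, so \function{reversed()} rejects it with a \exception{TypeError} until it has been turned into a list.

\begin{verbatim}
class Countdown(object):
    """Hypothetical read-only sequence holding n-1, n-2, ..., 0.

    It defines only __len__() and __getitem__(); there is no
    __iter__() method, so iteration uses the sequence protocol.
    """

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Only non-negative indexes are supported in this sketch.
        if index < 0 or index >= self.n:
            raise IndexError(index)
        return self.n - 1 - index

seq = Countdown(4)
forward = list(seq)               # [3, 2, 1, 0]
backward = list(reversed(seq))    # [0, 1, 2, 3]

# An iterator has neither __len__() nor __getitem__(), so reversed()
# raises TypeError; converting it to a list first works.
it = iter([0, 1, 4, 9])
try:
    reversed(it)
    iterator_accepted = True
except TypeError:
    iterator_accepted = False
squares_backward = list(reversed(list(it)))   # [9, 4, 1, 0]
\end{verbatim}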
| |
| \begin{seealso} |
| \seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.} |
| |
| \end{seealso} |
| |
| |
| %====================================================================== |
| \section{Other Language Changes} |
| |
| Here are all of the changes that Python 2.4 makes to the core Python |
| language. |
| |
| \begin{itemize} |
| |
\item The string methods \method{ljust()}, \method{rjust()}, and
\method{center()} now take an optional argument for specifying a
fill character other than a space.
| |
| \item Strings also gained an \method{rsplit()} method that |
| works like the \method{split()} method but splits from the end of |
| the string. |
| |
| \begin{verbatim} |
| >>> 'a b c'.split(None, 1) |
| ['a', 'b c'] |
| >>> 'a b c'.rsplit(None, 1) |
| ['a b', 'c'] |
| \end{verbatim} |
| |
| % Consider replacing the above example with one that is less |
| % abstract and more suggestive of why the function is useful: |
| % |
| % >>> 'www.python.org'.split('.', 1) |
| % ['www', 'python.org'] |
| % >>> 'www.python.org'.rsplit('.', 1) |
| % ['www.python', 'org'] |
| |
| \item The \method{sort()} method of lists gained three keyword |
| arguments, \var{cmp}, \var{key}, and \var{reverse}. These arguments |
| make some common usages of \method{sort()} simpler. All are optional. |
| |
\var{cmp} is the same as the previous single argument to
\method{sort()}; if provided, the value should be a comparison
function that takes two arguments and returns a negative number, zero,
or a positive number depending on whether the first argument is less
than, equal to, or greater than the second.
| |
| \var{key} should be a single-argument function that takes a list |
| element and returns a comparison key for the element. The list is |
| then sorted using the comparison keys. The following example sorts a |
| list case-insensitively: |
| |
| \begin{verbatim} |
| >>> L = ['A', 'b', 'c', 'D'] |
| >>> L.sort() # Case-sensitive sort |
| >>> L |
| ['A', 'D', 'b', 'c'] |
| >>> L.sort(key=lambda x: x.lower()) |
| >>> L |
| ['A', 'b', 'c', 'D'] |
| >>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower())) |
| >>> L |
| ['A', 'b', 'c', 'D'] |
| \end{verbatim} |
| |
| The last example, which uses the \var{cmp} parameter, is the old way |
| to perform a case-insensitive sort. It works but is slower than |
| using a \var{key} parameter. Using \var{key} results in calling the |
| \method{lower()} method once for each element in the list while using |
| \var{cmp} will call the method twice for each comparison. |
| |
| For simple key functions and comparison functions, it is often |
| possible to avoid a \keyword{lambda} expression by using an unbound |
| method instead. For example, the above case-insensitive sort is best |
| coded as: |
| |
| \begin{verbatim} |
| >>> L.sort(key=str.lower) |
| >>> L |
| ['A', 'b', 'c', 'D'] |
| \end{verbatim} |
| |
| The \var{reverse} parameter should have a Boolean value. If the value is |
| \constant{True}, the list will be sorted into reverse order. Instead |
of \code{L.sort(lambda x,y: cmp(y.score, x.score))}, you can now write
\code{L.sort(key=lambda x: x.score, reverse=True)}.
| |
| The results of sorting are now guaranteed to be stable. This means |
| that two entries with equal keys will be returned in the same order as |
| they were input. For example, you can sort a list of people by name, |
| and then sort the list by age, resulting in a list sorted by age where |
| people with the same age are in name-sorted order. |
| |
| \item There is a new built-in function \function{sorted(iterable)} that works |
| like the in-place \method{list.sort()} method but has been made suitable |
| for use in expressions. The differences are: |
| \begin{itemize} |
| \item the input may be any iterable; |
| \item a newly formed copy is sorted, leaving the original intact; and |
\item the function returns the new sorted copy.
| \end{itemize} |
| |
| \begin{verbatim} |
| >>> L = [9,7,8,3,2,4,1,6,5] |
| >>> [10+i for i in sorted(L)] # usable in a list comprehension |
| [11, 12, 13, 14, 15, 16, 17, 18, 19] |
>>> L                                # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]

>>> sorted('Monte Python')           # any iterable may be an input
[' ', 'M', 'P', 'e', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y']

>>> # List the contents of a dict sorted by key
| >>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5) |
| >>> for k, v in sorted(colormap.iteritems()): |
| ... print k, v |
| ... |
| black 4 |
| blue 2 |
| green 3 |
| red 1 |
| yellow 5 |
| |
| \end{verbatim} |
| |
\item The \function{zip()} built-in function and \function{itertools.izip()}
now return an empty list and an empty iterator, respectively, instead of
raising a \exception{TypeError} exception if called with no arguments.
This makes them more suitable for use with variable-length argument lists:
| |
| \begin{verbatim} |
| >>> def transpose(array): |
| ... return zip(*array) |
| ... |
| >>> transpose([(1,2,3), (4,5,6)]) |
| [(1, 4), (2, 5), (3, 6)] |
| >>> transpose([]) |
| [] |
| \end{verbatim} |
| |
| \end{itemize} |
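Stability is what makes multi-pass sorting work. The following sketch uses a hypothetical list of \code{(name, age)} tuples. It sorts by name first and then by age, so people of the same age stay in name order. It also checks that \var{reverse} keeps records with equal keys in their original order.

\begin{verbatim}
# Hypothetical records: (name, age) tuples.
people = [('Dave', 25), ('alice', 31), ('Carol', 25), ('Bob', 31)]

# Pass 1: sort by name, ignoring case.
by_name = sorted(people, key=lambda p: p[0].lower())

# Pass 2: sort by age.  Because the sort is stable, people with the
# same age keep the name order produced by pass 1.
by_age = sorted(by_name, key=lambda p: p[1])

# reverse=True sorts in descending order but still keeps records
# with equal keys in their original (here: name) order.
by_age_desc = sorted(by_name, key=lambda p: p[1], reverse=True)
\end{verbatim}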
| |
| |
| %====================================================================== |
| \subsection{Optimizations} |
| |
| \begin{itemize} |
| |
| \item \function{list()}, \function{tuple()}, \function{map()}, |
| \function{filter()}, and \function{zip()} now run several times |
| faster with non-sequence arguments that supply a \method{__len__()} |
| method. Previously, the pre-sizing optimization only applied to |
| sequence arguments. |
| |
| \item The methods \method{list.__getitem__()}, |
\method{dict.__getitem__()}, and \method{dict.__contains__()} are
now implemented as \class{method_descriptor} objects rather
| than \class{wrapper_descriptor} objects. This form of optimized |
| access doubles their performance and makes them more suitable for |
| use as arguments to functionals: |
| \samp{map(mydict.__getitem__, keylist)}. |
| |
| \end{itemize} |
| |
| The net result of the 2.4 optimizations is that Python 2.4 runs the |
| pystone benchmark around XX\% faster than Python 2.3 and YY\% faster |
| than Python 2.2. |
| |
| |
| %====================================================================== |
| \section{New, Improved, and Deprecated Modules} |
| |
| As usual, Python's standard library received a number of enhancements and |
| bug fixes. Here's a partial list of the most notable changes, sorted |
| alphabetically by module name. Consult the |
| \file{Misc/NEWS} file in the source tree for a more |
| complete list of changes, or look through the CVS logs for all the |
| details. |
| |
| \begin{itemize} |
| |
\item The \module{bisect} module now has an underlying C implementation
for improved performance.
(Contributed by Dmitry Vasiliev.)

\item The CJKCodecs collection of East Asian codecs, maintained
by Hye-Shik Chang, was integrated into 2.4.
The new encodings are:

\begin{itemize}
\item Chinese (PRC): gb2312, gbk, gb18030, hz
\item Chinese (ROC): big5, cp950
\item Japanese: cp932, shift-jis, shift-jisx0213, euc-jp,
euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2,
iso-2022-jp-3, iso-2022-jp-ext
\item Korean: cp949, euc-kr, johab, iso-2022-kr
\end{itemize}

\item The \module{curses} module now supports the ncurses extension
\function{use_default_colors()}. On platforms where the terminal
supports transparency, this makes it possible to use a transparent background.
(Contributed by J\"org Lehmann.)

| |
| \item The \module{heapq} module has been converted to C. The resulting |
| ten-fold improvement in speed makes the module suitable for handling |
| high volumes of data. |
| |
| \item The \module{imaplib} module now supports IMAP's THREAD command. |
| (Contributed by Yves Dionne.) |
| |
| \item The \module{itertools} module gained a |
\function{groupby(\var{iterable}\optional{, \var{key}})} function,
inspired by the GROUP BY clause from SQL.
\var{iterable} returns a succession of elements, and the optional
\var{key} is a function that takes an element and returns a key
value; if omitted, the key is simply the element itself.
| \function{groupby()} then groups the elements into subsequences |
| which have matching values of the key, and returns a series of 2-tuples |
| containing the key value and an iterator over the subsequence. |
| |
Here's an example. The \var{key} function simply returns 0 for even
numbers and 1 for odd numbers, so \function{groupby()} returns
consecutive runs of even or odd numbers.
| |
| \begin{verbatim} |
| >>> import itertools |
| >>> L = [2,4,6, 7,8,9,11, 12, 14] |
| >>> for key_val, it in itertools.groupby(L, lambda x: x % 2): |
| ... print key_val, list(it) |
| ... |
| 0 [2, 4, 6] |
| 1 [7] |
| 0 [8] |
| 1 [9, 11] |
| 0 [12, 14] |
| >>> |
| \end{verbatim} |
| |
| Like its SQL counterpart, \function{groupby()} is typically used with |
| sorted input. The logic for \function{groupby()} is similar to the |
\UNIX{} \code{uniq} filter, which makes it handy for eliminating,
| counting, or identifying duplicate elements: |
| |
| \begin{verbatim} |
| >>> word = 'abracadabra' |
| >>> letters = sorted(word) # Turn string into a sorted list of letters |
| >>> letters |
| ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r'] |
>>> [k for k, g in itertools.groupby(letters)]          # List unique letters
['a', 'b', 'c', 'd', 'r']
>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)]   # Count letter occurrences
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
>>> [k for k, g in itertools.groupby(letters) if len(list(g)) > 1]   # List duplicated letters
| ['a', 'b', 'r'] |
| \end{verbatim} |
| |
\item \module{itertools} also gained a function named
\function{tee(\var{iterator}\optional{, \var{N}})} that returns \var{N}
independent iterators that replicate \var{iterator}. If \var{N} is
omitted, the default is 2.
| |
| \begin{verbatim} |
| >>> L = [1,2,3] |
| >>> i1, i2 = itertools.tee(L) |
| >>> i1,i2 |
| (<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>) |
| >>> list(i1) # Run the first iterator to exhaustion |
| [1, 2, 3] |
| >>> list(i2) # Run the second iterator to exhaustion |
| [1, 2, 3] |
\end{verbatim}
| |
| Note that \function{tee()} has to keep copies of the values returned |
| by the iterator; in the worst case, it may need to keep all of them. |
\function{tee()} should therefore be used carefully if the leading iterator
| can run far ahead of the trailing iterator in a long stream of inputs. |
| If the separation is large, then it becomes preferable to use |
| \function{list()} instead. When the iterators track closely with one |
| another, \function{tee()} is ideal. Possible applications include |
| bookmarking, windowing, or lookahead iterators. |
| |
\item The \module{operator} module gained two new functions,
\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}.
Both functions return callables that take a single argument and return
the corresponding attribute or item; these callables make excellent
data extractors when used with \function{map()} or \function{sorted()}.
For example:

\begin{verbatim}
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
\end{verbatim}

\item A new \function{getsid()} function was added to the
\module{posix} module that underlies the \module{os} module.
(Contributed by J. Raynor.)
| |
\item The \module{random} module has a new method called \method{getrandbits(N)}
which returns a long integer containing N random bits. The existing
\method{randrange()} method now uses \method{getrandbits()} when the
requested range is very large, making it possible to efficiently generate
arbitrarily large random numbers.
| |
| \item The regular expression language accepted by the \module{re} module |
| was extended with simple conditional expressions, written as |
| \code{(?(\var{group})\var{A}|\var{B})}. \var{group} is either a |
| numeric group ID or a group name defined with \code{(?P<group>...)} |
| earlier in the expression. If the specified group matched, the |
| regular expression pattern \var{A} will be tested against the string; if |
| the group didn't match, the pattern \var{B} will be used instead. |
| |
| \end{itemize} |
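Here is a minimal sketch of the conditional syntax. The hypothetical \function{address()} helper accepts an e-mail address either bare or wrapped in angle brackets, but not with only one bracket. The named group \code{open} records whether a \samp{<} was seen. The conditional \code{(?(open)>|\$)} then requires either a closing \samp{>} or the end of the string. The address pattern itself is deliberately simplistic.

\begin{verbatim}
import re

# Group 'open' optionally matches '<'.  The conditional (?(open)>|$)
# requires a closing '>' if 'open' matched, and the end of the
# string otherwise.
ADDRESS_RE = re.compile(r'(?P<open><)?(?P<addr>\w+@\w+(?:\.\w+)+)(?(open)>|$)')

def address(text):
    """Return the address part of text, or None if it does not match."""
    m = ADDRESS_RE.match(text)
    if m is None:
        return None
    return m.group('addr')

print(address('<user@example.com>'))   # user@example.com
print(address('user@example.com'))     # user@example.com
print(address('<user@example.com'))    # None (unbalanced bracket)
\end{verbatim}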
| |
| |
| %====================================================================== |
| % whole new modules get described in \subsections here |
| |
| |
| % ====================================================================== |
| \section{Build and C API Changes} |
| |
| Changes to Python's build process and to the C API include: |
| |
| \begin{itemize} |
| |
| \item Three new convenience macros were added for common return |
| values from extension functions: \csimplemacro{Py_RETURN_NONE}, |
| \csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}. |
| |
| \item A new function, \cfunction{PyTuple_Pack(N, obj1, obj2, ..., |
objN)}, constructs tuples from a variable-length argument list of
| Python objects. |
| |
| \item A new function, \cfunction{PyDict_Contains(d, k)}, implements |
| fast dictionary lookups without masking exceptions raised during the |
| look-up process. |
| |
| \item A new method flag, \code{METH_COEXISTS}, allows a function |
| defined in slots to co-exist with a PyCFunction having the same name. |
This can halve the access time to a method such as
\method{set.__contains__()}.
| |
| \end{itemize} |
| |
| |
| %====================================================================== |
| \subsection{Port-Specific Changes} |
| |
| \begin{itemize} |
| |
| \item The Windows port now builds under MSVC++ 7.1 as well as version 6. |
| |
| \end{itemize} |
| |
| |
| %====================================================================== |
| \section{Other Changes and Fixes \label{section-other}} |
| |
| As usual, there were a bunch of other improvements and bugfixes |
| scattered throughout the source tree. A search through the CVS change |
| logs finds there were XXX patches applied and YYY bugs fixed between |
| Python 2.3 and 2.4. Both figures are likely to be underestimates. |
| |
| Some of the more notable changes are: |
| |
| \begin{itemize} |
| |
| \item The \module{timeit} module now automatically disables periodic |
garbage collection during the timing loop. This change makes
| consecutive timings more comparable. |
| |
| \item The \module{base64} module now has more complete RFC 3548 support |
| for Base64, Base32, and Base16 encoding and decoding, including |
| optional case folding and optional alternative alphabets. |
| (Contributed by Barry Warsaw.) |
| |
| \end{itemize} |
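The sketch below exercises the Base32 and Base16 functions together with the \var{casefold} option, which lets the decoders accept lowercase input. The input is built with \code{'hello'.encode('ascii')}. That choice is an assumption of this sketch, made so the same fragment works whether the functions operate on \class{str} (as in 2.4) or on bytes (as in later versions).

\begin{verbatim}
import base64

data = 'hello'.encode('ascii')

# RFC 3548 Base32 and Base16 encodings of the same five bytes.
b32 = base64.b32encode(data)    # the text NBSWY3DP
b16 = base64.b16encode(data)    # the text 68656C6C6F

# The RFC 3548 alphabets are upper case; casefold=True makes the
# decoders accept lowercase input as well.
from_b32 = base64.b32decode('nbswy3dp', casefold=True)
from_b16 = base64.b16decode('68656c6c6f', casefold=True)

# Both round trips recover the original data.
assert from_b32 == data and from_b16 == data
\end{verbatim}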
| |
| |
| %====================================================================== |
| \section{Porting to Python 2.4} |
| |
| This section lists previously described changes that may require |
| changes to your code: |
| |
| \begin{itemize} |
| |
\item The \function{zip()} built-in function and \function{itertools.izip()}
now return an empty list and an empty iterator, respectively, instead of
raising a \exception{TypeError} exception if called with no arguments.
| |
| \item \function{dircache.listdir()} now passes exceptions to the caller |
| instead of returning empty lists. |
| |
| \end{itemize} |
| |
| |
| %====================================================================== |
| \section{Acknowledgements \label{acks}} |
| |
| The author would like to thank the following people for offering |
| suggestions, corrections and assistance with various drafts of this |
| article: Raymond Hettinger. |
| |
| \end{document} |