\documentclass{howto}
\usepackage{distutils}
% $Id$

\title{What's New in Python 2.4}
\release{0.0}
\author{A.M.\ Kuchling}
\authoraddress{
  \strong{Python Software Foundation}\\
  Email: \email{amk@amk.ca}
}

\begin{document}
\maketitle
\tableofcontents

This article explains the new features in Python 2.4.  No release date
for Python 2.4 has been set; expect that this will happen mid-2004.

While Python 2.3 was primarily a library development release, Python
2.4 may extend the core language and interpreter in
as-yet-undetermined ways.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview.  For
full details, you should refer to the documentation for Python 2.4,
such as the \citetitle[../lib/lib.html]{Python Library Reference} and
the \citetitle[../ref/ref.html]{Python Reference Manual}.
If you want to understand the complete implementation and design
rationale, refer to the PEP for a particular new feature.


%======================================================================
\section{PEP 218: Built-In Set Objects}

Two new built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})}, provide high-speed data types for
membership testing, for eliminating duplicates from sequences, and
for mathematical operations like unions, intersections, differences,
and symmetric differences.

\begin{verbatim}
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'

>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
\end{verbatim}

The type \function{frozenset()} is an immutable version of \function{set()}.
Since it is immutable and hashable, it may be used as a dictionary key or
as a member of another set.  Accordingly, it does not have methods
like \method{add()} and \method{remove()} which could alter its contents.
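
As a brief illustration of the hashability point, the following sketch
uses \class{frozenset} objects as dictionary keys and as members of
another set.  The names \code{route_cost} and \code{groups}, and the
data in them, are invented for this example.

\begin{verbatim}
# Hypothetical data: costs indexed by an unordered pair of cities.
route_cost = {
    frozenset(['Oslo', 'Bergen']): 7,
    frozenset(['Oslo', 'Tromso']): 21,
}

# Element order does not matter when looking up a frozenset key.
cost = route_cost[frozenset(['Bergen', 'Oslo'])]        # 7

# A set of sets must use frozenset for the inner sets.
groups = set([frozenset('ab'), frozenset('cd'), frozenset('ba')])
group_count = len(groups)               # 2, since 'ab' and 'ba' are equal

# The mutating methods are simply absent from frozenset.
has_add = hasattr(frozenset(), 'add')   # False
\end{verbatim}

An ordinary \class{set} in place of any of the \class{frozenset}
objects above would raise a \exception{TypeError}, because mutable
sets are not hashable.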

% XXX what happens to the sets module?
%  The current thinking is that the sets module will be left alone.
%  That way, existing code will continue to run without alteration.
%  Also, the module provides an autoconversion feature not supported by set()
%  and frozenset().

\begin{seealso}
\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by
Greg Wilson and ultimately implemented by Raymond Hettinger.}
\end{seealso}

%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}

XXX write this.

%======================================================================
\section{PEP 289: Generator Expressions}

Generator expressions create in-line generators using a syntax similar
to list comprehensions but with parentheses instead of the surrounding
brackets.

Genexps allow simple generators to be constructed without a separate function
definition.  Writing:

\begin{verbatim}
    g = (tgtexp  for var1 in exp1  for var2 in exp2  if exp3)
\end{verbatim}

is equivalent to:

\begin{verbatim}
    def _generator(exp):
        for var1 in exp:
            for var2 in exp2:
                if exp3:
                    yield tgtexp
    g = _generator(exp1)
    del _generator
\end{verbatim}

Their advantage over full generator definitions is in economy of
expression.  Their advantage over list comprehensions is in saving
memory by creating data only when it is needed rather than forming
a whole list in memory all at once.  Applications using memory-friendly
generator expressions may scale up to high volumes of data
more readily than with list comprehensions.

Generator expressions are intended to be used inside functions
such as \function{sum()}, \function{min()}, \function{set()}, and
\function{dict()}.  These functions consume their data all at once
and would not benefit from having a full list instead of a generator
as an input:

\begin{verbatim}
>>> sum(i*i for i in range(10))
285

>>> sorted(set(i*i for i in xrange(-20, 20) if i%2==1))  # odd squares
[1, 9, 25, 49, 81, 121, 169, 225, 289, 361]

>>> from itertools import izip
>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in izip(xvec, yvec))     # dot product
260

>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in xrange(0, 91))

>>> unique_words = set(word  for line in page  for word in line.split())

>>> valedictorian = max((student.gpa, student.name) for student in graduates)
\end{verbatim}

These examples show the intended use for generator expressions
in situations where the values get consumed immediately after the
generator is created.  In these situations, they operate like
memory-efficient versions of list comprehensions.

For more complex uses of generators, it is strongly recommended that
the traditional full generator definitions be used instead.  In a
generator expression, the first for-loop expression is evaluated
as soon as the expression is defined while the other expressions do
not get evaluated until the generator is run.  This nuance is never
an issue when the generator is used immediately.  If it is not used
right away, then it is better to write a full generator definition
which more clearly reveals when the expressions are evaluated and is
more obvious about the visibility and lifetime of its looping variables.
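
The evaluation-order nuance can be seen in a small sketch.  The names
\code{data}, \code{scale}, and \code{scaled()} are made up for this
illustration, and the behaviour shown follows the rule described
above rather than any additional guarantee.

\begin{verbatim}
data = [1, 2, 3]
scale = 10

# The first for-expression, 'data', is evaluated on this line;
# 'scale' is not looked up until the generator actually runs.
g = (x * scale for x in data)

data = [7, 8, 9]      # rebinding the name does not affect g
scale = 100           # but this change is seen when g runs

result = list(g)      # [100, 200, 300]

# A full generator definition makes the binding explicit.
def scaled(seq, factor):
    for x in seq:
        yield x * factor

explicit = list(scaled([1, 2, 3], 10))    # [10, 20, 30]
\end{verbatim}

In the full definition both values are passed as arguments, so it is
clear which \code{seq} and \code{factor} the loop will use no matter
when the generator is eventually consumed.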

\begin{seealso}
\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and
implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.}
\end{seealso}

%======================================================================
\section{PEP 322: Reverse Iteration}

A new built-in function, \function{reversed(\var{seq})}, takes a sequence
and returns an iterator that returns the elements of the sequence
in reverse order.

\begin{verbatim}
>>> for i in reversed(xrange(1,4)):
...    print i
...
3
2
1
\end{verbatim}

Compared to extended slicing, \code{range(1,4)[::-1]}, \function{reversed()}
is easier to read, runs faster, and uses substantially less memory.

Note that \function{reversed()} only accepts sequences, not arbitrary
iterators.  If you want to reverse an iterator, first convert it to
a list with \function{list()}.

\begin{verbatim}
>>> input = open('/etc/passwd', 'r')
>>> for line in reversed(list(input)):
...   print line
...
root:*:0:0:System Administrator:/var/root:/bin/tcsh
  ...
\end{verbatim}

\begin{seealso}
\seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.}
\end{seealso}


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.4 makes to the core Python
language.

\begin{itemize}

\item The \method{dict.update()} method now accepts the same
argument forms as the \class{dict} constructor.  This includes any
mapping, any iterable of key/value pairs, and/or keyword arguments.

\item The string methods \method{ljust()}, \method{rjust()}, and
\method{center()} now take an optional argument for specifying a
fill character other than a space.

\item Strings also gained an \method{rsplit()} method that
works like the \method{split()} method but splits from the end of
the string.

\begin{verbatim}
>>> 'www.python.org'.split('.', 1)
['www', 'python.org']
>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']
\end{verbatim}

\item The \method{sort()} method of lists gained three keyword
arguments, \var{cmp}, \var{key}, and \var{reverse}.  These arguments
make some common usages of \method{sort()} simpler.  All are optional.

\var{cmp} is the same as the previous single argument to
\method{sort()}; if provided, the value should be a comparison
function that takes two arguments and returns -1, 0, or +1 depending
on how the arguments compare.

\var{key} should be a single-argument function that takes a list
element and returns a comparison key for the element.  The list is
then sorted using the comparison keys.  The following example sorts a
list case-insensitively:

\begin{verbatim}
>>> L = ['A', 'b', 'c', 'D']
>>> L.sort()                 # Case-sensitive sort
>>> L
['A', 'D', 'b', 'c']
>>> L.sort(key=lambda x: x.lower())
>>> L
['A', 'b', 'c', 'D']
>>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower()))
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The last example, which uses the \var{cmp} parameter, is the old way
to perform a case-insensitive sort.  It works but is slower than
using a \var{key} parameter.  Using \var{key} results in calling the
\method{lower()} method once for each element in the list while using
\var{cmp} will call the method twice for each comparison.

For simple key functions and comparison functions, it is often
possible to avoid a \keyword{lambda} expression by using an unbound
method instead.  For example, the above case-insensitive sort is best
coded as:

\begin{verbatim}
>>> L.sort(key=str.lower)
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The \var{reverse} parameter should have a Boolean value.  If the value is
\constant{True}, the list will be sorted into reverse order.  Instead
of \code{L.sort(lambda x,y: cmp(y.score, x.score))}, you can now write
\code{L.sort(key=lambda x: x.score, reverse=True)}.

The results of sorting are now guaranteed to be stable.  This means
that two entries with equal keys will be returned in the same order as
they were input.  For example, you can sort a list of people by name,
and then sort the list by age, resulting in a list sorted by age where
people with the same age are in name-sorted order.

\item There is a new built-in function
\function{sorted(\var{iterable})} that works like the in-place
\method{list.sort()} method but has been made suitable for use in
expressions.  The differences are:
  \begin{itemize}
  \item the input may be any iterable;
  \item a newly formed copy is sorted, leaving the original intact; and
  \item the expression returns the new sorted copy.
  \end{itemize}

\begin{verbatim}
>>> L = [9,7,8,3,2,4,1,6,5]
>>> [10+i for i in sorted(L)]       # usable in a list comprehension
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> L                               # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]

>>> sorted('Monte Python')          # any iterable may be an input
[' ', 'M', 'P', 'e', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y']

>>> # List the contents of a dict sorted by key values
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
>>> for k, v in sorted(colormap.iteritems()):
...     print k, v
...
black 4
blue 2
green 3
red 1
yellow 5
\end{verbatim}

\item When called with no arguments, the \function{zip()} built-in
  function now returns an empty list, and \function{itertools.izip()}
  an empty iterator, instead of raising a \exception{TypeError}
  exception.  This makes them more
  suitable for use with variable length argument lists:

\begin{verbatim}
>>> def transpose(array):
...    return zip(*array)
...
>>> transpose([(1,2,3), (4,5,6)])
[(1, 4), (2, 5), (3, 6)]
>>> transpose([])
[]
\end{verbatim}

\end{itemize}
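
Three of the changes listed above have no example of their own, so
here is a minimal sketch of the new \method{dict.update()} argument
forms, the optional fill character, and a two-pass stable sort.  The
sample data and variable names are invented for illustration.

\begin{verbatim}
# dict.update() with a mapping, an iterable of pairs, and keywords.
d = {'a': 1}
d.update({'b': 2})
d.update([('c', 3), ('d', 4)])
d.update(e=5)
keys = sorted(d.keys())                 # ['a', 'b', 'c', 'd', 'e']

# Optional fill character for ljust(), rjust(), and center().
left = 'ham'.ljust(7, '.')              # 'ham....'
right = 'ham'.rjust(7, '.')             # '....ham'
middle = 'ham'.center(7, '*')           # '**ham**'

# Stable sorting: sort by name first, then by age.
people = [('Mia', 31), ('Al', 25), ('Zed', 25), ('Bo', 31)]
people.sort(key=lambda p: p[0])         # by name
people.sort(key=lambda p: p[1])         # by age; ties keep name order
# people is now
# [('Al', 25), ('Zed', 25), ('Bo', 31), ('Mia', 31)]
\end{verbatim}

The second \method{sort()} call only compares ages, so it is the
stability guarantee that keeps people of the same age in the
name order established by the first call.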


%======================================================================
\subsection{Optimizations}

\begin{itemize}

\item The inner loops for \class{list} and \class{tuple} slicing
 were optimized and now run about one-third faster.  The inner
 loops were also optimized for \class{dict} with performance
 boosts to \method{keys()}, \method{values()}, \method{items()},
 \method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}.

\item The machinery for growing and shrinking lists was optimized
 for speed and for space efficiency.  Small lists (under eight elements)
 never over-allocate by more than three elements.  Large lists do not
 over-allocate by more than 1/8th.  Appending and popping from lists
 now runs faster due to more efficient code paths and less frequent
 use of the underlying system realloc().  List comprehensions also
 benefit.  The amount of improvement varies between systems and is
 greatest on systems with poor realloc() implementations.
 \method{list.extend()} was also optimized and no longer converts its
 argument into a temporary list prior to extending the base list.

\item \function{list()}, \function{tuple()}, \function{map()},
  \function{filter()}, and \function{zip()} now run several times
  faster with non-sequence arguments that supply a \method{__len__()}
  method.  Previously, the pre-sizing optimization only applied to
  sequence arguments.

\item The methods \method{list.__getitem__()},
  \method{dict.__getitem__()}, and \method{dict.__contains__()}
  are now implemented as \class{method_descriptor} objects rather
  than \class{wrapper_descriptor} objects.  This form of optimized
  access doubles their performance and makes them more suitable for
  use as arguments to functionals:
  \samp{map(mydict.__getitem__, keylist)}.

\item Added a new opcode, \code{LIST_APPEND}, that simplifies
  the generated bytecode for list comprehensions and speeds them up
  by about a third.

\end{itemize}
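
The \samp{map(mydict.__getitem__, keylist)} idiom mentioned above can
be spelled out as follows.  This sketch only shows the usage; it
makes no attempt to measure the speed-up, and the dictionary and key
lists are invented for the example.

\begin{verbatim}
prices = {'apple': 3, 'pear': 5, 'plum': 2}
wanted = ['plum', 'apple']

# The bound method stands in for "lambda k: prices[k]".
looked_up = list(map(prices.__getitem__, wanted))       # [2, 3]

# dict.__contains__ can be used the same way with filter().
candidates = ['pear', 'kiwi', 'plum']
known = list(filter(prices.__contains__, candidates))   # ['pear', 'plum']
\end{verbatim}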

The net result of the 2.4 optimizations is that Python 2.4 runs the
pystone benchmark around XX\% faster than Python 2.3 and YY\% faster
than Python 2.2.


%======================================================================
\section{New, Improved, and Deprecated Modules}

As usual, Python's standard library received a number of enhancements and
bug fixes.  Here's a partial list of the most notable changes, sorted
alphabetically by module name.  Consult the
\file{Misc/NEWS} file in the source tree for a more
complete list of changes, or look through the CVS logs for all the
details.

\begin{itemize}

\item The \module{curses} module now supports the ncurses extension
  \function{use_default_colors()}.  On platforms where the terminal
  supports transparency, this makes it possible to use a transparent background.
  (Contributed by J\"org Lehmann.)

\item The \module{bisect} module now has an underlying C implementation
  for improved performance.
  (Contributed by Dmitry Vasiliev.)

\item The CJKCodecs collection of East Asian codecs, maintained
by Hye-Shik Chang, was integrated into 2.4.
The new encodings are:

\begin{itemize}
 \item Chinese (PRC): gb2312, gbk, gb18030, hz
 \item Chinese (ROC): big5, cp950
 \item Japanese: cp932, shift-jis, shift-jisx0213, euc-jp,
  euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2,
  iso-2022-jp-3, iso-2022-jp-ext
 \item Korean: cp949, euc-kr, johab, iso-2022-kr
\end{itemize}

\item There is a new \module{collections} module for
   various specialized collection datatypes.
   Currently it contains just one type, \class{deque},
   a double-ended queue that supports efficiently adding and removing
   elements from either end.

\begin{verbatim}
>>> from collections import deque
>>> d = deque('ghi')        # make a new deque with three items
>>> d.append('j')           # add a new entry to the right side
>>> d.appendleft('f')       # add a new entry to the left side
>>> d                       # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                 # return and remove the rightmost item
'j'
>>> d.popleft()             # return and remove the leftmost item
'f'
>>> list(d)                 # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d                # search the deque
True
\end{verbatim}

Several modules now take advantage of \class{collections.deque} for
improved performance:  \module{Queue}, \module{mutex}, \module{shlex},
\module{threading}, and \module{pydoc}.

\item The \module{ConfigParser} classes have been enhanced slightly.
   The \method{read()} method now returns a list of the files that
   were successfully parsed, and the \method{set()} method raises
   \exception{TypeError} if passed a \var{value} argument that isn't a
   string.

\item The \module{heapq} module has been converted to C.  The resulting
   tenfold improvement in speed makes the module suitable for handling
   high volumes of data.

\item The \module{imaplib} module now supports IMAP's THREAD command.
(Contributed by Yves Dionne.)

\item The \module{itertools} module gained a
  \function{groupby(\var{iterable}\optional{, \var{func}})} function,
  inspired by the GROUP BY clause from SQL.
  \var{iterable} returns a succession of elements, and the optional
  \var{func} is a function that takes an element and returns a key
  value; if omitted, the key is simply the element itself.
  \function{groupby()} then groups the elements into subsequences
  which have matching values of the key, and returns a series of 2-tuples
  containing the key value and an iterator over the subsequence.

Here's an example.  The \var{func} function simply returns whether a
number is even or odd, so the result of \function{groupby()} is to
return consecutive runs of odd or even numbers.

\begin{verbatim}
>>> import itertools
>>> L = [2,4,6, 7,8,9,11, 12, 14]
>>> for key_val, it in itertools.groupby(L, lambda x: x % 2):
...    print key_val, list(it)
...
0 [2, 4, 6]
1 [7]
0 [8]
1 [9, 11]
0 [12, 14]
>>>
\end{verbatim}

Like its SQL counterpart, \function{groupby()} is typically used with
sorted input.  The logic for \function{groupby()} is similar to the
\UNIX{} \code{uniq} filter, which makes it handy for eliminating,
counting, or identifying duplicate elements:

\begin{verbatim}
>>> from itertools import groupby
>>> word = 'abracadabra'
>>> letters = sorted(word)   # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> [k for k, g in groupby(letters)]                     # List unique letters
['a', 'b', 'c', 'd', 'r']
>>> [(k, len(list(g))) for k, g in groupby(letters)]     # Count letter occurrences
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
>>> [k for k, g in groupby(letters) if len(list(g)) > 1] # List duplicated letters
['a', 'b', 'r']
\end{verbatim}

\item \module{itertools} also gained a function named
\function{tee(\var{iterator}, \var{N})} that returns \var{N} independent
iterators that replicate \var{iterator}.  If \var{N} is omitted, the
default is 2.

\begin{verbatim}
>>> L = [1,2,3]
>>> i1, i2 = itertools.tee(L)
>>> i1,i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1)               # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2)               # Run the second iterator to exhaustion
[1, 2, 3]
\end{verbatim}

Note that \function{tee()} has to keep copies of the values returned
by the iterator; in the worst case, it may need to keep all of them.
\function{tee()} should therefore be used carefully if the leading iterator
can run far ahead of the trailing iterator in a long stream of inputs.
If the separation is large, then it becomes preferable to use
\function{list()} instead.  When the iterators track closely with one
another, \function{tee()} is ideal.  Possible applications include
bookmarking, windowing, or lookahead iterators.

\item A new \function{getsid()} function was added to the
\module{posix} module that underlies the \module{os} module.
(Contributed by J. Raynor.)

\item The \module{operator} module gained two new functions,
\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}.
Both functions return callables that take a single argument and return
the corresponding attribute or item; these callables make excellent
data extractors when used with \function{map()} or \function{sorted()}.
For example:

\begin{verbatim}
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
\end{verbatim}

\item The \module{random} module has a new method called \method{getrandbits(N)}
   which returns an N-bit long integer.  This method supports the existing
   \method{randrange()} method, making it possible to efficiently generate
   arbitrarily large random numbers.

\item The regular expression language accepted by the \module{re} module
   was extended with simple conditional expressions, written as
   \code{(?(\var{group})\var{A}|\var{B})}.  \var{group} is either a
   numeric group ID or a group name defined with \code{(?P<group>...)}
   earlier in the expression.  If the specified group matched, the
   regular expression pattern \var{A} will be tested against the string; if
   the group didn't match, the pattern \var{B} will be used instead.

\end{itemize}
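
The conditional expressions added to \module{re} are easiest to follow
with a concrete pattern.  The sketch below accepts an e-mail-like
address that is either bare or enclosed in a matching pair of angle
brackets; the pattern is deliberately simplistic and the helper name
\function{is_address()} is made up for this example.

\begin{verbatim}
import re

# Group 1 is an optional '<'.  The conditional (?(1)>|$) requires a
# closing '>' only when group 1 matched; otherwise the string must end.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

def is_address(text):
    return pattern.match(text) is not None

samples = ['<user@host.com>', 'user@host.com',
           '<user@host.com', 'user@host.com>']
results = [is_address(s) for s in samples]
# results is [True, True, False, False]
\end{verbatim}

Without the conditional, the same check would need two alternative
patterns, one for the bracketed form and one for the bare form.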


%======================================================================
% whole new modules get described in \subsections here
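
As an illustration of the windowing use of \function{itertools.tee()}
mentioned earlier, here is a minimal sketch of a helper that pairs
each element with its successor.  The name \function{pairwise()} is
chosen for this example, and the helper returns a list for simplicity.

\begin{verbatim}
from itertools import tee

def pairwise(iterable):
    """Return a list of overlapping pairs: (s0, s1), (s1, s2), ..."""
    lead, trail = tee(iterable)
    for _ in lead:          # advance the leading iterator by one item
        break
    return list(zip(trail, lead))

pairs = pairwise(iter('abcd'))
# pairs is [('a', 'b'), ('b', 'c'), ('c', 'd')]
\end{verbatim}

Because the two iterators are never more than one element apart,
\function{tee()} only has to buffer a single value at a time, which
is the closely-tracking case where it works best.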


% ======================================================================
\section{Build and C API Changes}

Changes to Python's build process and to the C API include:

\begin{itemize}

  \item Three new convenience macros were added for common return
  values from extension functions: \csimplemacro{Py_RETURN_NONE},
  \csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}.

  \item A new function, \cfunction{PyTuple_Pack(\var{N}, \var{obj1},
  \var{obj2}, ..., \var{objN})}, constructs tuples from a variable
  length argument list of Python objects.

  \item A new function, \cfunction{PyDict_Contains(\var{d}, \var{k})},
  implements fast dictionary lookups without masking exceptions raised
  during the look-up process.

  \item A new method flag, \constant{METH_COEXISTS}, allows a function
  defined in slots to co-exist with a PyCFunction having the same name.
  This can halve the access time for a method such as
  \method{set.__contains__()}.

\end{itemize}


%======================================================================
\subsection{Port-Specific Changes}

\begin{itemize}

\item The Windows port now builds under MSVC++ 7.1 as well as version 6.

\end{itemize}


%======================================================================
\section{Other Changes and Fixes \label{section-other}}

As usual, there were a bunch of other improvements and bugfixes
scattered throughout the source tree.  A search through the CVS change
logs finds there were XXX patches applied and YYY bugs fixed between
Python 2.3 and 2.4.  Both figures are likely to be underestimates.

Some of the more notable changes are:

\begin{itemize}

\item The \module{timeit} module now automatically disables periodic
  garbage collection during the timing loop.  This change makes
  consecutive timings more comparable.

\item The \module{base64} module now has more complete RFC 3548 support
  for Base64, Base32, and Base16 encoding and decoding, including
  optional case folding and optional alternative alphabets.
  (Contributed by Barry Warsaw.)

\end{itemize}


%======================================================================
\section{Porting to Python 2.4}

This section lists previously described changes that may require
changes to your code:

\begin{itemize}

\item When called with no arguments, the \function{zip()} built-in
 function now returns an empty list, and \function{itertools.izip()}
 an empty iterator, instead of raising a \exception{TypeError}
 exception.

\item \function{dircache.listdir()} now passes exceptions to the caller
      instead of returning empty lists.

\item \function{LexicalHandler.startDTD()} used to receive public and
  system ID in the wrong order.  This has been corrected; applications
  relying on the wrong order need to be fixed.

\end{itemize}


%======================================================================
\section{Acknowledgements \label{acks}}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Raymond Hettinger.

\end{document}