\documentclass{howto}
\usepackage{distutils}
% $Id$

\title{What's New in Python 2.4}
\release{0.0}
\author{A.M.\ Kuchling}
\authoraddress{
        \strong{Python Software Foundation}\\
        Email: \email{amk@amk.ca}
}

\begin{document}
\maketitle
\tableofcontents

This article explains the new features in Python 2.4.  The release
date is expected to be around September 2004.

While Python 2.3 was primarily a library development release, Python
2.4 may extend the core language and interpreter in
as-yet-undetermined ways.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview.  For
full details, you should refer to the documentation for Python 2.4,
such as the \citetitle[../lib/lib.html]{Python Library Reference} and
the \citetitle[../ref/ref.html]{Python Reference Manual}.
If you want to understand the complete implementation and design
rationale, refer to the PEP for a particular new feature.


%======================================================================
\section{PEP 218: Built-In Set Objects}

Two new built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})}, provide high-speed data types for
membership testing, for eliminating duplicates from sequences, and
for mathematical operations like unions, intersections, differences,
and symmetric differences.

\begin{verbatim}
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'

>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
\end{verbatim}

The type \function{frozenset()} is an immutable version of \function{set()}.
Since it is immutable and hashable, it may be used as a dictionary key or
as a member of another set.  Accordingly, it does not have methods
like \method{add()} and \method{remove()} that could alter its contents.
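
As a minimal sketch of why hashability matters, the following uses
\function{frozenset()} values as dictionary keys and as members of
another set.  The city names and the \code{travel_hours} table are
hypothetical data chosen only for illustration.

\begin{verbatim}
# Hypothetical data: travel times keyed by an unordered pair of cities.
travel_hours = {
    frozenset(['Boston', 'Montreal']): 5,
    frozenset(['Boston', 'New York']): 4,
}

# Element order does not matter when looking up a frozenset key.
hours = travel_hours[frozenset(['Montreal', 'Boston'])]     # 5

# Frozensets can be members of another set; equal ones collapse.
groups = set([frozenset('ab'), frozenset('ba'), frozenset('c')])
count = len(groups)                                         # 2

# The mutating methods are simply absent.
has_add = hasattr(frozenset(), 'add')                       # False
\end{verbatim}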

% XXX what happens to the sets module?
% The current thinking is that the sets module will be left alone.
% That way, existing code will continue to run without alteration.
% Also, the module provides an autoconversion feature not supported by set()
% and frozenset().

\begin{seealso}
\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by
Greg Wilson and ultimately implemented by Raymond Hettinger.}
\end{seealso}

%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}

XXX write this.

%======================================================================
\section{PEP 289: Generator Expressions}

Simple generators can now be coded succinctly as expressions using a syntax
like list comprehensions but with parentheses instead of brackets.  These
expressions are designed for situations where the generator is used right
away by an enclosing function.  Generator expressions are more compact but
less versatile than full generator definitions, and they tend to be more
memory-friendly than equivalent list comprehensions.

\begin{verbatim}
    g = (tgtexp  for var1 in exp1  for var2 in exp2  if exp3)
\end{verbatim}

is equivalent to:

\begin{verbatim}
    def __gen(exp):
        for var1 in exp:
            for var2 in exp2:
                if exp3:
                    yield tgtexp
    g = __gen(iter(exp1))
    del __gen
\end{verbatim}

The advantage of generator expressions over full generator definitions
is economy of expression.  Their advantage over list comprehensions is in
saving memory by creating data only when it is needed rather than forming
a whole list in memory all at once.  Applications using memory-friendly
generator expressions may scale up to high volumes of data
more readily than with list comprehensions.
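
The difference can be observed directly.  The sketch below is an
illustration only; \function{square()} and the \code{calls} list are
hypothetical helpers that record how many inputs have actually been
processed.

\begin{verbatim}
calls = []                      # inputs that have actually been processed

def square(n):
    # Hypothetical helper that records each call it receives.
    calls.append(n)
    return n * n

lazy = (square(n) for n in range(1000))
before = len(calls)             # 0: nothing has been computed yet

for value in lazy:              # stop as soon as a square reaches 9
    if value >= 9:
        break
after = len(calls)              # 4: only 0, 1, 2 and 3 were squared

del calls[:]
eager = [square(n) for n in range(1000)]
built = len(calls)              # 1000: the whole list is built up front
\end{verbatim}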

Generator expressions are best used in functions that consume their
data all at once and would not benefit from having a full list instead
of a generator as an input:

\begin{verbatim}
>>> sum(i*i for i in range(10))
285

>>> sorted(set(i*i for i in xrange(-20, 20) if i%2==1))  # odd squares
[1, 9, 25, 49, 81, 121, 169, 225, 289, 361]

>>> from itertools import izip
>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in izip(xvec, yvec))     # dot product
260

>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in xrange(0, 91))

>>> unique_words = set(word  for line in page  for word in line.split())

>>> valedictorian = max((student.gpa, student.name) for student in graduates)
\end{verbatim}

For more complex uses of generators, it is strongly recommended that
the traditional full generator definitions be used instead.  In a
generator expression, the first for-loop expression is evaluated
as soon as the expression is defined while the other expressions do
not get evaluated until the generator is run.  This nuance is never
an issue when the generator is used immediately; however, if it is not
used right away, a full generator definition would be much clearer
about when the sub-expressions are evaluated and would be more obvious
about the visibility and lifetime of the variables.
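
The evaluation order can be made visible with a small experiment.  This
is a sketch under stated assumptions: \function{outer()},
\function{inner()}, and the \code{events} list are hypothetical helpers
that log when each iterable expression is evaluated.

\begin{verbatim}
events = []                     # log of when each iterable is evaluated

def outer():
    # Hypothetical helper used as the first for-loop expression.
    events.append('outer evaluated')
    return [1, 2]

def inner():
    # Hypothetical helper used as the second for-loop expression.
    events.append('inner evaluated')
    return [10, 20]

g = (x + y  for x in outer()  for y in inner())
at_definition = list(events)    # ['outer evaluated']

result = list(g)                # [11, 21, 12, 22]
after_running = list(events)
# ['outer evaluated', 'inner evaluated', 'inner evaluated']
\end{verbatim}

Only \function{outer()} runs when the expression is defined;
\function{inner()} runs once per outer element, and only after the
generator is consumed.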

\begin{seealso}
\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and
implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.}
\end{seealso}

%======================================================================
\section{PEP 322: Reverse Iteration}

A new built-in function, \function{reversed(\var{seq})}, takes a sequence
and returns an iterator that returns the elements of the sequence
in reverse order.

\begin{verbatim}
>>> for i in reversed(xrange(1,4)):
...     print i
...
3
2
1
\end{verbatim}

Compared to extended slicing, such as \code{range(1,4)[::-1]},
\function{reversed()} is easier to read, runs faster, and uses
substantially less memory.

Note that \function{reversed()} only accepts sequences, not arbitrary
iterators.  If you want to reverse an iterator, first convert it to
a list with \function{list()}.

\begin{verbatim}
>>> input = open('/etc/passwd', 'r')
>>> for line in reversed(list(input)):
...     print line
...
root:*:0:0:System Administrator:/var/root:/bin/tcsh
  ...
\end{verbatim}
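
The restriction to sequences can be checked with a short sketch; the
variable names here are illustrative only.

\begin{verbatim}
numbers = iter([1, 2, 3])       # a bare iterator, not a sequence

try:
    reversed(numbers)
    accepted = True
except TypeError:
    accepted = False            # reversed() rejects the iterator

# Converting to a list first gives reversed() a sequence to work on.
numbers = iter([1, 2, 3])
backwards = list(reversed(list(numbers)))       # [3, 2, 1]
\end{verbatim}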

\begin{seealso}
\seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.}
\end{seealso}


%======================================================================
\section{PEP 327: Decimal Data Type}

A new module, \module{decimal}, offers a \class{Decimal} data type for
decimal floating point arithmetic.  Compared to the built-in \class{float}
type implemented with binary floating point, the new class is especially
useful for financial applications and other uses which require exact
decimal representation, control over precision, control over rounding
to meet legal or regulatory requirements, tracking of significant
decimal places, or for applications where the user expects the results
to match hand calculations done the way they were taught in school.

For example, calculating a 5\% tax on a 70-cent phone charge gives
different results in decimal floating point and binary floating point,
with the difference being significant when rounding to the nearest
cent:

\begin{verbatim}
>>> from decimal import *
>>> Decimal('0.70') * Decimal('1.05')
Decimal("0.7350")
>>> .70 * 1.05
0.73499999999999999
\end{verbatim}

Note that the \class{Decimal} result keeps a trailing zero, automatically
inferring four-place significance from the two-digit multiplicands.  A key
goal is to reproduce the mathematics we do by hand and avoid the tricky
issues that arise when decimal numbers cannot be represented exactly in
binary floating point.

Exact representation enables the \class{Decimal} class to perform
modulo calculations and equality tests that would fail in binary
floating point:

\begin{verbatim}
>>> Decimal('1.00') % Decimal('.10')
Decimal("0.00")
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False
\end{verbatim}

The \module{decimal} module also allows arbitrarily large precisions to be
set for calculation:

\begin{verbatim}
>>> getcontext().prec = 24
>>> Decimal(1) / Decimal(7)
Decimal("0.142857142857142857142857")
\end{verbatim}
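
Control over rounding can be sketched with the \method{quantize()}
method and the rounding constants exported by the \module{decimal}
module.  The amounts below reuse the phone-charge example; the variable
names are illustrative only.

\begin{verbatim}
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

total = Decimal('0.70') * Decimal('1.05')       # exactly 0.7350
cent = Decimal('0.01')                          # round to two places

half_up = total.quantize(cent, rounding=ROUND_HALF_UP)      # 0.74
truncated = total.quantize(cent, rounding=ROUND_DOWN)       # 0.73

# The two "half" modes differ only on exact ties.
tie = Decimal('0.725')
tie_up = tie.quantize(cent, rounding=ROUND_HALF_UP)         # 0.73
tie_even = tie.quantize(cent, rounding=ROUND_HALF_EVEN)     # 0.72

# Binary floating point starts just below 0.735, so it rounds down.
float_cents = round(.70 * 1.05, 2)                          # 0.73
\end{verbatim}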

\begin{seealso}
\seepep{327}{Decimal Data Type}{Written by Facundo Batista and implemented
  by Eric Price, Facundo Batista, Raymond Hettinger, Aahz, and Tim Peters.}
\end{seealso}


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.4 makes to the core Python
language.

\begin{itemize}

\item The \method{dict.update()} method now accepts the same
argument forms as the \class{dict} constructor.  This includes any
mapping, any iterable of key/value pairs, and/or keyword arguments.

\item The string methods \method{ljust()}, \method{rjust()}, and
\method{center()} now take an optional argument for specifying a
fill character other than a space.

\item Strings also gained an \method{rsplit()} method that
works like the \method{split()} method but splits from the end of
the string.

\begin{verbatim}
>>> 'www.python.org'.split('.', 1)
['www', 'python.org']
>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']
\end{verbatim}

\item The \method{sort()} method of lists gained three keyword
arguments, \var{cmp}, \var{key}, and \var{reverse}.  These arguments
make some common usages of \method{sort()} simpler.  All are optional.

\var{cmp} is the same as the previous single argument to
\method{sort()}; if provided, the value should be a comparison
function that takes two arguments and returns -1, 0, or +1 depending
on how the arguments compare.

\var{key} should be a single-argument function that takes a list
element and returns a comparison key for the element.  The list is
then sorted using the comparison keys.  The following example sorts a
list case-insensitively:

\begin{verbatim}
>>> L = ['A', 'b', 'c', 'D']
>>> L.sort()                 # Case-sensitive sort
>>> L
['A', 'D', 'b', 'c']
>>> L.sort(key=lambda x: x.lower())
>>> L
['A', 'b', 'c', 'D']
>>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower()))
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The last example, which uses the \var{cmp} parameter, is the old way
to perform a case-insensitive sort.  It works but is slower than
using a \var{key} parameter.  Using \var{key} results in calling the
\method{lower()} method once for each element in the list while using
\var{cmp} will call the method twice for each comparison.

For simple key functions and comparison functions, it is often
possible to avoid a \keyword{lambda} expression by using an unbound
method instead.  For example, the above case-insensitive sort is best
coded as:

\begin{verbatim}
>>> L.sort(key=str.lower)
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The \var{reverse} parameter should have a Boolean value.  If the value is
\constant{True}, the list will be sorted into reverse order.  Instead
of \code{L.sort(lambda x,y: cmp(y.score, x.score))}, you can now write:
\code{L.sort(key=lambda x: x.score, reverse=True)}.

The results of sorting are now guaranteed to be stable.  This means
that two entries with equal keys will be returned in the same order as
they were input.  For example, you can sort a list of people by name,
and then sort the list by age, resulting in a list sorted by age where
people with the same age are in name-sorted order.

\item There is a new built-in function
\function{sorted(\var{iterable})} that works like the in-place
\method{list.sort()} method but has been made suitable for use in
expressions.  The differences are:
  \begin{itemize}
  \item the input may be any iterable;
  \item a newly formed copy is sorted, leaving the original intact; and
  \item the expression returns the new sorted copy.
  \end{itemize}

\begin{verbatim}
>>> L = [9,7,8,3,2,4,1,6,5]
>>> [10+i for i in sorted(L)]   # usable in a list comprehension
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> L                           # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]

>>> sorted('Monte Python')      # any iterable may be an input
[' ', 'M', 'P', 'e', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y']

>>> # List the contents of a dict sorted by key values
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
>>> for k, v in sorted(colormap.iteritems()):
...     print k, v
...
black 4
blue 2
green 3
red 1
yellow 5
\end{verbatim}

\item The \function{zip()} built-in function and \function{itertools.izip()}
  now return an empty list instead of raising a \exception{TypeError}
  exception if called with no arguments.  This makes them more
  suitable for use with variable length argument lists:

\begin{verbatim}
>>> def transpose(array):
...     return zip(*array)
...
>>> transpose([(1,2,3), (4,5,6)])
[(1, 4), (2, 5), (3, 6)]
>>> transpose([])
[]
\end{verbatim}

\end{itemize}
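
A few of the items above can be tried together.  The following is a
minimal sketch rather than an exhaustive demonstration; the
\code{settings} dictionary and the \code{people} list are hypothetical
data invented for the example.

\begin{verbatim}
# dict.update() accepts a mapping, an iterable of pairs, or keywords.
settings = {'debug': False}
settings.update({'verbose': True})              # a mapping
settings.update([('depth', 3)])                 # key/value pairs
settings.update(debug=True, colour='red')       # keyword arguments

# Fill characters for the justification methods.
title = 'Python'.center(12, '*')                # '***Python***'
amount = '42'.rjust(6, '0')                     # '000042'

# Stable sorting: sort by name first, then by age.  People of the
# same age stay in name order after the second pass.
people = [('Carol', 30), ('alice', 25), ('Bob', 30), ('dave', 25)]
people.sort(key=lambda person: person[0].lower())
people.sort(key=lambda person: person[1])
# [('alice', 25), ('dave', 25), ('Bob', 30), ('Carol', 30)]
\end{verbatim}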


%======================================================================
\subsection{Optimizations}

\begin{itemize}

\item The inner loops for \class{list} and \class{tuple} slicing
 were optimized and now run about one-third faster.  The inner
 loops were also optimized for \class{dict} with performance
 boosts to \method{keys()}, \method{values()}, \method{items()},
 \method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}.

\item The machinery for growing and shrinking lists was optimized
 for speed and for space efficiency.  Small lists (under eight elements)
 never over-allocate by more than three elements.  Large lists do not
 over-allocate by more than 1/8th.  Appending and popping from lists
 now runs faster due to more efficient code paths and less frequent
 use of the underlying system realloc().  List comprehensions also
 benefit.  The amount of improvement varies between systems and shows
 the greatest improvement on systems with poor realloc() implementations.
 \method{list.extend()} was also optimized and no longer converts its
 argument into a temporary list prior to extending the base list.

\item \function{list()}, \function{tuple()}, \function{map()},
  \function{filter()}, and \function{zip()} now run several times
  faster with non-sequence arguments that supply a \method{__len__()}
  method.  Previously, the pre-sizing optimization only applied to
  sequence arguments.

\item The methods \method{list.__getitem__()},
  \method{dict.__getitem__()}, and \method{dict.__contains__()}
  are now implemented as \class{method_descriptor} objects rather
  than \class{wrapper_descriptor} objects.  This form of optimized
  access doubles their performance and makes them more suitable for
  use as arguments to functionals:
  \samp{map(mydict.__getitem__, keylist)}.

\item Added a new opcode, \code{LIST_APPEND}, that simplifies
  the generated bytecode for list comprehensions and speeds them up
  by about a third.

\end{itemize}

The net result of the 2.4 optimizations is that Python 2.4 runs the
pystone benchmark around XX\% faster than Python 2.3 and YY\% faster
than Python 2.2.


%======================================================================
\section{New, Improved, and Deprecated Modules}

As usual, Python's standard library received a number of enhancements and
bug fixes.  Here's a partial list of the most notable changes, sorted
alphabetically by module name.  Consult the
\file{Misc/NEWS} file in the source tree for a more
complete list of changes, or look through the CVS logs for all the
details.

\begin{itemize}

\item The \module{curses} module now supports the ncurses extension
  \function{use_default_colors()}.  On platforms where the terminal
  supports transparency, this makes it possible to use a transparent
  background.  (Contributed by J\"org Lehmann.)

\item The \module{bisect} module now has an underlying C implementation
  for improved performance.
  (Contributed by Dmitry Vasiliev.)

\item The CJKCodecs collection of East Asian codecs, maintained
by Hye-Shik Chang, was integrated into 2.4.
The new encodings are:

\begin{itemize}
  \item Chinese (PRC): gb2312, gbk, gb18030, hz
  \item Chinese (ROC): big5, cp950
  \item Japanese: cp932, shift-jis, shift-jisx0213, euc-jp,
    euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2,
    iso-2022-jp-3, iso-2022-jp-ext
  \item Korean: cp949, euc-kr, johab, iso-2022-kr
\end{itemize}

\item There is a new \module{collections} module for
  various specialized collection datatypes.
  Currently it contains just one type, \class{deque},
  a double-ended queue that supports efficiently adding and removing
  elements from either end.

\begin{verbatim}
>>> from collections import deque
>>> d = deque('ghi')        # make a new deque with three items
>>> d.append('j')           # add a new entry to the right side
>>> d.appendleft('f')       # add a new entry to the left side
>>> d                       # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                 # return and remove the rightmost item
'j'
>>> d.popleft()             # return and remove the leftmost item
'f'
>>> list(d)                 # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d                # search the deque
True
\end{verbatim}

Several modules now take advantage of \class{collections.deque} for
improved performance: \module{Queue}, \module{mutex}, \module{shlex},
\module{threading}, and \module{pydoc}.

\item The \module{ConfigParser} classes have been enhanced slightly.
  The \method{read()} method now returns a list of the files that
  were successfully parsed, and the \method{set()} method raises
  \exception{TypeError} if passed a \var{value} argument that isn't a
  string.

\item The \module{heapq} module has been converted to C.  The resulting
  tenfold improvement in speed makes the module suitable for handling
  high volumes of data.  In addition, the module has two new functions,
  \function{nlargest()} and \function{nsmallest()}, that use heaps to
  find the largest or smallest n values in a dataset without the
  expense of a full sort.

\item The \module{imaplib} module now supports IMAP's THREAD command.
(Contributed by Yves Dionne.)

\item The \module{itertools} module gained a
  \function{groupby(\var{iterable}\optional{, \var{func}})} function,
  inspired by the GROUP BY clause from SQL.
  \var{iterable} returns a succession of elements, and the optional
  \var{func} is a function that takes an element and returns a key
  value; if omitted, the key is simply the element itself.
  \function{groupby()} then groups the elements into subsequences
  which have matching values of the key, and returns a series of 2-tuples
  containing the key value and an iterator over the subsequence.

Here's an example.  The \var{func} function simply returns whether a
number is even or odd, so the result of \function{groupby()} is to
return consecutive runs of odd or even numbers.

\begin{verbatim}
>>> import itertools
>>> L = [2,4,6, 7,8,9,11, 12, 14]
>>> for key_val, it in itertools.groupby(L, lambda x: x % 2):
...     print key_val, list(it)
...
0 [2, 4, 6]
1 [7]
0 [8]
1 [9, 11]
0 [12, 14]
>>>
\end{verbatim}

Like its SQL counterpart, \function{groupby()} is typically used with
sorted input.  The logic for \function{groupby()} is similar to the
\UNIX{} \code{uniq} filter which makes it handy for eliminating,
counting, or identifying duplicate elements:

\begin{verbatim}
>>> from itertools import groupby
>>> word = 'abracadabra'
>>> letters = sorted(word)   # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> [k for k, g in groupby(letters)]                     # List unique letters
['a', 'b', 'c', 'd', 'r']
>>> [(k, len(list(g))) for k, g in groupby(letters)]     # Count letter occurrences
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
>>> [k for k, g in groupby(letters) if len(list(g)) > 1] # List duplicated letters
['a', 'b', 'r']
\end{verbatim}

\item \module{itertools} also gained a function named
\function{tee(\var{iterator}, \var{N})} that returns \var{N} independent
iterators that replicate \var{iterator}.  If \var{N} is omitted, the
default is 2.

\begin{verbatim}
>>> L = [1,2,3]
>>> i1, i2 = itertools.tee(L)
>>> i1,i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1)               # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2)               # Run the second iterator to exhaustion
[1, 2, 3]
\end{verbatim}

Note that \function{tee()} has to keep copies of the values returned
by the iterator; in the worst case, it may need to keep all of them.
This should therefore be used carefully if the leading iterator
can run far ahead of the trailing iterator in a long stream of inputs.
If the separation is large, then it becomes preferable to use
\function{list()} instead.  When the iterators track closely with one
another, \function{tee()} is ideal.  Possible applications include
bookmarking, windowing, or lookahead iterators.

\item A new \function{getsid()} function was added to the
\module{posix} module that underlies the \module{os} module.
(Contributed by J. Raynor.)

\item The \module{operator} module gained two new functions,
\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}.
Both functions return callables that take a single argument and return
the corresponding attribute or item; these callables make excellent
data extractors when used with \function{map()} or \function{sorted()}.
For example:

\begin{verbatim}
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
\end{verbatim}

\item The \module{random} module has a new method called \method{getrandbits(\var{N})}
  which returns an \var{N}-bit long integer. This method supports the existing
  \method{randrange()} method, making it possible to efficiently generate
  arbitrarily large random numbers.

\item The regular expression language accepted by the \module{re} module
  was extended with simple conditional expressions, written as
  \code{(?(\var{group})\var{A}|\var{B})}. \var{group} is either a
  numeric group ID or a group name defined with \code{(?P<group>...)}
  earlier in the expression. If the specified group matched, the
  regular expression pattern \var{A} will be tested against the string; if
  the group didn't match, the pattern \var{B} will be used instead.

\item The \module{weakref} module now supports a wider variety of objects
  including Python functions, class instances, sets, frozensets, deques,
  arrays, files, sockets, and regular expression pattern objects.

\end{itemize}
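
The \method{getrandbits()} and conditional-expression items above can
be tried out with a short script. This is only a sketch: the pattern
and the names \code{pattern} and \code{accepts()} are made up for
illustration and do not come from the library. The pattern accepts a
number that is either wrapped in parentheses or terminated by a
semicolon.

\begin{verbatim}
import random
import re

# getrandbits(N) returns a non-negative integer that fits in N bits.
n = random.getrandbits(16)
assert 0 <= n < 2 ** 16

# Group 1 optionally matches an opening parenthesis.  The conditional
# (?(1)\)|;) then requires ')' if group 1 matched, and ';' otherwise.
pattern = re.compile(r'^(\()?\d+(?(1)\)|;)$')

def accepts(text):
    """Return True if the illustrative pattern matches text."""
    return pattern.match(text) is not None

assert accepts('(123)')      # group 1 matched, so ')' is required
assert accepts('123;')       # group 1 did not match, so ';' is required
assert not accepts('(123;')  # opened with '(' but closed with ';'
assert not accepts('123)')   # no '(', so ')' is not accepted
\end{verbatim}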


%======================================================================
% whole new modules get described in \subsections here

\subsection{cookielib}

The \module{cookielib} library supports client-side handling for HTTP
cookies, just as the \module{Cookie} module provides server-side cookie
support in CGI scripts. This library manages cookies in a way similar
to web browsers. Cookies are stored in cookie jars; the library
transparently stores cookies offered by the web server in the cookie
jar, and fetches the cookie from the jar when connecting to the
server. Similar to web browsers, policy objects control whether
cookies are accepted or not.

In order to store cookies across sessions, two implementations of
cookie jars are provided: one that stores cookies in the Netscape
format, so applications can use the Mozilla or Lynx cookie jars, and
one that stores cookies in the same format as the Perl libwww library.

\module{urllib2} has been changed to interact with \module{cookielib}:
\class{HTTPCookieProcessor} manages a cookie jar that is used when
accessing URLs.
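
A minimal sketch of how the pieces fit together follows. It makes no
network request, and the file name \code{cookies.txt} is hypothetical.
The fallback import is an assumption added so the sketch also runs on
later Python versions, where these modules were renamed.

\begin{verbatim}
try:
    import cookielib
    import urllib2
except ImportError:
    # Assumption: later Python versions renamed these modules.
    import http.cookiejar as cookielib
    import urllib.request as urllib2

# An in-memory jar; MozillaCookieJar and LWPCookieJar are the two
# file-backed variants described above.
jar = cookielib.CookieJar()

# The file name is hypothetical; nothing is read or written until
# load() or save() is called.
saved_jar = cookielib.MozillaCookieJar('cookies.txt')

# build_opener() installs the cookie processor alongside the default
# handlers, so URLs opened through `opener` share `jar`.
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# No request has been made yet, so the jar is still empty.
assert len(jar) == 0
\end{verbatim}

Calling \code{opener.open(\var{url})} would then store any cookies the
server offers in \code{jar} and send them back on later requests to
that server.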

% ======================================================================
\section{Build and C API Changes}

Changes to Python's build process and to the C API include:

\begin{itemize}

  \item Three new convenience macros were added for common return
  values from extension functions: \csimplemacro{Py_RETURN_NONE},
  \csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}.

  \item A new function, \cfunction{PyTuple_Pack(\var{N}, \var{obj1},
  \var{obj2}, ..., \var{objN})}, constructs tuples from a variable
  length argument list of Python objects.

  \item A new function, \cfunction{PyDict_Contains(\var{d}, \var{k})},
  implements fast dictionary lookups without masking exceptions raised
  during the look-up process.

  \item A new method flag, \constant{METH_COEXISTS}, allows a function
  defined in slots to co-exist with a PyCFunction having the same name.
  This can halve the access time to a method such as
  \method{set.__contains__()}.

\end{itemize}


%======================================================================
\subsection{Port-Specific Changes}

\begin{itemize}

\item The Windows port now builds under MSVC++ 7.1 as well as version 6.

\end{itemize}


%======================================================================
\section{Other Changes and Fixes \label{section-other}}

As usual, there were a bunch of other improvements and bugfixes
scattered throughout the source tree. A search through the CVS change
logs finds there were XXX patches applied and YYY bugs fixed between
Python 2.3 and 2.4. Both figures are likely to be underestimates.

Some of the more notable changes are:

\begin{itemize}

\item The \module{timeit} module now automatically disables periodic
  garbage collection during the timing loop. This change makes
  consecutive timings more comparable.

\item The \module{base64} module now has more complete RFC 3548 support
  for Base64, Base32, and Base16 encoding and decoding, including
  optional case folding and optional alternative alphabets.
  (Contributed by Barry Warsaw.)

\end{itemize}
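
The \module{base64} additions can be sketched as follows. The sample
data and variable names are arbitrary, and the \code{.encode('ascii')}
calls are an assumption added so the sketch also runs on Python
versions that distinguish text from bytes.

\begin{verbatim}
import base64

# .encode('ascii') keeps the sketch runnable on Python versions that
# distinguish text from bytes; on Python 2.4 it returns a plain string.
data = '???'.encode('ascii')

# Standard Base64, then the same data with '-' and '_' substituted
# for the '+' and '/' characters of the standard alphabet.
altchars = '-_'.encode('ascii')
standard = base64.b64encode(data)
url_safe = base64.b64encode(data, altchars)
assert base64.b64decode(url_safe, altchars) == data

# Base32 and Base16 round-trips.
assert base64.b32decode(base64.b32encode(data)) == data
hex_text = base64.b16encode(data)

# Base16 decoding only accepts lowercase digits when casefold is true.
assert base64.b16decode(hex_text.lower(), casefold=True) == data
\end{verbatim}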


%======================================================================
\section{Porting to Python 2.4}

This section lists previously described changes that may require
changes to your code:

\begin{itemize}

\item The \function{zip()} built-in function and \function{itertools.izip()}
  now return an empty list instead of raising a \exception{TypeError}
  exception if called with no arguments.

\item \function{dircache.listdir()} now passes exceptions to the caller
  instead of returning empty lists.

\item \function{LexicalHandler.startDTD()} used to receive public and
  system ID in the wrong order. This has been corrected; applications
  relying on the wrong order need to be fixed.

\item \function{fcntl.ioctl()} now warns if the \var{mutate} argument
  is omitted and relevant.

\end{itemize}
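
The \function{zip()} change matters most for code that calls
\code{zip(*\var{rows})} on input that may be empty. The helper below is
hypothetical and only illustrates the new behaviour; wrapping the
result in \function{list()} is harmless under Python 2.4 and is an
assumption added to keep the sketch version-neutral.

\begin{verbatim}
# A transpose helper built on zip(*rows).  With no rows at all this
# is the same as calling zip() with no arguments.
def transpose(rows):
    """Return the columns of rows as a list of tuples."""
    return list(zip(*rows))

assert transpose([(1, 'a'), (2, 'b')]) == [(1, 2), ('a', 'b')]

# Under Python 2.4 this yields an empty list; earlier versions raised
# TypeError, so code that caught that exception to detect empty input
# needs an explicit check instead.
assert transpose([]) == []
\end{verbatim}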


%======================================================================
\section{Acknowledgements \label{acks}}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Raymond Hettinger.

\end{document}