\documentclass{howto}
\usepackage{distutils}
% $Id$

% Don't write extensive text for new sections; I'll do that.
% Feel free to add commented-out reminders of things that need
% to be covered.  --amk

\title{What's New in Python 2.4}
\release{0.0}
\author{A.M.\ Kuchling}
\authoraddress{
    \strong{Python Software Foundation}\\
    Email: \email{amk@amk.ca}
}

\begin{document}
\maketitle
\tableofcontents

This article explains the new features in Python 2.4.  The release
date is expected to be around September 2004.

While Python 2.3 was primarily a library development release, Python
2.4 may extend the core language and interpreter in
as-yet-undetermined ways.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview.  For
full details, you should refer to the documentation for Python 2.4,
such as the \citetitle[../lib/lib.html]{Python Library Reference} and
the \citetitle[../ref/ref.html]{Python Reference Manual}.
If you want to understand the complete implementation and design
rationale, refer to the PEP for a particular new feature.


%======================================================================
\section{PEP 218: Built-In Set Objects}

Two new built-in types, \function{set(\var{iterable})} and
\function{frozenset(\var{iterable})}, provide high-speed data types for
membership testing, for eliminating duplicates from sequences, and
for mathematical operations like unions, intersections, differences,
and symmetric differences.

\begin{verbatim}
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'

>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])

>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
\end{verbatim}

The type \function{frozenset()} is an immutable version of \function{set()}.
Since it is immutable and hashable, it may be used as a dictionary key or
as a member of another set.  Accordingly, it does not have methods
like \method{add()} and \method{remove()} which could alter its contents.
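
As a rough illustration of why hashability matters, the following
sketch uses \function{frozenset()} values as dictionary keys and as
members of another set.  The city names and distances are made-up
data, and the variable names are chosen only for this example.

```python
# Hypothetical data: each road joins two cities, in no particular order.
road_lengths = {
    frozenset(['Paris', 'Lyon']): 465,
    frozenset(['Lyon', 'Nice']): 470,
}

# Element order does not matter when building the lookup key.
key = frozenset(['Lyon', 'Paris'])
distance = road_lengths[key]

# Frozensets can be members of another set; equal ones collapse into one.
groups = set([frozenset('ab'), frozenset('ba'), frozenset('cd')])

# A mutable set is not hashable, so it cannot be a member of a set.
try:
    set([set('ab')])
    mutable_allowed = True
except TypeError:
    mutable_allowed = False
```

Because two frozensets with the same elements compare and hash equal,
the lookup succeeds regardless of the order in which the elements
were supplied.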

% XXX what happens to the sets module?
% The current thinking is that the sets module will be left alone.
% That way, existing code will continue to run without alteration.
% Also, the module provides an autoconversion feature not supported by set()
% and frozenset().

\begin{seealso}
\seepep{218}{Adding a Built-In Set Object Type}{Originally proposed by
Greg Wilson and ultimately implemented by Raymond Hettinger.}
\end{seealso}

%======================================================================
\section{PEP 237: Unifying Long Integers and Integers}

XXX write this.

%======================================================================
\section{PEP 289: Generator Expressions}

The iterator feature introduced in Python 2.2 makes it easier to write
programs that loop through large data sets without having the entire
data set in memory at one time.  Programmers can use iterators and the
\module{itertools} module to write code in a fairly functional style.

The fly in the ointment has been list comprehensions, because they
produce a Python list object containing all of the items, unavoidably
pulling them all into memory.  When trying to write a program using
the functional approach, it would be natural to write something like:

\begin{verbatim}
links = [link for link in get_all_links() if not link.followed]
for link in links:
    ...
\end{verbatim}

instead of

\begin{verbatim}
for link in get_all_links():
    if link.followed:
        continue
    ...
\end{verbatim}

The first form is more concise and perhaps more readable, but if
you're dealing with a large number of link objects you would have to
use the second form to avoid holding them all in memory at once.

Generator expressions work similarly to list comprehensions but don't
materialize the entire list; instead they create a generator that will
return elements one by one.  The above example could be written as:

\begin{verbatim}
links = (link for link in get_all_links() if not link.followed)
for link in links:
    ...
\end{verbatim}

Generator expressions always have to be written inside parentheses, as
in the above example.  The parentheses signalling a function call also
count, so if you want to create an iterator that will be immediately
passed to a function you could write:

\begin{verbatim}
print sum(obj.count for obj in list_all_objects())
\end{verbatim}

There are some small differences from list comprehensions.  Most
notably, the loop variable (\var{obj} in the above example) is not
accessible outside of the generator expression.  List comprehensions
leave the variable assigned to its last value; future versions of
Python will change this, making list comprehensions match generator
expressions in this respect.
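
To make the lazy evaluation concrete, here is a small sketch.  The
\function{numbered_lines()} generator is a hypothetical, endless data
source invented for this example; a list comprehension over it would
never finish, while a generator expression only does as much work as
its consumer asks for.

```python
import itertools

def numbered_lines():
    # Hypothetical endless data source, used only for illustration.
    n = 0
    while True:
        n += 1
        yield 'line %d' % n

# The generator expression produces items only as they are requested.
shouting = (text.upper() for text in numbered_lines())
first_three = list(itertools.islice(shouting, 3))

# Passed as the sole argument of a call, no extra parentheses are needed.
total = sum(len(text) for text in first_three)
```

Only three items are ever pulled from the endless source, so the
program terminates and never builds a large intermediate list.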

\begin{seealso}
\seepep{289}{Generator Expressions}{Proposed by Raymond Hettinger and
implemented by Jiwon Seo with early efforts steered by Hye-Shik Chang.}
\end{seealso}

%======================================================================
\section{PEP 322: Reverse Iteration}

A new built-in function, \function{reversed(\var{seq})}, takes a sequence
and returns an iterator that returns the elements of the sequence
in reverse order.

\begin{verbatim}
>>> for i in reversed(xrange(1,4)):
...     print i
...
3
2
1
\end{verbatim}

Compared to extended slicing, \code{range(1,4)[::-1]}, \function{reversed()}
is easier to read, runs faster, and uses substantially less memory.

Note that \function{reversed()} only accepts sequences, not arbitrary
iterators.  If you want to reverse an iterator, first convert it to
a list with \function{list()}.

\begin{verbatim}
>>> input = open('/etc/passwd', 'r')
>>> for line in reversed(list(input)):
...     print line
...
root:*:0:0:System Administrator:/var/root:/bin/tcsh
  ...
\end{verbatim}
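
The restriction to sequences can be seen without touching the file
system.  In this sketch, \function{countdown_source()} is a
hypothetical generator standing in for any non-sequence iterator.

```python
def countdown_source():
    # Hypothetical generator standing in for any non-sequence iterator.
    yield 'first'
    yield 'second'
    yield 'third'

try:
    reversed(countdown_source())
    accepted = True
except TypeError:
    # A generator has no length and no indexing, so reversed() rejects it.
    accepted = False

# Materializing the items in a list first makes reverse iteration possible.
backwards = list(reversed(list(countdown_source())))
```

The cost of the workaround is that \function{list()} has to hold every
item in memory before the first reversed item can be produced.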

\begin{seealso}
\seepep{322}{Reverse Iteration}{Written and implemented by Raymond Hettinger.}
\end{seealso}


%======================================================================
\section{PEP 327: Decimal Data Type}

Python has always supported floating-point (FP) numbers as a data
type, based on the underlying C \ctype{double} type.  However, while
most programming languages provide a floating-point type, most people
(even programmers) are unaware that computing with floating-point
numbers entails certain unavoidable inaccuracies.  The new decimal
type provides a way to avoid these inaccuracies.

\subsection{Why is Decimal needed?}

The limitations arise from the representation used for floating-point numbers.
FP numbers are made up of three components:

\begin{itemize}
\item The sign, which is -1 or +1.
\item The mantissa, which is a single-digit binary number
followed by a fractional part.  For example, \code{1.01} in base-2 notation
is \code{1 + 0/2 + 1/4}, or 1.25 in decimal notation.
\item The exponent, which tells where the decimal point is located in
the number represented.
\end{itemize}

For example, the number 1.25 has sign +1, mantissa 1.01 (in binary),
and exponent of 0 (the decimal point doesn't need to be shifted).  The
number 5 has the same sign and mantissa, but the exponent is 2
because the mantissa is multiplied by 4 (2 to the power of the exponent 2).
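
The three components can be recovered from a Python float with
\function{math.frexp()}.  The helper \function{decompose()} below is a
hypothetical name used only for this sketch; it rescales the result of
\function{frexp()}, which returns a mantissa between 0.5 and 1, into
the 1.xxx form used in the description above.

```python
import math

def decompose(x):
    """Return (sign, mantissa, exponent) with 1 <= mantissa < 2 for nonzero x."""
    sign = 1
    if x < 0:
        sign = -1
    # math.frexp() returns m, e with abs(x) == m * 2**e and 0.5 <= m < 1,
    # so shift by one binary place to get the 1.xxx form used in the text.
    m, e = math.frexp(abs(x))
    return sign, m * 2, e - 1

parts_of_1_25 = decompose(1.25)
parts_of_5 = decompose(5.0)
```

Both numbers yield the mantissa 1.25 (binary 1.01); only the exponent
differs, 0 for 1.25 and 2 for 5.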

Modern systems usually provide floating-point support that conforms to
a relevant standard called IEEE 754.  C's \ctype{double} type is
usually implemented as a 64-bit IEEE 754 number, which uses 52 bits of
space for the mantissa.  This means that numbers can only be specified
to 52 bits of precision.  If you're trying to represent numbers whose
expansion repeats endlessly, the expansion is cut off after 52 bits.
Unfortunately, most software needs to produce output in base 10, and
many simple base-10 fractions have endlessly repeating expansions in
base 2.  For example, 1.1 decimal is binary \code{1.0001100110011 ...};
.1 = 1/16 + 1/32 + 1/256 plus an infinite number of additional terms.
IEEE 754 has to chop off that infinitely repeating expansion after 52
digits, so the representation is slightly inaccurate.

Sometimes you can see this inaccuracy when the number is printed:

\begin{verbatim}
>>> 1.1
1.1000000000000001
\end{verbatim}

The inaccuracy isn't always visible when you print the number because
the FP-to-decimal-string conversion is provided by the C library, and
most C libraries try to produce sensible output, but the inaccuracy is
still there and subsequent operations can magnify the error.
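
A minimal sketch of how the error accumulates: adding 0.1 ten times
in binary floating point does not give exactly 1.0 on a platform with
IEEE 754 doubles, while the same sum carried out with the
\module{decimal} module is exact.  The variable names are chosen only
for this example.

```python
import decimal

# Add 0.1 ten times using binary floating point.
float_total = 0.0
for i in range(10):
    float_total += 0.1

# The same sum with Decimal values built from strings.
decimal_total = decimal.Decimal('0')
for i in range(10):
    decimal_total += decimal.Decimal('0.1')

float_is_exact = (float_total == 1.0)
decimal_is_exact = (decimal_total == decimal.Decimal('1'))
```

The floating-point total falls just short of 1.0, so the first
comparison is false and the second is true.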

For many applications this doesn't matter.  If I'm plotting points and
displaying them on my monitor, the difference between 1.1 and
1.1000000000000001 is too small to be visible.  Reports often limit
output to a certain number of decimal places, and if you round the
number to two or three or even eight decimal places, the error is
never apparent.  However, for applications where it does matter,
it's a lot of work to implement your own custom arithmetic routines.

\subsection{The \class{Decimal} type}

A new module, \module{decimal}, was added to Python's standard library.
It contains two classes, \class{Decimal} and \class{Context}.
\class{Decimal} instances represent numbers, and
\class{Context} instances are used to wrap up various settings such as
the precision and default rounding mode.

\class{Decimal} instances, like regular Python integers and FP numbers,
are immutable; once one has been created, you can't change the value
it represents.
\class{Decimal} instances can be created from integers or strings:

\begin{verbatim}
>>> import decimal
>>> decimal.Decimal(1972)
Decimal("1972")
>>> decimal.Decimal("1.1")
Decimal("1.1")
\end{verbatim}

You can also provide tuples containing the sign, mantissa represented
as a tuple of decimal digits, and exponent:

\begin{verbatim}
>>> decimal.Decimal((1, (1, 4, 7, 5), -2))
Decimal("-14.75")
\end{verbatim}

Cautionary note: the sign bit is a Boolean value, so 0 is positive and 1 is negative.
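
The sign convention is easy to get backwards, so here is a short
sketch showing both signs, together with the \method{as_tuple()}
method, which goes the other way and recovers the three components.
The variable names are arbitrary.

```python
import decimal

# (sign, digits, exponent): sign 0 means positive, 1 means negative.
positive = decimal.Decimal((0, (1, 4, 7, 5), -2))
negative = decimal.Decimal((1, (1, 4, 7, 5), -2))

# as_tuple() recovers the sign, the digit tuple and the exponent.
sign, digits, exponent = negative.as_tuple()
```

Here \code{positive} is 14.75 and \code{negative} is -14.75; the digits
1, 4, 7, 5 are scaled by ten to the power of the exponent, -2.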

Floating-point numbers posed a bit of a problem: should the FP number
representing 1.1 turn into the decimal number for exactly 1.1, or for
1.1 plus whatever inaccuracies are introduced?  The decision was to
leave such a conversion out of the API.  Instead, you should convert
the floating-point number into a string using the desired precision and
pass the string to the \class{Decimal} constructor:

\begin{verbatim}
>>> f = 1.1
>>> decimal.Decimal(str(f))
Decimal("1.1")
>>> decimal.Decimal(repr(f))
Decimal("1.1000000000000001")
\end{verbatim}

Once you have \class{Decimal} instances, you can perform the usual
mathematical operations on them.  One limitation: exponentiation
requires an integer exponent:

\begin{verbatim}
>>> a = decimal.Decimal('35.72')
>>> b = decimal.Decimal('1.73')
>>> a+b
Decimal("37.45")
>>> a-b
Decimal("33.99")
>>> a*b
Decimal("61.7956")
>>> a/b
Decimal("20.6473988")
>>> a ** 2
Decimal("1275.9184")
>>> a ** b
Decimal("NaN")
\end{verbatim}

You can combine \class{Decimal} instances with integers, but not with
floating-point numbers:

\begin{verbatim}
>>> a + 4
Decimal("39.72")
>>> a + 4.5
Traceback (most recent call last):
  ...
TypeError: You can interact Decimal only with int, long or Decimal data types.
>>>
\end{verbatim}

\class{Decimal} numbers can be used with the \module{math} and
\module{cmath} modules, though you'll get back a regular
floating-point number and not a \class{Decimal}.  Instances also have
a \method{sqrt()} method:

\begin{verbatim}
>>> import math, cmath
>>> d = decimal.Decimal('123456789012.345')
>>> math.sqrt(d)
351364.18288201344
>>> cmath.sqrt(-d)
351364.18288201344j
>>> d.sqrt()
Decimal("351364.1828820134592177245001")
\end{verbatim}


\subsection{The \class{Context} type}

Instances of the \class{Context} class encapsulate several settings for
decimal operations:

\begin{itemize}
 \item \member{prec} is the precision, the number of significant digits.
 \item \member{rounding} specifies the rounding mode.  The \module{decimal}
       module has constants for the various possibilities:
       \constant{ROUND_DOWN}, \constant{ROUND_CEILING},
       \constant{ROUND_HALF_EVEN}, and various others.
 \item \member{trap_enablers} is a dictionary specifying what happens on
       encountering certain error conditions: either an exception is
       raised or a value is returned.  Some examples of error conditions
       are division by zero, loss of precision, and overflow.
\end{itemize}

There's a thread-local default context available by calling
\function{getcontext()}; you can change the properties of this context
to alter the default precision, rounding, or trap handling.

\begin{verbatim}
>>> decimal.getcontext().prec
28
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")
>>> decimal.getcontext().prec = 9
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.142857143")
\end{verbatim}

The default action for error conditions is to return a special value
such as infinity or not-a-number, but you can request that exceptions
be raised:

\begin{verbatim}
>>> decimal.Decimal(1) / decimal.Decimal(0)
Decimal("Infinity")
>>> decimal.getcontext().trap_enablers[decimal.DivisionByZero] = True
>>> decimal.Decimal(1) / decimal.Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0
>>>
\end{verbatim}
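
Changing the default context affects every later operation in the
thread.  As a sketch of an alternative, you can create separate
\class{Context} instances and call their arithmetic methods directly.
The example assumes only the \member{prec} and \member{rounding}
settings described above; the context names are invented for
illustration.

```python
import decimal

two = decimal.Decimal(2)
three = decimal.Decimal(3)

# Two independent contexts that differ only in rounding mode.
truncating = decimal.Context(prec=5, rounding=decimal.ROUND_DOWN)
banker = decimal.Context(prec=5, rounding=decimal.ROUND_HALF_EVEN)

# Context methods perform the operation under that context's settings,
# leaving the thread's default context untouched.
q_down = truncating.divide(two, three)
q_even = banker.divide(two, three)
```

With five significant digits, the truncating context gives 0.66666
and the half-even context gives 0.66667.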

The \class{Context} instance also has various methods for formatting
numbers such as \method{to_eng_string()} and \method{to_sci_string()}.


\begin{seealso}
\seepep{327}{Decimal Data Type}{Written by Facundo Batista and implemented
  by Facundo Batista, Eric Price, Raymond Hettinger, Aahz, and Tim Peters.}

\seeurl{http://research.microsoft.com/~hollasch/cgindex/coding/ieeefloat.html}
{A more detailed overview of the IEEE-754 representation.}

\seeurl{http://www.lahey.com/float.htm}
{The article uses Fortran code to illustrate many of the problems
that floating-point inaccuracy can cause.}

\seeurl{http://www2.hursley.ibm.com/decimal/}
{A description of a decimal-based representation.  This representation
is being proposed as a standard, and underlies the new Python decimal
type.  Much of this material was written by Mike Cowlishaw, designer of the
REXX language.}

\end{seealso}


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.4 makes to the core Python
language.

\begin{itemize}

\item The \method{dict.update()} method now accepts the same
argument forms as the \class{dict} constructor.  This includes any
mapping, any iterable of key/value pairs, and/or keyword arguments.

\item The string methods \method{ljust()}, \method{rjust()}, and
\method{center()} now take an optional argument for specifying a
fill character other than a space.

\item Strings also gained an \method{rsplit()} method that
works like the \method{split()} method but splits from the end of
the string.

\begin{verbatim}
>>> 'www.python.org'.split('.', 1)
['www', 'python.org']
>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']
\end{verbatim}

\item The \method{sort()} method of lists gained three keyword
arguments, \var{cmp}, \var{key}, and \var{reverse}.  These arguments
make some common usages of \method{sort()} simpler.  All are optional.

\var{cmp} is the same as the previous single argument to
\method{sort()}; if provided, the value should be a comparison
function that takes two arguments and returns -1, 0, or +1 depending
on how the arguments compare.

\var{key} should be a single-argument function that takes a list
element and returns a comparison key for the element.  The list is
then sorted using the comparison keys.  The following example sorts a
list case-insensitively:

\begin{verbatim}
>>> L = ['A', 'b', 'c', 'D']
>>> L.sort()                 # Case-sensitive sort
>>> L
['A', 'D', 'b', 'c']
>>> L.sort(key=lambda x: x.lower())
>>> L
['A', 'b', 'c', 'D']
>>> L.sort(cmp=lambda x,y: cmp(x.lower(), y.lower()))
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The last example, which uses the \var{cmp} parameter, is the old way
to perform a case-insensitive sort.  It works but is slower than
using a \var{key} parameter.  Using \var{key} results in calling the
\method{lower()} method once for each element in the list while using
\var{cmp} will call the method twice for each comparison.

For simple key functions and comparison functions, it is often
possible to avoid a \keyword{lambda} expression by using an unbound
method instead.  For example, the above case-insensitive sort is best
coded as:

\begin{verbatim}
>>> L.sort(key=str.lower)
>>> L
['A', 'b', 'c', 'D']
\end{verbatim}

The \var{reverse} parameter should have a Boolean value.  If the value is
\constant{True}, the list will be sorted into reverse order.  Instead
of \code{L.sort(lambda x,y: cmp(y.score, x.score))}, you can now write:
\code{L.sort(key=lambda x: x.score, reverse=True)}.

The results of sorting are now guaranteed to be stable.  This means
that two entries with equal keys will be returned in the same order as
they were input.  For example, you can sort a list of people by name,
and then sort the list by age, resulting in a list sorted by age where
people with the same age are in name-sorted order.

\item There is a new built-in function
\function{sorted(\var{iterable})} that works like the in-place
\method{list.sort()} method but has been made suitable for use in
expressions.  The differences are:
  \begin{itemize}
  \item the input may be any iterable;
  \item a newly formed copy is sorted, leaving the original intact; and
  \item the expression returns the new sorted copy.
  \end{itemize}

\begin{verbatim}
>>> L = [9,7,8,3,2,4,1,6,5]
>>> [10+i for i in sorted(L)]       # usable in a list comprehension
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> L                               # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]

>>> sorted('Monte Python')          # any iterable may be an input
[' ', 'M', 'P', 'e', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y']

>>> # List the contents of a dict sorted by key values
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
>>> for k, v in sorted(colormap.iteritems()):
...     print k, v
...
black 4
blue 2
green 3
red 1
yellow 5
\end{verbatim}

\item The \function{zip()} built-in function and \function{itertools.izip()}
  now return an empty list instead of raising a \exception{TypeError}
  exception if called with no arguments.  This makes them more
  suitable for use with variable-length argument lists:

\begin{verbatim}
>>> def transpose(array):
...    return zip(*array)
...
>>> transpose([(1,2,3), (4,5,6)])
[(1, 4), (2, 5), (3, 6)]
>>> transpose([])
[]
\end{verbatim}

\end{itemize}
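
Some of the changes in the list above are described without an
example.  The following sketch exercises three of them: the argument
forms accepted by \method{dict.update()}, the fill character for the
string justification methods, and the stability guarantee for sorting.
All of the data and variable names are made up for illustration.

```python
# dict.update() accepts the same argument forms as the dict constructor.
inventory = {'apples': 3}
inventory.update({'pears': 2})                  # a mapping
inventory.update([('plums', 7), ('figs', 1)])   # an iterable of key/value pairs
inventory.update(apples=5, kiwis=4)             # keyword arguments

# ljust(), rjust() and center() take an optional fill character.
heading = 'Totals'.center(14, '*')
amount = '42'.rjust(6, '0')

# Stable sorting: sort by the secondary key first, then by the primary key.
people = [('Carol', 30), ('alice', 25), ('Bob', 30), ('Dave', 25)]
people.sort(key=lambda person: person[0].lower())   # by name, ignoring case
people.sort(key=lambda person: person[1])           # by age; ties keep name order
```

After the second sort the list is ordered by age, and the two people
of each age still appear in the name order produced by the first sort.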


%======================================================================
\subsection{Optimizations}

\begin{itemize}

\item The inner loops for \class{list} and \class{tuple} slicing
  were optimized and now run about one-third faster.  The inner
  loops were also optimized for \class{dict} with performance
  boosts to \method{keys()}, \method{values()}, \method{items()},
  \method{iterkeys()}, \method{itervalues()}, and \method{iteritems()}.

\item The machinery for growing and shrinking lists was optimized
  for speed and for space efficiency.  Small lists (under eight elements)
  never over-allocate by more than three elements.  Large lists do not
  over-allocate by more than 1/8th.  Appending and popping from lists
  now run faster due to more efficient code paths and less frequent
  use of the underlying system realloc().  List comprehensions also
  benefit.  The amount of improvement varies between systems and is
  greatest on systems with poor realloc() implementations.
  \method{list.extend()} was also optimized and no longer converts its
  argument into a temporary list prior to extending the base list.

\item \function{list()}, \function{tuple()}, \function{map()},
  \function{filter()}, and \function{zip()} now run several times
  faster with non-sequence arguments that supply a \method{__len__()}
  method.  Previously, the pre-sizing optimization only applied to
  sequence arguments.

\item The methods \method{list.__getitem__()},
  \method{dict.__getitem__()}, and \method{dict.__contains__()} are
  now implemented as \class{method_descriptor} objects rather
  than \class{wrapper_descriptor} objects.  This form of optimized
  access doubles their performance and makes them more suitable for
  use as arguments to functionals:
  \samp{map(mydict.__getitem__, keylist)}.

\item Added a new opcode, \code{LIST_APPEND}, that simplifies
  the generated bytecode for list comprehensions and speeds them up
  by about a third.

\end{itemize}
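
As a small illustration of passing these methods to functionals, the
sketch below looks up several keys at once and filters a list by
membership.  The dictionary contents are hypothetical, and the calls
are wrapped in \function{list()} only so that the results are plain
lists.

```python
ports = {'http': 80, 'https': 443, 'ssh': 22}   # hypothetical lookup table
wanted = ['ssh', 'http']

# A bound dict.__getitem__ does the lookups without a lambda wrapper.
found = list(map(ports.__getitem__, wanted))

# dict.__contains__ works the same way as a filter predicate.
known = list(filter(ports.__contains__, ['ftp', 'https', 'ssh']))
```

This says nothing about speed by itself; it only shows the calling
pattern that the descriptor change is meant to make cheaper.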

The net result of the 2.4 optimizations is that Python 2.4 runs the
pystone benchmark around XX\% faster than Python 2.3 and YY\% faster
than Python 2.2.


%======================================================================
\section{New, Improved, and Deprecated Modules}

As usual, Python's standard library received a number of enhancements and
bug fixes.  Here's a partial list of the most notable changes, sorted
alphabetically by module name.  Consult the
\file{Misc/NEWS} file in the source tree for a more
complete list of changes, or look through the CVS logs for all the
details.

\begin{itemize}

\item The \module{curses} module now supports the ncurses extension
  \function{use_default_colors()}.  On platforms where the terminal
  supports transparency, this makes it possible to use a transparent
  background.  (Contributed by J\"org Lehmann.)

\item The \module{bisect} module now has an underlying C implementation
  for improved performance.
  (Contributed by Dmitry Vasiliev.)

\item The CJKCodecs collection of East Asian codecs, maintained
by Hye-Shik Chang, was integrated into 2.4.
The new encodings are:

\begin{itemize}
  \item Chinese (PRC): gb2312, gbk, gb18030, hz
  \item Chinese (ROC): big5, cp950
  \item Japanese: cp932, shift-jis, shift-jisx0213, euc-jp,
    euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2,
    iso-2022-jp-3, iso-2022-jp-ext
  \item Korean: cp949, euc-kr, johab, iso-2022-kr
\end{itemize}
630
\item There is a new \module{collections} module for
  various specialized collection datatypes.
  Currently it contains just one type, \class{deque},
  a double-ended queue that supports efficiently adding and removing
  elements from either end.

\begin{verbatim}
>>> from collections import deque
>>> d = deque('ghi')        # make a new deque with three items
>>> d.append('j')           # add a new entry to the right side
>>> d.appendleft('f')       # add a new entry to the left side
>>> d                       # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                 # return and remove the rightmost item
'j'
>>> d.popleft()             # return and remove the leftmost item
'f'
>>> list(d)                 # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d                # search the deque
True
\end{verbatim}

Several modules now take advantage of \class{collections.deque} for
improved performance: \module{Queue}, \module{mutex}, \module{shlex},
\module{threading}, and \module{pydoc}.

\item The \module{ConfigParser} classes have been enhanced slightly.
  The \method{read()} method now returns a list of the files that
  were successfully parsed, and the \method{set()} method raises
  \exception{TypeError} if passed a \var{value} argument that isn't a
  string.

\item The \module{heapq} module has been converted to C.  The resulting
  tenfold improvement in speed makes the module suitable for handling
  high volumes of data.  In addition, the module has two new functions,
  \function{nlargest()} and \function{nsmallest()}, that use heaps to
  find the \var{n} largest or smallest values in a dataset without the
  expense of a full sort.

\item The \module{imaplib} module now supports IMAP's THREAD command.
(Contributed by Yves Dionne.)

\item The \module{itertools} module gained a
  \function{groupby(\var{iterable}\optional{, \var{func}})} function,
  inspired by the GROUP BY clause from SQL.
  \var{iterable} supplies a succession of elements, and the optional
  \var{func} is a function that takes an element and returns a key
  value; if omitted, the key is simply the element itself.
  \function{groupby()} then groups the elements into subsequences
  that have matching values of the key, and returns a series of 2-tuples
  containing the key value and an iterator over the subsequence.

Here's an example.  The \var{func} function simply returns whether a
number is even or odd, so the result of \function{groupby()} is to
return consecutive runs of odd or even numbers.

\begin{verbatim}
>>> import itertools
>>> L = [2, 4, 6, 7, 8, 9, 11, 12, 14]
>>> for key_val, it in itertools.groupby(L, lambda x: x % 2):
...    print key_val, list(it)
...
0 [2, 4, 6]
1 [7]
0 [8]
1 [9, 11]
0 [12, 14]
>>>
\end{verbatim}

Like its SQL counterpart, \function{groupby()} is typically used with
sorted input.  The logic for \function{groupby()} is similar to the
\UNIX{} \code{uniq} filter, which makes it handy for eliminating,
counting, or identifying duplicate elements:

\begin{verbatim}
>>> from itertools import groupby
>>> word = 'abracadabra'
>>> letters = sorted(word)   # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> [k for k, g in groupby(letters)]                     # List unique letters
['a', 'b', 'c', 'd', 'r']
>>> [(k, len(list(g))) for k, g in groupby(letters)]     # Count letter occurrences
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
>>> [k for k, g in groupby(letters) if len(list(g)) > 1] # List duplicated letters
['a', 'b', 'r']
\end{verbatim}

\item \module{itertools} also gained a function named
\function{tee(\var{iterator}, \var{N})} that returns \var{N} independent
iterators that replicate \var{iterator}.  If \var{N} is omitted, the
default is 2.

\begin{verbatim}
>>> import itertools
>>> L = [1,2,3]
>>> i1, i2 = itertools.tee(L)
>>> i1,i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1)               # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2)               # Run the second iterator to exhaustion
[1, 2, 3]
\end{verbatim}

Note that \function{tee()} has to keep copies of the values returned
by the iterator; in the worst case, it may need to keep all of them.
\function{tee()} should therefore be used carefully if the leading iterator
can run far ahead of the trailing iterator in a long stream of inputs.
If the separation is large, then it becomes preferable to use
\function{list()} instead.  When the iterators track closely with one
another, \function{tee()} is ideal.  Possible applications include
bookmarking, windowing, or lookahead iterators.

\item A new \function{getsid()} function was added to the
\module{posix} module that underlies the \module{os} module.
(Contributed by J. Raynor.)

\item The \module{operator} module gained two new functions,
\function{attrgetter(\var{attr})} and \function{itemgetter(\var{index})}.
Both functions return callables that take a single argument and return
the corresponding attribute or item; these callables make excellent
data extractors when used with \function{map()} or \function{sorted()}.
For example:

\begin{verbatim}
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
\end{verbatim}

\item The \module{random} module has a new method called
  \method{getrandbits(\var{N})} that returns a long integer \var{N} bits
  in length.  The existing \method{randrange()} method now uses
  \method{getrandbits()} where appropriate, making generation of
  arbitrarily large random numbers more efficient.

\item The regular expression language accepted by the \module{re} module
  was extended with simple conditional expressions, written as
  \code{(?(\var{group})\var{A}|\var{B})}.  \var{group} is either a
  numeric group ID or a group name defined with \code{(?P<group>...)}
  earlier in the expression.  If the specified group matched, the
  regular expression pattern \var{A} will be tested against the string; if
  the group didn't match, the pattern \var{B} will be used instead.

\item The \module{weakref} module now supports a wider variety of objects,
  including Python functions, class instances, sets, frozensets, deques,
  arrays, files, sockets, and regular expression pattern objects.

\end{itemize}
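
The conditional expressions added to \module{re} are easiest to see in a
small example.  The following sketch is illustrative only: the pattern
and the sample strings are chosen for this example, and the code avoids
printing so that it also runs unchanged on later Python versions.  Group 1
optionally matches an opening angle bracket; the conditional then demands
a closing bracket only if the opening one was seen, and otherwise
requires the end of the string.

\begin{verbatim}
import re

# Group 1 optionally matches '<'.  The conditional (?(1)>|$) then requires
# '>' if group 1 matched, and the end of the string if it did not.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

def extract_address(text):
    """Return the address in text, or None if the brackets are unbalanced."""
    match = pattern.match(text)
    if match is None:
        return None
    return match.group(2)

bracketed = extract_address('<user@host.com>')   # brackets on both sides
bare = extract_address('user@host.com')          # no brackets at all
unclosed = extract_address('<user@host.com')     # missing closing bracket
unopened = extract_address('user@host.com>')     # stray closing bracket
\end{verbatim}

The first two calls return the address, while the last two return
\code{None} because neither branch of the conditional can match.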


%======================================================================
% whole new modules get described in \subsections here

\subsection{cookielib}

The \module{cookielib} library supports client-side handling for HTTP
cookies, just as the \module{Cookie} module provides server-side cookie
support in CGI scripts.  This library manages cookies in a way similar
to web browsers.  Cookies are stored in cookie jars; the library
transparently stores cookies offered by the web server in the cookie
jar, and fetches the cookie from the jar when connecting to the
server.  Similar to web browsers, policy objects control whether
cookies are accepted or not.

In order to store cookies across sessions, two implementations of
cookie jars are provided: one that stores cookies in the Netscape
format, so applications can use the Mozilla or Lynx cookie jars, and
one that stores cookies in the same format as the Perl libwww library.

\module{urllib2} has been changed to interact with \module{cookielib}:
\class{HTTPCookieProcessor} manages a cookie jar that is used when
accessing URLs.
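
As a minimal sketch of how the two modules fit together, the code below
builds an opener whose requests share one in-memory cookie jar.  Nothing
is fetched here, so no network access is needed.  The fallback import is
an addition for readers on later Python versions, where the same classes
live in \module{http.cookiejar} and \module{urllib.request}.

\begin{verbatim}
try:
    import cookielib, urllib2            # module names in Python 2.4
except ImportError:
    # Assumption for later versions: the modules were renamed in Python 3.
    import http.cookiejar as cookielib
    import urllib.request as urllib2

jar = cookielib.CookieJar()              # in-memory jar, initially empty
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# Requests made with opener.open(url) would now store cookies offered by
# the server in the jar and send them back on later requests.
\end{verbatim}

For cookies that must persist between runs, \class{MozillaCookieJar} or
\class{LWPCookieJar} can take the place of \class{CookieJar}.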

% ======================================================================
\section{Build and C API Changes}

Changes to Python's build process and to the C API include:

\begin{itemize}

  \item Three new convenience macros were added for common return
  values from extension functions: \csimplemacro{Py_RETURN_NONE},
  \csimplemacro{Py_RETURN_TRUE}, and \csimplemacro{Py_RETURN_FALSE}.

  \item A new function, \cfunction{PyTuple_Pack(\var{N}, \var{obj1},
  \var{obj2}, ..., \var{objN})}, constructs tuples from a variable-length
  argument list of Python objects.

  \item A new function, \cfunction{PyDict_Contains(\var{d}, \var{k})},
  implements fast dictionary lookups without masking exceptions raised
  during the lookup process.

  \item A new method flag, \constant{METH_COEXIST}, allows a function
  defined in slots to co-exist with a PyCFunction having the same name.
  This can halve the access time for a method such as
  \method{set.__contains__()}.

\end{itemize}


%======================================================================
\subsection{Port-Specific Changes}

\begin{itemize}

\item The Windows port now builds under MSVC++ 7.1 as well as version 6.

\end{itemize}


%======================================================================
\section{Other Changes and Fixes \label{section-other}}

As usual, there were a bunch of other improvements and bugfixes
scattered throughout the source tree.  A search through the CVS change
logs finds there were XXX patches applied and YYY bugs fixed between
Python 2.3 and 2.4.  Both figures are likely to be underestimates.

Some of the more notable changes are:

\begin{itemize}

\item The \module{timeit} module now automatically disables periodic
  garbage collection during the timing loop.  This change makes
  consecutive timings more comparable.

\item The \module{base64} module now has more complete RFC 3548 support
  for Base64, Base32, and Base16 encoding and decoding, including
  optional case folding and optional alternative alphabets.
  (Contributed by Barry Warsaw.)

\end{itemize}
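
The new \module{base64} functions can be sketched as follows.  The sample
data is arbitrary, and the calls to \method{encode('ascii')} are only
there so that the same code also runs on later Python versions, where
these functions expect byte strings.

\begin{verbatim}
import base64

data = '>>>???'.encode('ascii')          # arbitrary sample data

# Base32 and Base16 sit alongside the familiar Base64 encoding.
as_base32 = base64.b32encode(data)
as_base16 = base64.b16encode(data)

# Optional case folding: accept lowercase hexadecimal digits when decoding.
lowercase_hex = '3e3e3e3f3f3f'.encode('ascii')
from_hex = base64.b16decode(lowercase_hex, casefold=True)

# Optional alternative alphabet: replace '+' and '/' with '-' and '_'.
altchars = '-_'.encode('ascii')
standard = base64.b64encode(data)                 # contains '+' and '/'
url_friendly = base64.b64encode(data, altchars)   # uses '-' and '_' instead
round_trip = base64.b64decode(url_friendly, altchars)
\end{verbatim}

Without \code{casefold=True}, \function{b16decode()} rejects the
lowercase digits, since the Base16 alphabet in RFC 3548 is uppercase.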


%======================================================================
\section{Porting to Python 2.4}

This section lists previously described changes that may require
changes to your code:

\begin{itemize}

\item The \function{zip()} built-in function now returns an empty list,
  and \function{itertools.izip()} an empty iterator, instead of raising
  a \exception{TypeError} exception when called with no arguments.

\item \function{dircache.listdir()} now passes exceptions to the caller
  instead of returning empty lists.

\item \function{LexicalHandler.startDTD()} used to receive the public and
  system IDs in the wrong order.  This has been corrected; applications
  relying on the wrong order need to be fixed.

\item \function{fcntl.ioctl()} now warns if the \var{mutate} argument is
  omitted and relevant.

\end{itemize}
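
The \function{zip()} change matters mostly for code that unpacks a
possibly empty sequence into the call.  As a small illustration (the
helper name \function{transpose()} is invented here), the function below
swaps rows and columns; given an empty list it returns an empty list,
where Python 2.3 raised \exception{TypeError}.

\begin{verbatim}
def transpose(rows):
    """Swap the rows and columns of a list of equal-length rows."""
    # zip(*rows) becomes zip() when rows is empty, which is now allowed.
    return [list(column) for column in zip(*rows)]

table = transpose([[1, 2, 3], [4, 5, 6]])   # [[1, 4], [2, 5], [3, 6]]
empty = transpose([])                       # []
\end{verbatim}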


%======================================================================
\section{Acknowledgements \label{acks}}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Raymond Hettinger.

\end{document}