\documentclass{howto}
% $Id$

\title{What's New in Python 2.3}
\release{0.03}
\author{A.M. Kuchling}
\authoraddress{\email{amk@amk.ca}}

\begin{document}
\maketitle
\tableofcontents

% Optik (or whatever it gets called)
%
% MacOS framework-related changes (section of its own, probably)
%
% xreadlines obsolete; files are their own iterator

%\section{Introduction \label{intro}}

{\large This article is a draft, and is currently up to date for some
random version of the CVS tree from early November 2002. Please send any
additions, comments or errata to the author.}

This article explains the new features in Python 2.3. The tentative
release date of Python 2.3 is currently scheduled for some undefined
time before the end of 2002.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.3,
such as the
\citetitle[http://www.python.org/doc/2.3/lib/lib.html]{Python Library
Reference} and the
\citetitle[http://www.python.org/doc/2.3/ref/ref.html]{Python
Reference Manual}. If you want to understand the complete
implementation and design rationale for a change, refer to the PEP for
a particular new feature.


%======================================================================
\section{PEP 218: A Standard Set Datatype}

The new \module{sets} module contains an implementation of a set
datatype. The \class{Set} class is for mutable sets, sets that can
have members added and removed. The \class{ImmutableSet} class is for
sets that can't be modified, and can be used as dictionary keys. Sets
are built on top of dictionaries, so the elements within a set must be
hashable.

Here's a simple example:

\begin{verbatim}
>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>
\end{verbatim}

The union and intersection of sets can be computed with the
\method{union()} and \method{intersection()} methods, or,
alternatively, using the bitwise operators \samp{|} and \samp{\&}.
Mutable sets also have in-place versions of these methods,
\method{union_update()} and \method{intersection_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>
\end{verbatim}

It's also possible to take the symmetric difference of two sets. This
is the set of all elements in the union that aren't in the
intersection. An alternative way of expressing the symmetric
difference is that it contains all elements that are in exactly one
set. Again, there's an in-place version, with the ungainly name
\method{symmetric_difference_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>
\end{verbatim}
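
The two descriptions of the symmetric difference can be checked
against each other. The following is only a sketch: it falls back to
the built-in \class{set} type when the \module{sets} module cannot be
imported, which is an assumption about later Python versions and is
not something this article describes.

```python
# Prefer the sets module described in this section.  The fallback to the
# built-in set type is an assumption, made so the sketch still runs on
# interpreters that no longer ship the sets module.
try:
    from sets import Set
except ImportError:
    Set = set

S1 = Set([1, 2, 3, 4])
S2 = Set([3, 4, 5, 6])

# First description: elements of the union that aren't in the intersection.
by_definition = (S1 | S2) - (S1 & S2)

# Second description: elements that belong to exactly one of the two sets.
in_exactly_one = Set([x for x in S1 | S2 if (x in S1) != (x in S2)])

assert by_definition == S1 ^ S2
assert in_exactly_one == S1 ^ S2

# The in-place version changes S1 and leaves S2 untouched.
S1.symmetric_difference_update(S2)
assert S1 == Set([1, 2, 5, 6])
```

Note that the in-place method is called purely for its side effect on
\code{S1}.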

There are also methods, \method{issubset()} and \method{issuperset()},
for checking whether one set is a subset or superset of another:

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>
\end{verbatim}
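
Two points from this section are easy to miss: only an
\class{ImmutableSet} can serve as a dictionary key, and the subset
test is not strict, so every set is a subset of itself. A minimal
sketch of both follows. As an assumption for portability, it falls
back to the built-in \class{set} and \class{frozenset} types where the
\module{sets} module is not available.

```python
# The fallback names are an assumption for interpreters without the
# sets module; the behaviour shown is the one described in this section.
try:
    from sets import Set, ImmutableSet
except ImportError:
    Set, ImmutableSet = set, frozenset

# An immutable set is hashable, so it can be used as a dictionary key.
labels = {ImmutableSet([2, 3, 5]): "small primes"}
found = labels[ImmutableSet([5, 3, 2])]      # element order is irrelevant

# A mutable set is not hashable and is rejected as a key.
try:
    labels[Set([1])] = "oops"
    rejected = False
except TypeError:
    rejected = True

# issubset() and issuperset() also accept equal sets.
S = Set([1, 2, 3])
equal_is_subset = S.issubset(S) and S.issuperset(S)
```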


\begin{seealso}

\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
Implemented by Greg V. Wilson, Alex Martelli, and GvR.}

\end{seealso}



%======================================================================
\section{PEP 255: Simple Generators\label{section-generators}}

In Python 2.2, generators were added as an optional feature, to be
enabled by a \code{from __future__ import generators} directive. In
2.3 generators no longer need to be specially enabled, and are now
always present; this means that \keyword{yield} is now always a
keyword. The rest of this section is a copy of the description of
generators from the ``What's New in Python 2.2'' document; if you read
it when 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C.
When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But, what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler which
compiles the function specially as a result.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
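
Both of these forms rely only on the iterator protocol, so neither
needs an explicit call to \code{.next()}. The short sketch below
repeats the \function{generate_ints()} definition from above so that
it is self-contained; the variable names are illustrative.

```python
def generate_ints(N):
    for i in range(N):
        yield i

# A for loop keeps asking for the next value until the generator is done.
total = 0
for i in generate_ints(5):
    total = total + i

# Tuple unpacking also consumes the generator; the number of names
# must match the number of values it produces.
a, b, c = generate_ints(3)

# Asking for too few names is reported as a ValueError.
try:
    x, y = generate_ints(3)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```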

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.
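
As an illustration, here are two small generators, one that stops
early with a bare \keyword{return} and one that simply runs off the
end of its body. The function names are made up for this sketch.

```python
def read_until_blank(lines):
    """Yield lines up to, but not including, the first empty string."""
    for line in lines:
        if line == "":
            return              # bare return: the generator is finished
        yield line

def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n = n - 1
    # Falling off the bottom of the function also ends the generator.

header = list(read_until_blank(["To: amk", "Subject: hi", "", "body text"]))
ticks = list(countdown(3))
```

In both cases the consumer simply sees the sequence of values end; it
cannot tell which of the two mechanisms was used.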

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
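
To try the traversal, a tree type is needed. The \class{Tree} class
below is a hypothetical stand-in written for this sketch (the class
used in \file{Lib/test/test_generators.py} may well differ); it only
provides the \code{label}, \code{left} and \code{right} attributes
that \function{inorder()} relies on.

```python
class Tree:
    """Hypothetical minimal binary tree node; None marks a missing child."""
    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right

def inorder(t):
    # An empty subtree (None) is false, so the recursion stops there.
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x

#        4
#      /   \
#     2     6
#    / \   / \
#   1   3 5   7
root = Tree(4,
            Tree(2, Tree(1), Tree(3)),
            Tree(6, Tree(5), Tree(7)))

labels = list(inorder(root))
```

Each recursive call creates its own generator, and the inner
\keyword{for} loops pass the values of the subtrees up to the caller
one at a time.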

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.
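
For comparison, roughly the same search can be spelled out in Python
with a generator and an explicit loop. This is only an approximation
of the Icon semantics: the \function{find()} helper below is written
for this sketch (it is not a built-in), and it reports 1-based
positions to match the numbers quoted above.

```python
def find(sub, text):
    """Yield the 1-based positions at which sub occurs in text."""
    pos = text.find(sub)
    while pos != -1:
        yield pos + 1           # Icon counts positions from 1
        pos = text.find(sub, pos + 1)

sentence = "Store it in the neighboring harbor"

# Icon retries the comparison automatically; in Python the retry is an
# explicit loop that stops at the first position greater than 5.
first_match = None
for i in find("or", sentence):
    if i > 5:
        first_match = i
        break
```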

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
(the iterator) that can be passed around to other functions or stored
in a data structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 263: Source Code Encodings \label{section-encodings}}

Python source files can now be declared as being in different
character set encodings. Encodings are declared by including a
specially formatted comment in the first or second line of the source
file. For example, a UTF-8 file can be declared with:

\begin{verbatim}
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
\end{verbatim}

Without such an encoding declaration, the default encoding used is
7-bit ASCII.

The encoding declaration only affects Unicode string literals; the
text in the source code will be converted to Unicode using the
specified encoding. Note that Python identifiers are still restricted
to ASCII characters, so you can't have variable names that use
characters outside of the usual alphanumerics.
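
The effect of the declaration can be observed without creating files
on disk. The sketch below is hedged in one important way: it assumes
an interpreter whose \function{compile()} built-in accepts a byte
string and honours the declaration inside it, which is true of later
Python releases but is not claimed here for 2.3. The byte
\code{0xE9} in the literal is interpreted according to whichever
encoding the first line declares.

```python
# Assumption: compile() accepts bytes and applies the coding declaration
# found in them.  The same raw byte 0xE9 appears in both sources.
latin1_source = b"# -*- coding: iso-8859-1 -*-\nname = u'caf\xe9'\n"
cyrillic_source = b"# -*- coding: iso-8859-5 -*-\nname = u'caf\xe9'\n"

latin1_ns = {}
exec(compile(latin1_source, "<latin1 demo>", "exec"), latin1_ns)

cyrillic_ns = {}
exec(compile(cyrillic_source, "<cyrillic demo>", "exec"), cyrillic_ns)

# Under ISO-8859-1 the byte 0xE9 is LATIN SMALL LETTER E WITH ACUTE.
latin1_name = latin1_ns["name"]
# Under ISO-8859-5 the same byte decodes to a different character.
cyrillic_name = cyrillic_ns["name"]
```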

\begin{seealso}

\seepep{263}{Defining Python Source Code Encodings}{Written by
Marc-Andr\'e Lemburg and Martin von L\"owis; implemented by SUZUKI
Hisao and Martin von L\"owis.}

\end{seealso}


%======================================================================
\section{PEP 277: Unicode file name support for Windows NT}

On Windows NT, 2000, and XP, the system stores file names as Unicode
strings. Traditionally, Python has represented file names as byte
strings, which is inadequate because it renders some file names
inaccessible.

Python now allows using arbitrary Unicode strings (within the
limitations of the file system) for all functions that expect file
names, in particular the \function{open()} built-in. If a Unicode
string is passed to \function{os.listdir()}, Python now returns a list
of Unicode strings. A new function, \function{os.getcwdu()}, returns
the current directory as a Unicode string.

Byte strings still work as file names, and Python will transparently
convert them to Unicode using the \code{mbcs} encoding.

Other systems also allow Unicode strings as file names, but convert
them to byte strings before passing them to the system, which may cause
a \exception{UnicodeError} to be raised. Applications can test whether
arbitrary Unicode strings are supported as file names by checking
\member{os.path.unicode_file_names}, a Boolean value.

\begin{seealso}

\seepep{277}{Unicode file name support for Windows NT}{Written by Neil
Hodgson; implemented by Neil Hodgson, Martin von L\"owis, and Mark
Hammond.}

\end{seealso}


%======================================================================
\section{PEP 278: Universal Newline Support}

The three major operating systems used today are Microsoft Windows,
Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor
irritation is that these three platforms all use different characters
to mark the ends of lines in text files. \UNIX\ uses character 10,
the ASCII linefeed, while MacOS uses character 13, the ASCII carriage
return, and Windows uses a two-character sequence of a carriage return
plus a linefeed.

Python's file objects can now support end of line conventions other
than the one followed by the platform on which Python is running.
Opening a file with the mode \samp{U} or \samp{rU} will open a file
for reading in universal newline mode. All three line ending
conventions will be translated to a \samp{\e n} in the strings
returned by the various file methods such as \method{read()} and
\method{readline()}.
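
The following sketch writes one file that mixes all three conventions
and reads it back. The mode is picked at run time as a hedge: the
\samp{rU} spelling is the one described in this section, while the
plain \samp{r} fallback rests on the assumption that later
interpreters translate line endings in text mode by default. The
temporary file is only scaffolding for the demonstration.

```python
import os
import sys
import tempfile

# One file containing a Unix, a Macintosh and a Windows line ending.
fd, path = tempfile.mkstemp()
os.write(fd, b"unix\nmac\rwindows\r\n")
os.close(fd)

# 'rU' is the mode described above; the 'r' fallback is an assumption
# about newer interpreters, where text mode is expected to do the same.
if sys.version_info < (3,):
    mode = "rU"
else:
    mode = "r"

f = open(path, mode)
lines = f.readlines()
f.close()
os.remove(path)

# Every line now ends in a plain '\n', whatever it ended with on disk.
```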
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000364
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000365Universal newline support is also used when importing modules and when
366executing a file with the \function{execfile()} function. This means
367that Python modules can be shared between all three operating systems
368without needing to convert the line-endings.
369
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000370This feature can be disabled at compile-time by specifying
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000371\longprogramopt{without-universal-newlines} when running Python's
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000372\file{configure} script.
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000373
374\begin{seealso}
375
376\seepep{278}{Universal Newline Support}{Written
377and implemented by Jack Jansen.}
378
379\end{seealso}
380
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +0000381
%======================================================================
\section{PEP 279: The \function{enumerate()} Built-in Function\label{section-enumerate}}

A new built-in function, \function{enumerate()}, will make
certain loops a bit clearer. \code{enumerate(thing)}, where
\var{thing} is either an iterator or a sequence, returns an iterator
that will return \code{(0, \var{thing[0]})}, \code{(1,
\var{thing[1]})}, \code{(2, \var{thing[2]})}, and so forth. Fairly
often you'll see code to change every element of a list that looks
like this:

\begin{verbatim}
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}

This can be rewritten using \function{enumerate()} as:

\begin{verbatim}
for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}
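
Here is the same pattern with the elided computation filled in. The
list contents and the formatting applied to each item are invented
for the example.

```python
L = ["apple", "banana", "cherry"]

# Replace every element by a numbered, upper-cased version of itself.
for i, item in enumerate(L):
    result = "%d:%s" % (i, item.upper())
    L[i] = result

# enumerate() accepts any iterator, not just sequences, and the pairs
# it produces can be collected like those of any other iterator.
pairs = list(enumerate(iter("ab")))
```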


\begin{seealso}

\seepep{279}{The enumerate() built-in function}{Written
by Raymond D. Hettinger.}

\end{seealso}


%======================================================================
\section{PEP 285: The \class{bool} Type\label{section-bool}}

A Boolean type was added to Python 2.3. Two new constants were added
to the \module{__builtin__} module, \constant{True} and
\constant{False}. The type object for this new type is named
\class{bool}; the constructor for it takes any Python value and
converts it to \constant{True} or \constant{False}.

\begin{verbatim}
>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True
\end{verbatim}

Most of the standard library modules and built-in functions have been
changed to return Booleans.

\begin{verbatim}
>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False
\end{verbatim}

Python's Booleans were added with the primary goal of making code
clearer. For example, if you're reading a function and encounter the
statement \code{return 1}, you might wonder whether the \samp{1}
represents a truth value, or whether it's an index, or whether it's a
coefficient that multiplies some other quantity. If the statement is
\code{return True}, however, the meaning of the return value is quite
clearly a truth value.

Python's Booleans were not added for the sake of strict type-checking.
A very strict language such as Pascal would also prevent you from
performing arithmetic with Booleans, and would require that the
expression in an \keyword{if} statement always evaluate to a Boolean.
Python is not this strict, and it never will be. (\pep{285}
explicitly says so.) So you can still use any expression in an
\keyword{if}, even ones that evaluate to a list or tuple or some
random object, and the Boolean type is a subclass of the
\class{int} class, so arithmetic using a Boolean still works.

\begin{verbatim}
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
\end{verbatim}

To sum up \constant{True} and \constant{False} in a sentence: they're
alternative ways to spell the integer values 1 and 0, with the single
difference that \function{str()} and \function{repr()} return the
strings \samp{True} and \samp{False} instead of \samp{1} and \samp{0}.
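
That summary can be spelled out as a handful of checks. The sketch
below only uses behaviour described in this section; the
\function{is_even()} helper and the sample data are invented for
illustration.

```python
def is_even(n):
    # Returning a comparison result makes the intent obvious: a truth value.
    return n % 2 == 0

# Booleans compare equal to the integers 1 and 0 and are instances of int.
same_as_ints = (True == 1) and (False == 0)
is_int_subclass = isinstance(True, int)

# Only the string forms differ from those of 1 and 0.
spellings = (str(True), repr(False), str(1), repr(0))

# Because arithmetic still works, truth values can be counted by adding
# them up, or used to index a two-element list.
evens = 0
for n in [1, 2, 4, 7, 10]:
    evens = evens + is_even(n)
answer = ["odd", "even"][is_even(10)]
```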

\begin{seealso}

\seepep{285}{Adding a bool type}{Written and implemented by GvR.}

\end{seealso}


%======================================================================
\section{PEP 293: Codec Error Handling Callbacks}

When encoding a Unicode string into a byte string, unencodable
characters may be encountered. So far, Python has allowed specifying
the error processing as either ``strict'' (raising
\exception{UnicodeError}), ``ignore'' (skipping the character), or
``replace'' (substituting a question mark), defaulting to ``strict''.
It may be desirable to specify an alternative processing of the error,
e.g. by inserting an XML character reference or HTML entity reference
into the converted string.

Python now has a flexible framework to add additional processing
strategies. New error handlers can be added with
\function{codecs.register_error()}. Codecs then can access the error
handler with \function{codecs.lookup_error()}. An equivalent C API has
been added for codecs written in C. The error handler gets the
necessary state information, such as the string being converted, the
position in the string where the error was detected, and the target
encoding. The handler can then either raise an exception, or return a
replacement string.

Two additional error handlers have been implemented using this
framework: ``backslashreplace'' uses Python backslash quoting to
represent the unencodable character, and ``xmlcharrefreplace'' emits
XML character references.
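
A custom handler might look like the sketch below. The handler name
\code{demo-tag} and the bracketed replacement format are invented for
this example. Following \pep{293}, the handler receives the exception
object and returns a tuple of the replacement text and the position
at which encoding should resume; the byte-string literals in the
comparisons assume a Python version that supports the \code{b''}
prefix.

```python
import codecs

def tag_unencodable(exc):
    """Replace each unencodable character with a [U+XXXX] marker."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc                       # only encoding errors are handled
    bad = exc.object[exc.start:exc.end]
    replacement = u"".join([u"[U+%04X]" % ord(ch) for ch in bad])
    return (replacement, exc.end)       # resume after the bad characters

# 'demo-tag' is a hypothetical handler name chosen for this sketch.
codecs.register_error("demo-tag", tag_unencodable)

text = u"caf\xe9"
tagged = text.encode("ascii", "demo-tag")

# The two new built-in handlers, for comparison.
as_xml = text.encode("ascii", "xmlcharrefreplace")
as_backslash = text.encode("ascii", "backslashreplace")
```

Once registered, the handler name can be passed anywhere an
\code{errors} argument is accepted, and
\function{codecs.lookup_error()} returns the registered function.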

\begin{seealso}

\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
Walter D\"orwald.}

\end{seealso}


%======================================================================
\section{Extended Slices\label{section-slices}}

Ever since Python 1.4, the slicing syntax has supported an optional
third ``step'' or ``stride'' argument. For example, these are all
legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
\code{L[::-1]}. This was added to Python at the request of
the developers of Numerical Python. However, the built-in sequence
types of lists, tuples, and strings have never supported this feature,
and you got a \exception{TypeError} if you tried it. Michael Hudson
contributed a patch that was applied to Python 2.3 and fixed this
shortcoming.

For example, you can now easily extract the elements of a list that
have even indexes:

\begin{verbatim}
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
\end{verbatim}

Negative values also work, so you can make a copy of the same list in
reverse order:

\begin{verbatim}
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
\end{verbatim}

This also works for strings:

\begin{verbatim}
>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
\end{verbatim}

as well as tuples and arrays.

If you have a mutable sequence (i.e. a list or an array) you can
assign to or delete an extended slice, but there are some differences
in assignment to extended and regular slices. Assignment to a regular
slice can be used to change the length of the sequence:

\begin{verbatim}
>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]
\end{verbatim}

but when assigning to an extended slice the list on the right hand
side of the statement must contain the same number of items as the
slice it is replacing:

\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = range(0, -2, -1)
>>> a
[0, 1, -1, 3]
>>> a[::2] = range(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign list of size 3 to extended slice of size 2
\end{verbatim}

Deletion is more straightforward:

\begin{verbatim}
>>> a = range(4)
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]
\end{verbatim}

One can also now pass slice objects to the \method{__getitem__}
methods of the built-in sequences:

\begin{verbatim}
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
\end{verbatim}

or use them directly in subscripts:

\begin{verbatim}
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
\end{verbatim}

To make implementing sequences that support extended slicing in Python
easier, slice objects now have a method \method{indices()} which, given
the length of a sequence, returns a \code{(start, stop, step)} tuple,
handling omitted and out-of-bounds indices in a manner consistent with
regular slices (and this innocuous phrase hides a welter of confusing
details!). The method is intended to be used like this:

\begin{verbatim}
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)
\end{verbatim}
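
A complete, if contrived, class following this pattern is sketched
below. The \class{Squares} class is invented for illustration and,
unlike \class{FakeSeq}, returns a plain list for slices to keep the
example short.

```python
class Squares:
    """A read-only sequence whose item at index i is i*i (illustrative)."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def calc_item(self, i):
        # Plain integer indexes may be negative or out of range.
        if i < 0:
            i = i + self.length
        if not 0 <= i < self.length:
            raise IndexError("index out of range")
        return i * i

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() fills in omitted values and clips out-of-bounds
            # ones, so range() always yields valid positions.
            start, stop, step = item.indices(len(self))
            return [self.calc_item(i) for i in range(start, stop, step)]
        else:
            return self.calc_item(item)

sq = Squares(10)
every_other = sq[::2]
backwards = sq[::-1]
clipped = sq[7:100]
```

For a reversed slice of a ten-item sequence, \method{indices()}
returns \code{(9, -1, -1)}, which is exactly what \function{range()}
needs in order to count down to index 0.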

From this example you can also see that the builtin ``\class{slice}''
object is now the type object for the slice type, and is no longer a
function. This is consistent with Python 2.2, where \class{int},
\class{str}, etc., underwent the same change.


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.3 makes to the core Python
language.

\begin{itemize}
\item The \keyword{yield} statement is now always a keyword, as
described in section~\ref{section-generators} of this document.

\item A new built-in function \function{enumerate()}
was added, as described in section~\ref{section-enumerate} of this
document.

\item Two new constants, \constant{True} and \constant{False} were
added along with the built-in \class{bool} type, as described in
section~\ref{section-bool} of this document.

\item Built-in types now support the extended slicing syntax,
as described in section~\ref{section-slices} of this document.

\item Dictionaries have a new method, \method{pop(\var{key})}, that
returns the value corresponding to \var{key} and removes that
key/value pair from the dictionary. \method{pop()} will raise a
\exception{KeyError} if the requested key isn't present in the
dictionary:

\begin{verbatim}
>>> d = {1:2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: pop(): dictionary is empty
>>> d
{}
>>>
\end{verbatim}

(Patch contributed by Raymond Hettinger.)

\item The \keyword{assert} statement no longer checks the \code{__debug__}
flag, so you can no longer disable assertions by assigning to \code{__debug__}.
Running Python with the \programopt{-O} switch will still generate
code that doesn't execute any assertions.

\item Most type objects are now callable, so you can use them
to create new objects such as functions, classes, and modules. (This
means that the \module{new} module can be deprecated in a future
Python version, because you can now use the type objects available
in the \module{types} module.)
% XXX should new.py use PendingDeprecationWarning?
For example, you can create a new module object with the following code:

\begin{verbatim}
>>> import types
>>> m = types.ModuleType('abc','docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'
\end{verbatim}

\item
A new warning, \exception{PendingDeprecationWarning}, was added to
indicate features which are in the process of being
deprecated. The warning will \emph{not} be printed by default. To
check for use of features that will be deprecated in the future,
supply \programopt{-Walways::PendingDeprecationWarning::} on the
command line or use \function{warnings.filterwarnings()}.

\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning} warning. In a future version of Python,
\code{None} may finally become a keyword.

Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +0000735\item Python runs multithreaded programs by switching between threads
736after executing N bytecodes. The default value for N has been
737increased from 10 to 100 bytecodes, speeding up single-threaded
738applications by reducing the switching overhead. Some multithreaded
739applications may suffer slower response time, but that's easily fixed
740by setting the limit back to a lower number by calling
741\function{sys.setcheckinterval(\var{N})}.
742
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +0000743\item One minor but far-reaching change is that the names of extension
744types defined by the modules included with Python now contain the
745module and a \samp{.} in front of the type name. For example, in
746Python 2.2, if you created a socket and printed its
747\member{__class__}, you'd get this output:
748
749\begin{verbatim}
750>>> s = socket.socket()
751>>> s.__class__
752<type 'socket'>
753\end{verbatim}
754
755In 2.3, you get this:
756\begin{verbatim}
757>>> s.__class__
758<type '_socket.socket'>
759\end{verbatim}
760
761\end{itemize}
762
763
764\subsection{String Changes}
765
766\begin{itemize}
767
768\item The \code{in} operator now works differently for strings.
769Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
770and \var{Y} are strings, \var{X} could only be a single character.
771That's now changed; \var{X} can be a string of any length, and
772\code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
773substring of \var{Y}. If \var{X} is the empty string, the result is
774always \constant{True}.
775
776\begin{verbatim}
777>>> 'ab' in 'abcd'
778True
779>>> 'ad' in 'abcd'
780False
781>>> '' in 'abcd'
782True
783\end{verbatim}
784
785Note that this doesn't tell you where the substring starts; the
786\method{find()} method is still necessary to figure that out.
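
A short sketch of combining the two: \code{in} answers the yes/no
question, and \method{find()} supplies the position.  The sample
strings are arbitrary.

\begin{verbatim}
text = 'abcd'
needle = 'bc'

# 'in' only reports whether the substring occurs at all.
found = needle in text

# find() returns the starting index, or -1 if there is no match.
position = text.find(needle)
missing = text.find('ad')
\end{verbatim}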
787
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000788\item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
789string methods now have an optional argument for specifying the
790characters to strip. The default is still to remove all whitespace
791characters:
792
793\begin{verbatim}
794>>> ' abc '.strip()
795'abc'
796>>> '><><abc<><><>'.strip('<>')
797'abc'
798>>> '><><abc<><><>\n'.strip('<>')
799'abc<><><>\n'
800>>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
801u'\u4001abc'
802>>>
803\end{verbatim}
804
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +0000805(Suggested by Simon Brunning, and implemented by Walter D\"orwald.)
Andrew M. Kuchling346386f2002-07-12 20:24:42 +0000806
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000807\item The \method{startswith()} and \method{endswith()}
808string methods now accept negative numbers for the start and end
809parameters.
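
A minimal sketch of what negative values mean here, assuming they are
interpreted the same way as negative slice indices; the file name is
only an example.

\begin{verbatim}
name = 'whatsnew23.tex'

# A negative start counts back from the end of the string, so this
# tests the last four characters.
has_extension = name.startswith('.tex', -4)

# A negative end stops the comparison before the last four characters,
# so this tests 'whatsnew23'.
is_for_23 = name.endswith('23', 0, -4)

# The extension itself is outside the range being examined here.
extension_in_range = name.endswith('tex', 0, -4)
\end{verbatim}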
810
811\item Another new string method is \method{zfill()}, originally a
812function in the \module{string} module. \method{zfill()} pads a
813numeric string with zeros on the left until it's the specified width.
814Note that the \code{\%} operator is still more flexible and powerful
815than \method{zfill()}.
816
817\begin{verbatim}
818>>> '45'.zfill(4)
819'0045'
820>>> '12345'.zfill(4)
821'12345'
822>>> 'goofy'.zfill(6)
823'0goofy'
824\end{verbatim}
825
Andrew M. Kuchling346386f2002-07-12 20:24:42 +0000826(Contributed by Walter D\"orwald.)
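
To illustrate the comparison with the \code{\%} operator, here is a
small sketch; the numbers are arbitrary.  The first two lines produce
the same result, while the last two use formatting options that
\method{zfill()} has no equivalent for.

\begin{verbatim}
padded_by_method = '45'.zfill(4)      # '0045'
padded_by_format = '%04d' % 45        # '0045'

# The format operator can also force a sign or round a float.
signed = '%+05d' % 45                 # '+0045'
rounded = '%08.2f' % 3.14159          # '00003.14'
\end{verbatim}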
827
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +0000828\item A new type object, \class{basestring}, has been added.
829 Both 8-bit strings and Unicode strings inherit from this type, so
830 \code{isinstance(obj, basestring)} will return \constant{True} for
831 either kind of string. It's a completely abstract type, so you
832 can't create \class{basestring} instances.
833
\item Interned strings are no longer immortal.  Interned strings will
now be garbage-collected in the usual way when the only reference to
them is from the internal dictionary of interned strings.  (Implemented
by Oren Tirosh.)
838
839\end{itemize}
840
841
842\subsection{Optimizations}
843
844\begin{itemize}
845
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000846\item The \method{sort()} method of list objects has been extensively
847rewritten by Tim Peters, and the implementation is significantly
848faster.
849
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +0000850\item Multiplication of large long integers is now much faster thanks
851to an implementation of Karatsuba multiplication, an algorithm that
852scales better than the O(n*n) required for the grade-school
853multiplication algorithm. (Original patch by Christopher A. Craig,
854and significantly reworked by Tim Peters.)
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +0000855
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +0000856\item The \code{SET_LINENO} opcode is now gone. This may provide a
857small speed increase, subject to your compiler's idiosyncrasies.
858(Removed by Michael Hudson.)
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +0000859
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +0000860\item A number of small rearrangements have been made in various
861hotspots to improve performance, inlining a function here, removing
862some code there. (Implemented mostly by GvR, but lots of people have
863contributed to one change or another.)
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000864
865\end{itemize}
Neal Norwitzd68f5172002-05-29 15:54:55 +0000866
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +0000867
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000868%======================================================================
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +0000869\section{New and Improved Modules}
870
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000871As usual, Python's standard modules had a number of enhancements and
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +0000872bug fixes. Here's a partial list of the most notable changes, sorted
873alphabetically by module name. Consult the
874\file{Misc/NEWS} file in the source tree for a more
875complete list of changes, or look through the CVS logs for all the
876details.
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000877
878\begin{itemize}
879
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +0000880\item The \module{array} module now supports arrays of Unicode
881characters using the \samp{u} format character. Arrays also now
882support using the \code{+=} assignment operator to add another array's
883contents, and the \code{*=} assignment operator to repeat an array.
884(Contributed by Jason Orendorff.)
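
The two assignment operators can be sketched as follows.  The example
uses arrays of integers (type code \samp{i}) rather than Unicode
characters, purely to keep the values easy to read.

\begin{verbatim}
from array import array

numbers = array('i', [1, 2, 3])
numbers += array('i', [4, 5])     # append another array's contents

pattern = array('i', [0, 1])
pattern *= 3                      # repeat the array in place

numbers_list = numbers.tolist()
pattern_list = pattern.tolist()
\end{verbatim}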
885
886\item The Distutils \class{Extension} class now supports
887an extra constructor argument named \samp{depends} for listing
888additional source files that an extension depends on. This lets
889Distutils recompile the module if any of the dependency files are
890modified. For example, if \samp{sampmodule.c} includes the header
891file \file{sample.h}, you would create the \class{Extension} object like
892this:
893
894\begin{verbatim}
895ext = Extension("samp",
896 sources=["sampmodule.c"],
897 depends=["sample.h"])
898\end{verbatim}
899
900Modifying \file{sample.h} would then cause the module to be recompiled.
901(Contributed by Jeremy Hylton.)
902
Andrew M. Kuchlingdc3f7e12002-11-04 20:05:10 +0000903\item Other minor changes to Distutils:
904it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
905\envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
906them to override the settings in Python's configuration (contributed
907by Robert Weber); the \function{get_distutils_option()} method lists
908recently-added extensions to Distutils.
909
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +0000910\item The \module{getopt} module gained a new function,
911\function{gnu_getopt()}, that supports the same arguments as the existing
912\function{getopt()} function but uses GNU-style scanning mode.
913The existing \function{getopt()} stops processing options as soon as a
914non-option argument is encountered, but in GNU-style mode processing
915continues, meaning that options and arguments can be mixed. For
916example:
917
918\begin{verbatim}
919>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
920([('-f', 'filename')], ['output', '-v'])
921>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
922([('-f', 'filename'), ('-v', '')], ['output'])
923\end{verbatim}
924
925(Contributed by Peter \AA{strand}.)
926
\item The \module{grp}, \module{pwd}, and \module{resource} modules
now return enhanced tuples whose fields can also be accessed by name:
929
930\begin{verbatim}
931>>> import grp
932>>> g = grp.getgrnam('amk')
933>>> g.gr_name, g.gr_gid
934('amk', 500)
935\end{verbatim}
936
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000937\item The new \module{heapq} module contains an implementation of a
938heap queue algorithm. A heap is an array-like data structure that
939keeps items in a sorted order such that, for every index k, heap[k] <=
940heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove
941the smallest item, and inserting a new item while maintaining the heap
942property is O(lg~n). (See
943\url{http://www.nist.gov/dads/HTML/priorityque.html} for more
944information about the priority queue data structure.)
945
946The Python \module{heapq} module provides \function{heappush()} and
947\function{heappop()} functions for adding and removing items while
948maintaining the heap property on top of some other mutable Python
949sequence type. For example:
950
\begin{verbatim}
>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...    heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]
>>>
\end{verbatim}
981
982(Contributed by Kevin O'Connor.)
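
One way to see the heap property at work is a simple heap sort built
only from \function{heappush()} and \function{heappop()}.  The
following is an illustrative sketch; \function{heapsort()} and
\function{is_heap()} are helper names invented for this example and
aren't part of the \module{heapq} module.

\begin{verbatim}
import heapq

def heapsort(items):
    """Return a new sorted list by pushing and popping a heap."""
    heap = []
    for item in items:
        heapq.heappush(heap, item)
    result = []
    while heap:
        # heappop() always removes the smallest remaining item.
        result.append(heapq.heappop(heap))
    return result

def is_heap(seq):
    """Check that seq[k] <= seq[2*k+1] and seq[k] <= seq[2*k+2]."""
    for k in range(len(seq)):
        for child in (2*k + 1, 2*k + 2):
            if child < len(seq) and seq[k] > seq[child]:
                return False
    return True

ordered = heapsort([3, 7, 5, 11, 1])
\end{verbatim}

Each push and pop costs O(lg~n), so sorting \var{n} items this way
takes O(n~lg~n) time overall.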
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +0000983
984\item Two new functions in the \module{math} module,
985\function{degrees(\var{rads})} and \function{radians(\var{degs})},
986convert between radians and degrees. Other functions in the
987\module{math} module such as
988\function{math.sin()} and \function{math.cos()} have always required
989input values measured in radians. (Contributed by Raymond Hettinger.)
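
For instance, an angle given in degrees can be converted before it is
passed to a trigonometric function.  This is only a sketch, and the
results are subject to the usual floating-point rounding.

\begin{verbatim}
import math

# Convert 90 degrees to radians, then take the sine (close to 1.0).
right_angle = math.radians(90)
sine = math.sin(right_angle)

# Convert pi radians back to degrees (close to 180.0).
half_turn = math.degrees(math.pi)
\end{verbatim}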
990
Andrew M. Kuchlingc309cca2002-10-10 16:04:08 +0000991\item Seven new functions, \function{getpgid()}, \function{killpg()},
992\function{lchown()}, \function{major()}, \function{makedev()},
993\function{minor()}, and \function{mknod()}, were added to the
994\module{posix} module that underlies the \module{os} module.
995(Contributed by Gustavo Niemeyer and Geert Jansen.)
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +0000996
997\item The parser objects provided by the \module{pyexpat} module
998can now optionally buffer character data, resulting in fewer calls to
999your character data handler and therefore faster performance. Setting
1000the parser object's \member{buffer_text} attribute to \constant{True}
1001will enable buffering.
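
The effect can be sketched by counting how many times the character
data handler is called with and without buffering.  The
\function{collect_text()} helper and the sample document are invented
for this example, and the number of calls made without buffering
depends on the underlying Expat parser.

\begin{verbatim}
import xml.parsers.expat

def collect_text(document, use_buffer):
    # Record each piece of character data passed to the handler.
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = use_buffer
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks

document = '<doc>one &amp; two</doc>'
unbuffered = collect_text(document, False)
buffered = collect_text(document, True)
\end{verbatim}

Both lists join to the same text; with buffering enabled the text
should arrive in fewer, larger pieces.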
1002
1003\item The \module{readline} module also gained a number of new
1004functions: \function{get_history_item()},
1005\function{get_current_history_length()}, and \function{redisplay()}.
1006
1007\item Support for more advanced POSIX signal handling was added
1008to the \module{signal} module by adding the \function{sigpending},
1009\function{sigprocmask} and \function{sigsuspend} functions, where supported
1010by the platform. These functions make it possible to avoid some previously
1011unavoidable race conditions.
1012
1013\item The \module{socket} module now supports timeouts. You
1014can call the \method{settimeout(\var{t})} method on a socket object to
1015set a timeout of \var{t} seconds. Subsequent socket operations that
1016take longer than \var{t} seconds to complete will abort and raise a
1017\exception{socket.error} exception.
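
A minimal sketch of the behaviour, assuming the loopback interface is
available: the script waits for a connection that never arrives, so
\method{accept()} gives up after a tenth of a second.  The address,
the port number 0 (meaning ``any free port''), and the timeout value
are all arbitrary choices for the example.

\begin{verbatim}
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
server.settimeout(0.1)          # give up after 0.1 seconds

try:
    try:
        server.accept()         # nobody connects, so this times out
        timed_out = False
    except socket.error:
        timed_out = True
finally:
    server.close()
\end{verbatim}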
1018
1019The original timeout implementation was by Tim O'Malley. Michael
1020Gilfix integrated it into the Python \module{socket} module, after the
1021patch had undergone a lengthy review. After it was checked in, Guido
1022van~Rossum rewrote parts of it. This is a good example of the free
1023software development process in action.
1024
Fred Drake583db0d2002-09-14 02:03:25 +00001025\item The value of the C \constant{PYTHON_API_VERSION} macro is now exposed
1026at the Python level as \code{sys.api_version}.
Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +00001027
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001028\item The new \module{textwrap} module contains functions for wrapping
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001029strings containing paragraphs of text. The \function{wrap(\var{text},
1030\var{width})} function takes a string and returns a list containing
1031the text split into lines of no more than the chosen width. The
1032\function{fill(\var{text}, \var{width})} function returns a single
1033string, reformatted to fit into lines no longer than the chosen width.
(As you can guess, \function{fill()} is built on top of
\function{wrap()}.)  For example:
1036
1037\begin{verbatim}
1038>>> import textwrap
1039>>> paragraph = "Not a whit, we defy augury: ... more text ..."
1040>>> textwrap.wrap(paragraph, 60)
1041["Not a whit, we defy augury: there's a special providence in",
1042 "the fall of a sparrow. If it be now, 'tis not to come; if it",
1043 ...]
1044>>> print textwrap.fill(paragraph, 35)
1045Not a whit, we defy augury: there's
1046a special providence in the fall of
1047a sparrow. If it be now, 'tis not
1048to come; if it be not to come, it
1049will be now; if it be not now, yet
1050it will come: the readiness is all.
1051>>>
1052\end{verbatim}
1053
1054The module also contains a \class{TextWrapper} class that actually
1055implements the text wrapping strategy. Both the
1056\class{TextWrapper} class and the \function{wrap()} and
1057\function{fill()} functions support a number of additional keyword
1058arguments for fine-tuning the formatting; consult the module's
1059documentation for details.
1060% XXX add a link to the module docs?
1061(Contributed by Greg Ward.)
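
As a sketch of the class-based interface, the following creates one
\class{TextWrapper} instance and reuses it.  The
\code{initial_indent} and \code{subsequent_indent} keyword arguments
are two of the fine-tuning options; the sample text and the width of
30 are arbitrary.

\begin{verbatim}
import textwrap

text = ("The readiness is all. " * 6).strip()

# Format the paragraph as a bulleted item: '* ' on the first line,
# two spaces on the continuation lines.
wrapper = textwrap.TextWrapper(width=30, initial_indent='* ',
                               subsequent_indent='  ')

lines = wrapper.wrap(text)      # list of lines
bulleted = wrapper.fill(text)   # the same lines joined by newlines
\end{verbatim}

The indentation strings count toward the width, so no line in
\code{lines} should be longer than 30 characters.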
1062
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001063\item The \module{time} module's \function{strptime()} function has
1064long been an annoyance because it uses the platform C library's
1065\function{strptime()} implementation, and different platforms
1066sometimes have odd bugs. Brett Cannon contributed a portable
1067implementation that's written in pure Python, which should behave
1068identically on all platforms.
1069
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001070\item The DOM implementation
1071in \module{xml.dom.minidom} can now generate XML output in a
1072particular encoding, by specifying an optional encoding argument to
1073the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.
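
A brief sketch of the optional argument; the document and the choice
of UTF-8 are only examples.  When an encoding is supplied, it is
expected to be named in the XML declaration at the start of the
output.

\begin{verbatim}
from xml.dom import minidom

doc = minidom.parseString('<greeting>hello</greeting>')

# Without an encoding argument, no encoding is declared.
plain = doc.toxml()

# With one, the output is generated in the requested encoding.
encoded = doc.toxml('utf-8')
pretty = doc.toprettyxml(indent='  ', encoding='utf-8')
\end{verbatim}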
1074
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001075\item The \function{*stat()} family of functions can now report
1076fractions of a second in a timestamp. Such time stamps are
1077represented as floats, similar to \function{time.time()}.
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001078
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001079During testing, it was found that some applications will break if time
1080stamps are floats. For compatibility, when using the tuple interface
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001081of the \class{stat_result}, time stamps are represented as integers.
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001082When using named fields (a feature first introduced in Python 2.2),
1083time stamps are still represented as ints, unless
1084\function{os.stat_float_times()} is invoked to enable float return
1085values:
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001086
1087\begin{verbatim}
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001088>>> os.stat("/tmp").st_mtime
10891034791200
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001090>>> os.stat_float_times(True)
1091>>> os.stat("/tmp").st_mtime
10921034791200.6335014
1093\end{verbatim}
1094
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001095In Python 2.4, the default will change to always returning floats.
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001096
1097Application developers should use this feature only if all their
1098libraries work properly when confronted with floating point time
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001099stamps, or if they use the tuple API. If used, the feature should be
1100activated on an application level instead of trying to enable it on a
Martin v. Löwisf607bda2002-10-16 18:27:39 +00001101per-use basis.
1102
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001103\end{itemize}
1104
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001105
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001106%======================================================================
1107\section{Specialized Object Allocator (pymalloc)\label{section-pymalloc}}
1108
1109An experimental feature added to Python 2.1 was a specialized object
1110allocator called pymalloc, written by Vladimir Marangozov. Pymalloc
1111was intended to be faster than the system \cfunction{malloc()} and have
1112less memory overhead for typical allocation patterns of Python
1113programs. The allocator uses C's \cfunction{malloc()} function to get
1114large pools of memory, and then fulfills smaller memory requests from
1115these pools.
1116
1117In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
1118enabled by default; you had to explicitly turn it on by providing the
1119\longprogramopt{with-pymalloc} option to the \program{configure}
1120script. In 2.3, pymalloc has had further enhancements and is now
1121enabled by default; you'll have to supply
1122\longprogramopt{without-pymalloc} to disable it.
1123
1124This change is transparent to code written in Python; however,
1125pymalloc may expose bugs in C extensions. Authors of C extension
1126modules should test their code with the object allocator enabled,
1127because some incorrect code may cause core dumps at runtime. There
1128are a bunch of memory allocation functions in Python's C API that have
1129previously been just aliases for the C library's \cfunction{malloc()}
1130and \cfunction{free()}, meaning that if you accidentally called
1131mismatched functions, the error wouldn't be noticeable. When the
1132object allocator is enabled, these functions aren't aliases of
1133\cfunction{malloc()} and \cfunction{free()} any more, and calling the
1134wrong function to free memory may get you a core dump. For example,
1135if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
1136be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A
1137few modules included with Python fell afoul of this and had to be
1138fixed; doubtless there are more third-party modules that will have the
1139same problem.
1140
1141As part of this change, the confusing multiple interfaces for
1142allocating memory have been consolidated down into two API families.
1143Memory allocated with one family must not be manipulated with
1144functions from the other family.
1145
1146There is another family of functions specifically for allocating
1147Python \emph{objects} (as opposed to memory).
1148
1149\begin{itemize}
1150 \item To allocate and free an undistinguished chunk of memory use
1151 the ``raw memory'' family: \cfunction{PyMem_Malloc()},
1152 \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.
1153
  \item The ``object memory'' family is the interface to the pymalloc
  facility described above and is biased towards a large number of
  ``small'' allocations: \cfunction{PyObject_Malloc()},
  \cfunction{PyObject_Realloc()}, and \cfunction{PyObject_Free()}.
1158
1159 \item To allocate and free Python objects, use the ``object'' family
1160 \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
1161 \cfunction{PyObject_Del()}.
1162\end{itemize}
1163
1164Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
1165debugging features to catch memory overwrites and doubled frees in
1166both extension modules and in the interpreter itself. To enable this
1167support, turn on the Python interpreter's debugging code by running
1168\program{configure} with \longprogramopt{with-pydebug}.
1169
1170To aid extension writers, a header file \file{Misc/pymemcompat.h} is
1171distributed with the source to Python 2.3 that allows Python
1172extensions to use the 2.3 interfaces to memory allocation and compile
1173against any version of Python since 1.5.2. You would copy the file
1174from Python's source distribution and bundle it with the source of
1175your extension.
1176
1177\begin{seealso}
1178
1179\seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
1180{For the full details of the pymalloc implementation, see
1181the comments at the top of the file \file{Objects/obmalloc.c} in the
1182Python source code. The above link points to the file within the
1183SourceForge CVS browser.}
1184
1185\end{seealso}
1186
1187
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001188% ======================================================================
1189\section{Build and C API Changes}
1190
Andrew M. Kuchling3c305d92002-07-22 18:50:11 +00001191Changes to Python's build process and to the C API include:
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001192
1193\begin{itemize}
1194
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001195\item The C-level interface to the garbage collector has been changed,
1196to make it easier to write extension types that support garbage
1197collection, and to make it easier to debug misuses of the functions.
1198Various functions have slightly different semantics, so a bunch of
1199functions had to be renamed. Extensions that use the old API will
1200still compile but will \emph{not} participate in garbage collection,
1201so updating them for 2.3 should be considered fairly high priority.
1202
1203To upgrade an extension module to the new API, perform the following
1204steps:
1205
1206\begin{itemize}
1207
\item Rename \cfunction{Py_TPFLAGS_GC} to \cfunction{Py_TPFLAGS_HAVE_GC}.
1209
1210\item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to
1211allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them.
1212
1213\item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and
1214\cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}.
1215
1216\item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations.
1217
1218\item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}.
1219
1220\end{itemize}
1221
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001222\item Python can now optionally be built as a shared library
1223(\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +00001224when running Python's \file{configure} script. (Contributed by Ondrej
1225Palkovsky.)
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +00001226
Michael W. Hudsondd32a912002-08-15 14:59:02 +00001227\item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
1228are now deprecated. Initialization functions for Python extension
1229modules should now be declared using the new macro
Andrew M. Kuchling3c305d92002-07-22 18:50:11 +00001230\csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
1231use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
1232macros.
Neal Norwitzbba23a82002-07-22 13:18:59 +00001233
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001234\item The interpreter can be compiled without any docstrings for
1235the built-in functions and modules by supplying
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001236\longprogramopt{without-doc-strings} to the \file{configure} script.
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001237This makes the Python executable about 10\% smaller, but will also
1238mean that you can't get help for Python's built-ins. (Contributed by
1239Gustavo Niemeyer.)
1240
\item The cycle detection implementation used by the garbage collector
has proven to be stable, so it's now being made mandatory; you can no
longer compile Python without it, and the
\longprogramopt{with-cycle-gc} switch to \file{configure} has been removed.
1245
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001246\item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
Andrew M. Kuchling7845e7c2002-07-11 19:27:46 +00001247that uses it should be changed. For Python 2.2 and later, the method
1248definition table can specify the
1249\constant{METH_NOARGS} flag, signalling that there are no arguments, and
1250the argument checking can then be removed. If compatibility with
1251pre-2.2 versions of Python is important, the code could use
1252\code{PyArg_ParseTuple(args, "")} instead, but this will be slower
1253than using \constant{METH_NOARGS}.
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001254
\item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
char *\var{key})}, was added as shorthand for
\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001259
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001260\item File objects now manage their internal string buffer
1261differently by increasing it exponentially when needed.
1262This results in the benchmark tests in \file{Lib/test/test_bufio.py}
1263speeding up from 57 seconds to 1.7 seconds, according to one
1264measurement.
1265
Andrew M. Kuchling72b58e02002-05-29 17:30:34 +00001266\item It's now possible to define class and static methods for a C
1267extension type by setting either the \constant{METH_CLASS} or
1268\constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
1269structure.
Andrew M. Kuchling45afd542002-04-02 14:25:25 +00001270
Andrew M. Kuchling346386f2002-07-12 20:24:42 +00001271\item Python now includes a copy of the Expat XML parser's source code,
1272removing any dependence on a system version or local installation of
1273Expat.
1274
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001275\end{itemize}
1276
1277\subsection{Port-Specific Changes}
1278
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00001279Support for a port to IBM's OS/2 using the EMX runtime environment was
1280merged into the main Python source tree. EMX is a POSIX emulation
1281layer over the OS/2 system APIs. The Python port for EMX tries to
1282support all the POSIX-like capability exposed by the EMX runtime, and
1283mostly succeeds; \function{fork()} and \function{fcntl()} are
1284restricted by the limitations of the underlying emulation layer. The
1285standard OS/2 port, which uses IBM's Visual Age compiler, also gained
1286support for case-sensitive import semantics as part of the integration
1287of the EMX port into CVS. (Contributed by Andrew MacIntyre.)
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001288
On MacOS, most toolbox modules have been weaklinked to improve
backward compatibility.  This means that modules will no longer fail
to load if a single routine is missing on the current OS version.
Instead, calling the missing routine will raise an exception.
(Contributed by Jack Jansen.)
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001294
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00001295The RPM spec files, found in the \file{Misc/RPM/} directory in the
1296Python source distribution, were updated for 2.3. (Contributed by
1297Sean Reifschneider.)
Fred Drake03e10312002-03-26 19:17:43 +00001298
Andrew M. Kuchling3e3e1292002-10-10 11:32:30 +00001299Python now supports AtheOS (\url{http://www.atheos.cx}) and GNU/Hurd.
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001300
Fred Drake03e10312002-03-26 19:17:43 +00001301
1302%======================================================================
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001303\section{Other Changes and Fixes}
1304
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00001305As usual, there were a bunch of other improvements and bugfixes
1306scattered throughout the source tree. A search through the CVS change
1307logs finds there were 289 patches applied and 323 bugs fixed between
1308Python 2.2 and 2.3. Both figures are likely to be underestimates.
1309
1310Some of the more notable changes are:
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001311
1312\begin{itemize}
1313
1314\item The tools used to build the documentation now work under Cygwin
1315as well as \UNIX.
1316
Michael W. Hudsondd32a912002-08-15 14:59:02 +00001317\item The \code{SET_LINENO} opcode has been removed. Back in the
1318mists of time, this opcode was needed to produce line numbers in
1319tracebacks and support trace functions (for, e.g., \module{pdb}).
1320Since Python 1.5, the line numbers in tracebacks have been computed
1321using a different mechanism that works with ``python -O''. For Python
13222.3 Michael Hudson implemented a similar scheme to determine when to
1323call the trace function, removing the need for \code{SET_LINENO}
1324entirely.
1325
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00001326It would be difficult to detect any resulting difference from Python
1327code, apart from a slight speed up when Python is run without
Michael W. Hudsondd32a912002-08-15 14:59:02 +00001328\programopt{-O}.
1329
1330C extensions that access the \member{f_lineno} field of frame objects
1331should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
1332This will have the added effect of making the code work as desired
1333under ``python -O'' in earlier versions of Python.
1334
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001335\end{itemize}
1336
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00001337
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001338%======================================================================
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001339\section{Porting to Python 2.3}
1340
1341XXX write this
1342
1343
1344%======================================================================
Fred Drake03e10312002-03-26 19:17:43 +00001345\section{Acknowledgements \label{acks}}
1346
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001347The author would like to thank the following people for offering
1348suggestions, corrections and assistance with various drafts of this
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00001349article: Simon Brunning, Michael Chermside, Scott David Daniels, Fred~L. Drake, Jr.,
Andrew M. Kuchling7845e7c2002-07-11 19:27:46 +00001350Michael Hudson, Detlef Lannert, Martin von L\"owis, Andrew MacIntyre,
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00001351Lalo Martins, Gustavo Niemeyer, Neal Norwitz, Neil Schemenauer, Jason
1352Tishler.
Fred Drake03e10312002-03-26 19:17:43 +00001353
1354\end{document}