\documentclass{howto}
% $Id$

\title{What's New in Python 2.3}
\release{0.03}
\author{A.M. Kuchling}
\authoraddress{\email{akuchlin@mems-exchange.org}}

\begin{document}
\maketitle
\tableofcontents

% Optik (or whatever it gets called)
%
% MacOS framework-related changes (section of its own, probably)
%
% New sorting code
%
% xreadlines obsolete; files are their own iterator

%\section{Introduction \label{intro}}

{\large This article is a draft, and is currently up to date for some
random version of the CVS tree around mid-July 2002. Please send any
additions, comments or errata to the author.}

This article explains the new features in Python 2.3. The tentative
release date of Python 2.3 is currently scheduled for some undefined
time before the end of 2002.

This article doesn't attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.3,
such as the
\citetitle[http://www.python.org/doc/2.3/lib/lib.html]{Python Library
Reference} and the
\citetitle[http://www.python.org/doc/2.3/ref/ref.html]{Python
Reference Manual}. If you want to understand the complete
implementation and design rationale for a change, refer to the PEP for
a particular new feature.


%======================================================================
\section{PEP 218: A Standard Set Datatype}

The new \module{sets} module contains an implementation of a set
datatype. The \class{Set} class is for mutable sets, sets that can
have members added and removed. The \class{ImmutableSet} class is for
sets that can't be modified, and instances of \class{ImmutableSet} can
therefore be used as dictionary keys. Sets are built on top of
dictionaries, so the elements within a set must be hashable.

As a simple example,

\begin{verbatim}
>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>
\end{verbatim}

The union and intersection of sets can be computed with the
\method{union()} and \method{intersection()} methods, or,
alternatively, using the bitwise operators \samp{|} and \samp{\&}.
Mutable sets also have in-place versions of these methods,
\method{union_update()} and \method{intersection_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>
\end{verbatim}

It's also possible to take the symmetric difference of two sets. This
is the set of all elements in the union that aren't in the
intersection. An alternative way of expressing the symmetric
difference is that it contains all elements that are in exactly one
set. Again, there's an in-place version, with the ungainly name
\method{symmetric_difference_update()}.

\begin{verbatim}
>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>
\end{verbatim}

There are also methods, \method{issubset()} and \method{issuperset()},
for checking whether one set is a subset or superset of another (a set
counts as both a subset and a superset of itself):

\begin{verbatim}
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>
\end{verbatim}
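
Two details from this section are easy to check directly:
\method{issubset()} and \method{issuperset()} also succeed for equal
sets, and only \class{ImmutableSet} instances are hashable. The
following is a minimal sketch, not taken from the module's
documentation; the fallback to the built-in \class{set} and
\class{frozenset} types is an assumption that lets the snippet run on
later Python versions where the \module{sets} module is no longer
available.

```python
try:
    from sets import Set, ImmutableSet      # the sets module described above
except ImportError:
    # Assumption: on later Pythons the built-in types stand in for the module.
    Set, ImmutableSet = set, frozenset

S1 = Set([1, 2, 3])
S2 = Set([2, 3])

# A set counts as a subset (and superset) of itself.
subset_of_self = S1.issubset(S1)

# One way to test for a strict subset: subset, but not equal.
strict_subset = S2.issubset(S1) and S2 != S1

# Immutable sets are hashable, so they can serve as dictionary keys.
labels = {ImmutableSet([1, 2]): 'small', ImmutableSet([1, 2, 3]): 'large'}
label = labels[ImmutableSet([2, 1])]        # element order is irrelevant

# Mutable sets are not hashable and are rejected as keys.
try:
    {S1: 'mutable'}
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```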


\begin{seealso}

\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
Implemented by Greg V. Wilson, Alex Martelli, and GvR.}

\end{seealso}



%======================================================================
\section{PEP 255: Simple Generators\label{section-generators}}

In Python 2.2, generators were added as an optional feature, to be
enabled by a \code{from __future__ import generators} directive. In
2.3 generators no longer need to be specially enabled, and are now
always present; this means that \keyword{yield} is now always a
keyword. The rest of this section is a copy of the description of
generators from the ``What's New in Python 2.2'' document; if you read
it when 2.2 came out, you can skip the rest of this section.

You're doubtless familiar with how function calls work in Python or C.
When you call a function, it gets a private namespace where its local
variables are created. When the function reaches a \keyword{return}
statement, the local variables are destroyed and the resulting value
is returned to the caller. A later call to the same function will get
a fresh new set of local variables. But what if the local variables
weren't thrown away on exiting a function? What if you could later
resume the function where it left off? This is what generators
provide; they can be thought of as resumable functions.

Here's the simplest example of a generator function:

\begin{verbatim}
def generate_ints(N):
    for i in range(N):
        yield i
\end{verbatim}

A new keyword, \keyword{yield}, was introduced for generators. Any
function containing a \keyword{yield} statement is a generator
function; this is detected by Python's bytecode compiler, which
compiles the function specially as a result.

When you call a generator function, it doesn't return a single value;
instead it returns a generator object that supports the iterator
protocol. On executing the \keyword{yield} statement, the generator
outputs the value of \code{i}, similar to a \keyword{return}
statement. The big difference between \keyword{yield} and a
\keyword{return} statement is that on reaching a \keyword{yield} the
generator's state of execution is suspended and local variables are
preserved. On the next call to the generator's \code{.next()} method,
the function will resume executing immediately after the
\keyword{yield} statement. (For complicated reasons, the
\keyword{yield} statement isn't allowed inside the \keyword{try} block
of a \code{try...finally} statement; read \pep{255} for a full
explanation of the interaction between \keyword{yield} and
exceptions.)

Here's a sample usage of the \function{generate_ints} generator:

\begin{verbatim}
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
\end{verbatim}

You could equally write \code{for i in generate_ints(5)}, or
\code{a,b,c = generate_ints(3)}.
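
Both forms can be tried out with a few lines. The sketch below
repeats the definition of \function{generate_ints()} so that it is
self-contained; the variable names are only illustrative. It also
shows that a generator object is exhausted once it has been looped
over.

```python
def generate_ints(N):
    for i in range(N):
        yield i

# Driving the generator from a for loop.
squares = []
for i in generate_ints(5):
    squares.append(i * i)

# Unpacking needs exactly as many generated values as there are targets.
a, b, c = generate_ints(3)

# A generator object can only be consumed once.
gen = generate_ints(3)
first_pass = list(gen)
second_pass = list(gen)     # nothing is left to produce
```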

Inside a generator function, the \keyword{return} statement can only
be used without a value, and signals the end of the procession of
values; afterwards the generator cannot return any further values.
\keyword{return} with a value, such as \code{return 5}, is a syntax
error inside a generator function. The end of the generator's results
can also be indicated by raising \exception{StopIteration} manually,
or by just letting the flow of execution fall off the bottom of the
function.

You could achieve the effect of generators manually by writing your
own class and storing all the local variables of the generator as
instance variables. For example, returning a list of integers could
be done by setting \code{self.count} to 0, and having the
\method{next()} method increment \code{self.count} and return it.
However, for a moderately complicated generator, writing a
corresponding class would be much messier.
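
As a rough illustration of the class-based alternative, the sketch
below keeps the counter in an instance variable and compares the
result with the generator version. The class name \class{CountUpTo}
is made up for this example, and the \method{__next__} alias is an
assumption that lets the same class work on later Python versions,
which spell the method that way.

```python
class CountUpTo:
    """Hand-written counterpart of generate_ints(N); the name is illustrative."""

    def __init__(self, N):
        self.N = N
        self.count = 0          # state a generator would keep in a local variable

    def __iter__(self):
        return self

    def next(self):
        if self.count >= self.N:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    __next__ = next             # assumption: spelling used by later Python versions


def generate_ints(N):
    for i in range(N):
        yield i


from_class = list(CountUpTo(4))
from_generator = list(generate_ints(4))
```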

\file{Lib/test/test_generators.py} contains a number of more
interesting examples. The simplest one implements an in-order
traversal of a tree using generators recursively.

\begin{verbatim}
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
\end{verbatim}
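
To try the traversal without the test suite, a small node class is
enough. The \class{Tree} class below is an assumption made for this
sketch (the test file has its own definition); only the attribute
names \member{left}, \member{label} and \member{right} are dictated by
\function{inorder()}.

```python
class Tree:
    """Minimal binary tree node, written only for this example."""

    def __init__(self, label, left=None, right=None):
        self.label = label
        self.left = left
        self.right = right


def inorder(t):
    if t:                       # None marks a missing child
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x


#       4
#      / \
#     2   6
#    / \
#   1   3
root = Tree(4, Tree(2, Tree(1), Tree(3)), Tree(6))
labels = list(inorder(root))
```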

Two other examples in \file{Lib/test/test_generators.py} produce
solutions for the N-Queens problem (placing $N$ queens on an $N\times N$
chess board so that no queen threatens another) and the Knight's Tour
(a route that takes a knight to every square of an $N\times N$ chessboard
without visiting any square twice).

The idea of generators comes from other programming languages,
especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
idea of generators is central. In Icon, every
expression and function call behaves like a generator. One example
from ``An Overview of the Icon Programming Language'' at
\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
what this looks like:

\begin{verbatim}
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
\end{verbatim}

In Icon the \function{find()} function returns the indexes at which the
substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
\code{i} is first assigned a value of 3, but 3 is less than 5, so the
comparison fails, and Icon retries it with the second value of 23. 23
is greater than 5, so the comparison now succeeds, and the code prints
the value 23 to the screen.

Python doesn't go nearly as far as Icon in adopting generators as a
central concept. Generators are considered a new part of the core
Python language, but learning or using them isn't compulsory; if they
don't solve any problems that you have, feel free to ignore them.
One novel feature of Python's interface as compared to
Icon's is that a generator's state is represented as a concrete object
(the iterator) that can be passed around to other functions or stored
in a data structure.

\begin{seealso}

\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
and Tim Peters, with other fixes from the Python Labs crew.}

\end{seealso}


%======================================================================
\section{PEP 263: Source Code Encodings \label{section-encodings}}

Python source files can now be declared as being in different
character set encodings. Encodings are declared by including a
specially formatted comment in the first or second line of the source
file. For example, a UTF-8 file can be declared with:

\begin{verbatim}
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
\end{verbatim}

Without such an encoding declaration, the default encoding used is
ISO-8859-1, also known as Latin1.

The encoding declaration only affects Unicode string literals; the
text in the source code will be converted to Unicode using the
specified encoding. Note that Python identifiers are still restricted
to ASCII characters, so you can't have variable names that use
characters outside of the usual alphanumerics.

\begin{seealso}

\seepep{263}{Defining Python Source Code Encodings}{Written by
Marc-Andr\'e Lemburg and Martin von L\"owis; implemented by SUZUKI
Hisao and Martin von L\"owis.}

\end{seealso}


%======================================================================
\section{PEP 277: Unicode file name support for Windows NT}

On Windows NT, 2000, and XP, the system stores file names as Unicode
strings. Traditionally, Python has represented file names as byte
strings, which is inadequate because it renders some file names
inaccessible.

Python now allows using arbitrary Unicode strings (within the
limitations of the file system) for all functions that expect file
names, in particular the \function{open()} built-in. If a Unicode
string is passed to \function{os.listdir}, Python now returns a list
of Unicode strings. A new function, \function{os.getcwdu()}, returns
the current directory as a Unicode string.

Byte strings still work as file names, and Python will transparently
convert them to Unicode using the \code{mbcs} encoding.

Other systems also allow Unicode strings as file names, but convert
them to byte strings before passing them to the system, which may cause
a \exception{UnicodeError} to be raised. Applications can test whether
arbitrary Unicode strings are supported as file names by checking
\member{os.path.unicode_file_names}, a Boolean value.

\begin{seealso}

\seepep{277}{Unicode file name support for Windows NT}{Written by Neil
Hodgson; implemented by Neil Hodgson, Martin von L\"owis, and Mark
Hammond.}

\end{seealso}


%======================================================================
\section{PEP 278: Universal Newline Support}

The three major operating systems used today are Microsoft Windows,
Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor
irritation is that these three platforms all use different characters
to mark the ends of lines in text files. \UNIX\ uses character 10,
the ASCII linefeed, while MacOS uses character 13, the ASCII carriage
return, and Windows uses a two-character sequence of a carriage return
plus a linefeed.

Python's file objects can now support end of line conventions other
than the one followed by the platform on which Python is running.
Opening a file with the mode \samp{U} or \samp{rU} will open a file
for reading in universal newline mode. All three line ending
conventions will be translated to a \samp{\e n} in the strings
returned by the various file methods such as \method{read()} and
\method{readline()}.
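
The visible effect of universal newline mode can be pictured as a
plain string translation. The function below is only a sketch of that
effect, not the way file objects implement it, and the name
\function{translate_newlines()} is invented for the example.

```python
def translate_newlines(data):
    """Return data with Windows and Macintosh line endings turned into LF."""
    # Handle the two-character sequence first so that it does not
    # end up as two separate newlines.
    return data.replace('\r\n', '\n').replace('\r', '\n')


unix_text = 'spam\neggs\n'
mac_text = 'spam\reggs\r'
windows_text = 'spam\r\neggs\r\n'

results = [translate_newlines(t)
           for t in (unix_text, mac_text, windows_text)]
```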

Universal newline support is also used when importing modules and when
executing a file with the \function{execfile()} function. This means
that Python modules can be shared between all three operating systems
without needing to convert the line-endings.

This feature can be disabled at compile-time by specifying
\longprogramopt{without-universal-newlines} when running Python's
\file{configure} script.

\begin{seealso}

\seepep{278}{Universal Newline Support}{Written
and implemented by Jack Jansen.}

\end{seealso}


%======================================================================
\section{PEP 279: The \function{enumerate()} Built-in Function\label{section-enumerate}}

A new built-in function, \function{enumerate()}, will make
certain loops a bit clearer. \code{enumerate(thing)}, where
\var{thing} is either an iterator or a sequence, returns an iterator
that will return \code{(0, \var{thing[0]})}, \code{(1,
\var{thing[1]})}, \code{(2, \var{thing[2]})}, and so forth. Fairly
often you'll see code to change every element of a list that looks
like this:

\begin{verbatim}
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}

This can be rewritten using \function{enumerate()} as:

\begin{verbatim}
for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result
\end{verbatim}
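
A small self-contained run shows both kinds of argument; the list
contents and the \function{letters()} generator are arbitrary choices
for this sketch.

```python
# Updating every element of a list in place.
L = ['spam', 'eggs', 'ham']
for i, item in enumerate(L):
    L[i] = item.upper()


# enumerate() also accepts an iterator, such as a generator object.
def letters():
    yield 'a'
    yield 'b'


pairs = list(enumerate(letters()))
```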


\begin{seealso}

\seepep{279}{The enumerate() built-in function}{Written
by Raymond D. Hettinger.}

\end{seealso}


%======================================================================
\section{PEP 285: The \class{bool} Type\label{section-bool}}

A Boolean type was added to Python 2.3. Two new constants were added
to the \module{__builtin__} module, \constant{True} and
\constant{False}. The type object for this new type is named
\class{bool}; the constructor for it takes any Python value and
converts it to \constant{True} or \constant{False}.

\begin{verbatim}
>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True
\end{verbatim}

Most of the standard library modules and built-in functions have been
changed to return Booleans.

\begin{verbatim}
>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False
\end{verbatim}

Python's Booleans were added with the primary goal of making code
clearer. For example, if you're reading a function and encounter the
statement \code{return 1}, you might wonder whether the \samp{1}
represents a truth value, or whether it's an index, or whether it's a
coefficient that multiplies some other quantity. If the statement is
\code{return True}, however, the meaning of the return value is quite
clearly a truth value.

Python's Booleans were not added for the sake of strict type-checking.
A very strict language such as Pascal would also prevent you from
performing arithmetic with Booleans, and would require that the
expression in an \keyword{if} statement always evaluate to a Boolean.
Python is not this strict, and it never will be. (\pep{285}
explicitly says so.) So you can still use any expression in an
\keyword{if}, even ones that evaluate to a list or tuple or some
random object, and the Boolean type is a subclass of the
\class{int} class, so arithmetic using a Boolean still works.

\begin{verbatim}
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
\end{verbatim}

To sum up \constant{True} and \constant{False} in a sentence: they're
alternative ways to spell the integer values 1 and 0, with the single
difference that \function{str()} and \function{repr()} return the
strings \samp{True} and \samp{False} instead of \samp{1} and \samp{0}.
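
The following lines, a sketch with arbitrary sample values, check that
summary and show one practical consequence: because Booleans behave as
the integers 1 and 0, they can be added up to count how many
conditions hold.

```python
# bool is a subclass of int, and its two values compare equal to 1 and 0.
is_int_subclass = issubclass(bool, int)
equal_to_ints = (True == 1) and (False == 0)

# Only the string forms differ from those of the plain integers.
as_text = (str(True), repr(False))
as_numbers = (str(1), repr(0))

# Counting with Booleans: each true comparison contributes 1.
values = [3, -1, 4, -1, 5]
positives = sum([v > 0 for v in values])
```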

\begin{seealso}

\seepep{285}{Adding a bool type}{Written and implemented by GvR.}

\end{seealso}


%======================================================================
\section{PEP 293: Codec Error Handling Callbacks}

When encoding a Unicode string into a byte string, unencodable
characters may be encountered. So far, Python has allowed specifying
the error processing as either ``strict'' (raising
\exception{UnicodeError}), ``ignore'' (skipping the character), or
``replace'' (substituting a question mark), defaulting to ``strict''.
It may be desirable to specify an alternative processing of the error,
e.g. by inserting an XML character reference or HTML entity reference
into the converted string.

Python now has a flexible framework to add additional processing
strategies. New error handlers can be added with
\function{codecs.register_error}. Codecs then can access the error
handler with \function{codecs.lookup_error}. An equivalent C API has
been added for codecs written in C. The error handler gets the
necessary state information, such as the string being converted, the
position in the string where the error was detected, and the target
encoding. The handler can then either raise an exception, or return a
replacement string.

Two additional error handlers have been implemented using this
framework: ``backslashreplace'' uses Python backslash quoting to
represent the unencodable character, and ``xmlcharrefreplace'' emits
XML character references.
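
As an illustration of the registration mechanism, the sketch below
adds a handler that writes hexadecimal character references. The
handler name \code{'example.hexentity'} and the function
\function{hex_entity()} are made up for this example; the handler
receives the exception object and returns the replacement text
together with the position at which encoding should resume.

```python
import codecs


def hex_entity(exc):
    """Replace each unencodable character with a hexadecimal reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc               # this sketch only handles encoding errors
    bad = exc.object[exc.start:exc.end]
    replacement = u''.join([u'&#x%x;' % ord(ch) for ch in bad])
    # Return the replacement text and the index at which to continue.
    return replacement, exc.end


codecs.register_error('example.hexentity', hex_entity)

text = u'caf\xe9'
custom = text.encode('ascii', 'example.hexentity')
xml = text.encode('ascii', 'xmlcharrefreplace')
escaped = text.encode('ascii', 'backslashreplace')
looked_up = codecs.lookup_error('example.hexentity')
```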

\begin{seealso}

\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
Walter D\"orwald.}

\end{seealso}


%======================================================================
\section{Extended Slices\label{section-slices}}

Ever since Python 1.4, the slicing syntax has supported an optional
third ``step'' or ``stride'' argument. For example, these are all
legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
\code{L[::-1]}. This was added to Python at the request of
the developers of Numerical Python. However, the built-in sequence
types of lists, tuples, and strings have never supported this feature,
and you got a \exception{TypeError} if you tried it. Michael Hudson
contributed a patch that was applied to Python 2.3 and fixed this
shortcoming.

For example, you can now easily extract the elements of a list that
have even indexes:

\begin{verbatim}
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
\end{verbatim}

Negative values also work, so you can make a copy of the same list in
reverse order:

\begin{verbatim}
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
\end{verbatim}

This also works for strings:

\begin{verbatim}
>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
\end{verbatim}

as well as tuples and arrays.

If you have a mutable sequence (i.e. a list or an array) you can
assign to or delete an extended slice, but there are some differences
in assignment to extended and regular slices. Assignment to a regular
slice can be used to change the length of the sequence:

\begin{verbatim}
>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]
\end{verbatim}

but when assigning to an extended slice the list on the right hand
side of the statement must contain the same number of items as the
slice it is replacing:

\begin{verbatim}
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = range(0, -2, -1)
>>> a
[0, 1, -1, 3]
>>> a[::2] = range(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign list of size 3 to extended slice of size 2
\end{verbatim}

Deletion is more straightforward:

\begin{verbatim}
>>> a = range(4)
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]
\end{verbatim}

One can also now pass slice objects to the \method{__getitem__}
methods of the built-in sequences:

\begin{verbatim}
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
\end{verbatim}

or use them directly in subscripts:

\begin{verbatim}
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
\end{verbatim}

To make implementing sequences that support extended slicing in Python
easier, slice objects now have a method \method{indices} which, given
the length of a sequence, returns \code{(start, stop, step)}, handling
omitted and out-of-bounds indices in a manner consistent with regular
slices (and this innocuous phrase hides a welter of confusing
details!). The method is intended to be used like this:

\begin{verbatim}
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            return FakeSeq([self.calc_item(i)
                            for i in range(*item.indices(len(self)))])
        else:
            return self.calc_item(item)
\end{verbatim}

From this example you can also see that the builtin ``\class{slice}''
object is now the type object for the slice type, and is no longer a
function. This is consistent with Python 2.2, where \class{int},
\class{str}, etc., underwent the same change.
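
A few direct calls show what \method{indices()} returns; the sample
sequence and slices are arbitrary. The last lines check that looping
over \code{range(*s.indices(n))} visits the same positions as the
slice itself, which is the property that a class like
\class{FakeSeq} relies on.

```python
seq = list(range(10))

# Omitted bounds are filled in according to the direction of the step.
full_reverse = slice(None, None, -1).indices(len(seq))

# Out-of-range bounds are clipped to the length of the sequence.
clipped = slice(0, 100, 2).indices(len(seq))

# Negative bounds are converted to positions counted from the start.
from_end = slice(-3, None).indices(len(seq))

# range(*indices) visits exactly the positions that the slice selects.
s = slice(1, None, 3)
via_indices = [seq[i] for i in range(*s.indices(len(seq)))]
via_slice = seq[s]
```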


%======================================================================
\section{Other Language Changes}

Here are all of the changes that Python 2.3 makes to the core Python
language.

\begin{itemize}
\item The \keyword{yield} statement is now always a keyword, as
described in section~\ref{section-generators} of this document.

\item A new built-in function \function{enumerate()}
was added, as described in section~\ref{section-enumerate} of this
document.

\item Two new constants, \constant{True} and \constant{False}, were
added along with the built-in \class{bool} type, as described in
section~\ref{section-bool} of this document.

\item Built-in types now support the extended slicing syntax,
as described in section~\ref{section-slices} of this document.

\item Dictionaries have a new method, \method{pop(\var{key})}, that
returns the value corresponding to \var{key} and removes that
key/value pair from the dictionary. \method{pop()} will raise a
\exception{KeyError} if the requested key isn't present in the
dictionary:

\begin{verbatim}
>>> d = {1:2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: pop(): dictionary is empty
>>> d
{}
>>>
\end{verbatim}

(Patch contributed by Raymond Hettinger.)

\item The \keyword{assert} statement no longer checks the \code{__debug__}
flag, so you can no longer disable assertions by assigning to \code{__debug__}.
Running Python with the \programopt{-O} switch will still generate
code that doesn't execute any assertions.

\item Most type objects are now callable, so you can use them
to create new objects such as functions, classes, and modules. (This
means that the \module{new} module can be deprecated in a future
Python version, because you can now use the type objects available
in the \module{types} module.)
% XXX should new.py use PendingDeprecationWarning?
For example, you can create a new module object with the following code:

\begin{verbatim}
>>> import types
>>> m = types.ModuleType('abc','docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'
\end{verbatim}

\item
A new warning, \exception{PendingDeprecationWarning}, was added to
indicate features which are in the process of being
deprecated. The warning will \emph{not} be printed by default. To
check for use of features that will be deprecated in the future,
supply \programopt{-Walways::PendingDeprecationWarning::} on the
command line or use \function{warnings.filterwarnings()}.

\item Using \code{None} as a variable name will now result in a
\exception{SyntaxWarning} warning. In a future version of Python,
\code{None} may finally become a keyword.

\item Python runs multithreaded programs by switching between threads
after executing N bytecodes. The default value for N has been
increased from 10 to 100 bytecodes, speeding up single-threaded
applications by reducing the switching overhead. Some multithreaded
applications may suffer slower response time, but that's easily fixed
by setting the limit back to a lower number by calling
\function{sys.setcheckinterval(\var{N})}.

\item One minor but far-reaching change is that the names of extension
types defined by the modules included with Python now contain the
module and a \samp{.} in front of the type name. For example, in
Python 2.2, if you created a socket and printed its
\member{__class__}, you'd get this output:

\begin{verbatim}
>>> s = socket.socket()
>>> s.__class__
<type 'socket'>
\end{verbatim}

In 2.3, you get this:
\begin{verbatim}
>>> s.__class__
<type '_socket.socket'>
\end{verbatim}

\end{itemize}
764
765
\subsection{String Changes}

\begin{itemize}

\item The \code{in} operator now works differently for strings.
Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
and \var{Y} are strings, \var{X} could only be a single character.
That's now changed; \var{X} can be a string of any length, and
\code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
substring of \var{Y}. If \var{X} is the empty string, the result is
always \constant{True}.

\begin{verbatim}
>>> 'ab' in 'abcd'
True
>>> 'ad' in 'abcd'
False
>>> '' in 'abcd'
True
\end{verbatim}

Note that this doesn't tell you where the substring starts; the
\method{find()} method is still necessary to figure that out.

\item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
string methods now have an optional argument for specifying the
characters to strip. The default is still to remove all whitespace
characters:

\begin{verbatim}
>>> ' abc '.strip()
'abc'
>>> '><><abc<><><>'.strip('<>')
'abc'
>>> '><><abc<><><>\n'.strip('<>')
'abc<><><>\n'
>>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
u'\u4001abc'
>>>
\end{verbatim}

(Contributed by Simon Brunning.)

\item The \method{startswith()} and \method{endswith()}
string methods now accept negative numbers for the start and end
parameters.

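A small illustration of the negative start and end parameters,
which count from the end of the string just as they do in slicing.
The file name is only sample data.

```python
filename = "report-2002.txt"

# Test only the last four characters of the string.
has_txt_suffix = filename.startswith(".txt", -4)

# Ignore the four-character suffix and test what comes before it.
stem_ends_with_year = filename.endswith("2002", 0, -4)

# The prefix "report" lies outside the last four characters.
prefix_in_tail = filename.startswith("report", -4)

print(has_txt_suffix, stem_ends_with_year, prefix_in_tail)
```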
\item Another new string method is \method{zfill()}, originally a
function in the \module{string} module. \method{zfill()} pads a
numeric string with zeros on the left until it's the specified width.
Note that the \code{\%} operator is still more flexible and powerful
than \method{zfill()}.

\begin{verbatim}
>>> '45'.zfill(4)
'0045'
>>> '12345'.zfill(4)
'12345'
>>> 'goofy'.zfill(6)
'0goofy'
\end{verbatim}

(Contributed by Walter D\"orwald.)

\item A new type object, \class{basestring}, has been added.
Both 8-bit strings and Unicode strings inherit from this type, so
\code{isinstance(obj, basestring)} will return \constant{True} for
either kind of string. It's a completely abstract type, so you
can't create \class{basestring} instances.

\item Interned strings are no longer immortal. Interned strings will
now be garbage-collected in the usual way when the only reference to
them is from the internal dictionary of interned strings. (Implemented
by Oren Tirosh.)

\end{itemize}


\subsection{Optimizations}

\begin{itemize}

\item The \method{sort()} method of list objects has been extensively
rewritten by Tim Peters, and the implementation is significantly
faster.

\item Multiplication of large long integers is now much faster thanks
to an implementation of Karatsuba multiplication, an algorithm that
scales better than the O(n*n) required for the grade-school
multiplication algorithm. (Original patch by Christopher A. Craig,
and significantly reworked by Tim Peters.)

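To give a feel for the idea, here is a minimal Python sketch of
Karatsuba multiplication. It is an illustration only, not the C code
used by the interpreter; the function name \code{karatsuba()} and the
\code{cutoff} parameter are choices made for this example. Each
operand is split into a high and a low half, and three recursive
multiplications replace the four that the grade-school method needs,
which gives roughly O(n**1.585) behaviour.

```python
def karatsuba(x, y, cutoff=2**64):
    """Multiply two non-negative integers using Karatsuba's method."""
    # Small operands: fall back to ordinary multiplication.
    if x < cutoff or y < cutoff:
        return x * y

    # Split both numbers around half the bit length of the larger one.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three recursive products instead of four.
    hi = karatsuba(x_hi, y_hi, cutoff)
    lo = karatsuba(x_lo, y_lo, cutoff)
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff) - hi - lo

    # Recombine: x*y == hi*2**(2*half) + mid*2**half + lo
    return (hi << (2 * half)) + (mid << half) + lo

a = 3**2000 + 12345
b = 7**1500 + 678
product = karatsuba(a, b)
```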
\item The \code{SET_LINENO} opcode is now gone. This may provide a
small speed increase, subject to your compiler's idiosyncrasies.
(Removed by Michael Hudson.)

\item A number of small rearrangements have been made in various
hotspots to improve performance, inlining a function here, removing
some code there. (Implemented mostly by Guido van~Rossum, but lots of
people have contributed to one change or another.)

\end{itemize}


%======================================================================
\section{New and Improved Modules}

As usual, Python's standard modules had a number of enhancements and
bug fixes. Here's a partial list of the most notable changes, sorted
alphabetically by module name. Consult the
\file{Misc/NEWS} file in the source tree for a more
complete list of changes, or look through the CVS logs for all the
details.

\begin{itemize}

\item The \module{array} module now supports arrays of Unicode
characters using the \samp{u} format character. Arrays also now
support using the \code{+=} assignment operator to add another array's
contents, and the \code{*=} assignment operator to repeat an array.
(Contributed by Jason Orendorff.)

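A brief sketch of the two assignment operators, using integer arrays
with arbitrary sample values:

```python
from array import array

a = array('i', [1, 2, 3])
b = array('i', [4, 5])

a += b   # extend a in place with the contents of b
b *= 2   # repeat the contents of b in place

print(a.tolist(), b.tolist())
```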
\item The Distutils \class{Extension} class now supports
an extra constructor argument named \samp{depends} for listing
additional source files that an extension depends on. This lets
Distutils recompile the module if any of the dependency files are
modified. For example, if \file{sampmodule.c} includes the header
file \file{sample.h}, you would create the \class{Extension} object like
this:

\begin{verbatim}
ext = Extension("samp",
                sources=["sampmodule.c"],
                depends=["sample.h"])
\end{verbatim}

Modifying \file{sample.h} would then cause the module to be recompiled.
(Contributed by Jeremy Hylton.)

\item The \module{getopt} module gained a new function,
\function{gnu_getopt()}, that supports the same arguments as the existing
\function{getopt()} function but uses GNU-style scanning mode.
The existing \function{getopt()} stops processing options as soon as a
non-option argument is encountered, but in GNU-style mode processing
continues, meaning that options and arguments can be mixed. For
example:

\begin{verbatim}
>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename')], ['output', '-v'])
>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename'), ('-v', '')], ['output'])
\end{verbatim}

(Contributed by Peter \AA{strand}.)

\item The \module{grp}, \module{pwd}, and \module{resource} modules
now return enhanced tuples:

\begin{verbatim}
>>> import grp
>>> g = grp.getgrnam('amk')
>>> g.gr_name, g.gr_gid
('amk', 500)
\end{verbatim}

\item The new \module{heapq} module contains an implementation of a
heap queue algorithm. A heap is an array-like data structure that
keeps items in a partially sorted order such that, for every index
\var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and
\code{heap[\var{k}] <= heap[2*\var{k}+2]}. This makes it quick to remove
the smallest item, and inserting a new item while maintaining the heap
property is O(lg~n). (See
\url{http://www.nist.gov/dads/HTML/priorityque.html} for more
information about the priority queue data structure.)

The Python \module{heapq} module provides \function{heappush()} and
\function{heappop()} functions for adding and removing items while
maintaining the heap property on top of some other mutable Python
sequence type. For example:

\begin{verbatim}
>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...    heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]
>>> heapq.heappush(heap, 5)
>>>
\end{verbatim}

(Contributed by Kevin O'Connor.)

\item Two new functions in the \module{math} module,
\function{degrees(\var{rads})} and \function{radians(\var{degs})},
convert between radians and degrees. Other functions in the
\module{math} module such as
\function{math.sin()} and \function{math.cos()} have always required
input values measured in radians. (Contributed by Raymond Hettinger.)

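For instance, the conversions make it straightforward to feed an
angle given in degrees to the trigonometric functions (the angles
here are arbitrary sample values):

```python
import math

right_angle = math.radians(90)        # pi/2 radians
round_trip = math.degrees(right_angle)

# sin() expects radians, so convert the 30-degree angle first.
sine_of_30 = math.sin(math.radians(30))

print(right_angle, round_trip, sine_of_30)
```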
\item Seven new functions, \function{getpgid()}, \function{killpg()},
\function{lchown()}, \function{major()}, \function{makedev()},
\function{minor()}, and \function{mknod()}, were added to the
\module{posix} module that underlies the \module{os} module.
(Contributed by Gustavo Niemeyer and Geert Jansen.)

\item The parser objects provided by the \module{pyexpat} module
can now optionally buffer character data, resulting in fewer calls to
your character data handler and therefore faster performance. Setting
the parser object's \member{buffer_text} attribute to \constant{True}
will enable buffering.

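The effect of \member{buffer_text} can be observed by collecting the
pieces of text handed to the character data handler. This sketch
reaches the parser through \module{xml.parsers.expat}, the usual
public interface to \module{pyexpat}; the helper
\code{collect_chunks()} and the sample document are invented for the
example, and how many pieces the unbuffered parser delivers may vary.

```python
import xml.parsers.expat

def collect_chunks(document, buffered):
    """Parse document and return the strings passed to the text handler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffered
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)   # True marks this as the final chunk
    return chunks

doc = "<msg>first line\nsecond line &amp; more</msg>"
unbuffered = collect_chunks(doc, False)
buffered = collect_chunks(doc, True)

print(len(unbuffered), len(buffered))
```

With buffering enabled the contiguous text arrives in a single call,
so the handler doesn't have to reassemble it.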
\item The \module{readline} module also gained a number of new
functions: \function{get_history_item()},
\function{get_current_history_length()}, and \function{redisplay()}.

\item Support for more advanced POSIX signal handling was added
to the \module{signal} module by adding the \function{sigpending()},
\function{sigprocmask()} and \function{sigsuspend()} functions, where
supported by the platform. These functions make it possible to avoid
some previously unavoidable race conditions.

\item The \module{socket} module now supports timeouts. You
can call the \method{settimeout(\var{t})} method on a socket object to
set a timeout of \var{t} seconds. Subsequent socket operations that
take longer than \var{t} seconds to complete will abort and raise a
\exception{socket.error} exception.

The original timeout implementation was by Tim O'Malley. Michael
Gilfix integrated it into the Python \module{socket} module, after the
patch had undergone a lengthy review. After it was checked in, Guido
van~Rossum rewrote parts of it. This is a good example of the free
software development process in action.

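A minimal sketch of a timeout in action, written for a current Python
interpreter. It uses \function{socket.socketpair()} purely as a
convenient way to get a connected socket on which no data will ever
arrive; that choice, and the 0.1 second timeout, are assumptions made
for the example. Current Python versions raise the more specific
\exception{socket.timeout}, which is a subclass of
\exception{socket.error}, so catching the latter still works.

```python
import socket

# Two connected sockets; nothing is ever sent on either of them.
reader, writer = socket.socketpair()
reader.settimeout(0.1)          # give up after a tenth of a second

try:
    reader.recv(1)              # blocks until data arrives or time runs out
    timed_out = False
except socket.error:
    timed_out = True
finally:
    reader.close()
    writer.close()

print(timed_out)
```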
\item The value of the C \constant{PYTHON_API_VERSION} macro is now exposed
at the Python level as \code{sys.api_version}.

\item The new \module{textwrap} module contains functions for wrapping
strings containing paragraphs of text. The \function{wrap(\var{text},
\var{width})} function takes a string and returns a list containing
the text split into lines of no more than the chosen width. The
\function{fill(\var{text}, \var{width})} function returns a single
string, reformatted to fit into lines no longer than the chosen width.
(As you can guess, \function{fill()} is built on top of
\function{wrap()}.) For example:

\begin{verbatim}
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
>>>
\end{verbatim}

The module also contains a \class{TextWrapper} class that actually
implements the text wrapping strategy. Both the
\class{TextWrapper} class and the \function{wrap()} and
\function{fill()} functions support a number of additional keyword
arguments for fine-tuning the formatting; consult the module's
documentation for details.
% XXX add a link to the module docs?
(Contributed by Greg Ward.)

\item The \module{time} module's \function{strptime()} function has
long been an annoyance because it uses the platform C library's
\function{strptime()} implementation, and different platforms
sometimes have odd bugs. Brett Cannon contributed a portable
implementation that's written in pure Python, which should behave
identically on all platforms.

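As a quick illustration, a purely numeric format such as the one
below avoids locale-dependent month names; the timestamp is just
sample data.

```python
import time

parsed = time.strptime("2002-07-22 19:21:06", "%Y-%m-%d %H:%M:%S")

# The first six fields are year, month, day, hour, minute, second.
fields = tuple(parsed)[:6]
weekday = parsed.tm_wday      # Monday is 0

print(fields, weekday)
```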
\item The DOM implementation
in \module{xml.dom.minidom} can now generate XML output in a
particular encoding, by specifying an optional encoding argument to
the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.

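A short sketch of the encoding argument, run under a current Python
interpreter, where the encoded output comes back as a byte string.
The document content is sample data.

```python
from xml.dom import minidom

doc = minidom.parseString("<greeting>caf\u00e9</greeting>")

# Ask for the serialized document in a specific encoding.
latin1_output = doc.toxml(encoding="iso-8859-1")

print(latin1_output)
```

The XML declaration names the chosen encoding, and the accented
character is written as the single byte that ISO-8859-1 assigns to it.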
\item The \function{stat()} family of functions can now report fractions
of a second in a time stamp. Similar to \function{time.time()}, such
time stamps are represented as floats.

During testing, it was found that some applications break if time
stamps are floats. For compatibility, when using the tuple interface
of the \class{stat_result}, time stamps are represented as integers.
When using named fields (first introduced in Python 2.2), time stamps
are still represented as ints, unless \function{os.stat_float_times()}
is invoked:

\begin{verbatim}
>>> os.stat_float_times(True)
>>> os.stat("/tmp").st_mtime
1034791200.6335014
\end{verbatim}

In Python 2.4, the default will change to return floats.

Application developers should use this feature only if all their
libraries work properly when confronted with floating point time
stamps (or use the tuple API). If used, the feature should be
activated at the application level, instead of trying to activate it
on a per-use basis.

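The difference between the two interfaces can still be seen in
current Python versions, where named fields return floats by default
and \function{os.stat_float_times()} no longer exists. The following
sketch, which assumes such an interpreter, compares the tuple
interface with the named field for the current directory:

```python
import os
import stat

info = os.stat(os.curdir)

# Tuple interface: integer seconds, as described above.
tuple_mtime = info[stat.ST_MTIME]

# Named field: may carry a fractional part.
named_mtime = info.st_mtime

print(type(tuple_mtime).__name__, type(named_mtime).__name__)
```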
\end{itemize}


%======================================================================
\section{Specialized Object Allocator (pymalloc)\label{section-pymalloc}}

An experimental feature added to Python 2.1 was a specialized object
allocator called pymalloc, written by Vladimir Marangozov. Pymalloc
was intended to be faster than the system \cfunction{malloc()} and have
less memory overhead for typical allocation patterns of Python
programs. The allocator uses C's \cfunction{malloc()} function to get
large pools of memory, and then fulfills smaller memory requests from
these pools.

In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
enabled by default; you had to explicitly turn it on by providing the
\longprogramopt{with-pymalloc} option to the \program{configure}
script. In 2.3, pymalloc has had further enhancements and is now
enabled by default; you'll have to supply
\longprogramopt{without-pymalloc} to disable it.

This change is transparent to code written in Python; however,
pymalloc may expose bugs in C extensions. Authors of C extension
modules should test their code with the object allocator enabled,
because some incorrect code may cause core dumps at runtime. There
are a number of memory allocation functions in Python's C API that have
previously been just aliases for the C library's \cfunction{malloc()}
and \cfunction{free()}, meaning that if you accidentally called
mismatched functions, the error wouldn't be noticeable. When the
object allocator is enabled, these functions aren't aliases of
\cfunction{malloc()} and \cfunction{free()} any more, and calling the
wrong function to free memory may get you a core dump. For example,
if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A
few modules included with Python fell afoul of this and had to be
fixed; doubtless there are more third-party modules that will have the
same problem.

As part of this change, the confusing multiple interfaces for
allocating memory have been consolidated down into two API families.
Memory allocated with one family must not be manipulated with
functions from the other family.

There is another family of functions specifically for allocating
Python \emph{objects} (as opposed to memory).

\begin{itemize}
  \item To allocate and free an undistinguished chunk of memory use
  the ``raw memory'' family: \cfunction{PyMem_Malloc()},
  \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.

  \item The ``object memory'' family is the interface to the pymalloc
  facility described above and is biased towards a large number of
  ``small'' allocations: \cfunction{PyObject_Malloc()},
  \cfunction{PyObject_Realloc()}, and \cfunction{PyObject_Free()}.

  \item To allocate and free Python objects, use the ``object'' family:
  \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
  \cfunction{PyObject_Del()}.
\end{itemize}

Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
debugging features to catch memory overwrites and doubled frees in
both extension modules and in the interpreter itself. To enable this
support, turn on the Python interpreter's debugging code by running
\program{configure} with \longprogramopt{with-pydebug}.

To aid extension writers, a header file \file{Misc/pymemcompat.h} is
distributed with the source to Python 2.3 that allows Python
extensions to use the 2.3 interfaces to memory allocation and compile
against any version of Python since 1.5.2. You would copy the file
from Python's source distribution and bundle it with the source of
your extension.

\begin{seealso}

\seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
{For the full details of the pymalloc implementation, see
the comments at the top of the file \file{Objects/obmalloc.c} in the
Python source code. The above link points to the file within the
SourceForge CVS browser.}

\end{seealso}


% ======================================================================
\section{Build and C API Changes}

Changes to Python's build process and to the C API include:

\begin{itemize}

\item The C-level interface to the garbage collector has been changed,
to make it easier to write extension types that support garbage
collection, and to make it easier to debug misuses of the functions.
Various functions have slightly different semantics, so a number of
functions had to be renamed. Extensions that use the old API will
still compile but will \emph{not} participate in garbage collection,
so updating them for 2.3 should be considered fairly high priority.

To upgrade an extension module to the new API, perform the following
steps:

\begin{itemize}

\item Rename \cfunction{Py_TPFLAGS_GC} to \cfunction{Py_TPFLAGS_HAVE_GC}.

\item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to
allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them.

\item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and
\cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}.

\item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations.

\item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}.

\end{itemize}

\item Python can now optionally be built as a shared library
(\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
when running Python's \file{configure} script. (Contributed by Ondrej
Palkovsky.)

\item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
are now deprecated. Initialization functions for Python extension
modules should now be declared using the new macro
\csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
macros.

\item The interpreter can be compiled without any docstrings for
the built-in functions and modules by supplying
\longprogramopt{without-doc-strings} to the \file{configure} script.
This makes the Python executable about 10\% smaller, but will also
mean that you can't get help for Python's built-ins. (Contributed by
Gustavo Niemeyer.)

\item The cycle detection implementation used by the garbage collector
has proven to be stable, so it's now being made mandatory; you can no
longer compile Python without it, and the
\longprogramopt{with-cycle-gc} switch to \file{configure} has been removed.

\item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
that uses it should be changed. For Python 2.2 and later, the method
definition table can specify the
\constant{METH_NOARGS} flag, signalling that there are no arguments, and
the argument checking can then be removed. If compatibility with
pre-2.2 versions of Python is important, the code could use
\code{PyArg_ParseTuple(args, "")} instead, but this will be slower
than using \constant{METH_NOARGS}.

\item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
char *\var{key})}, was added
as shorthand for
\code{PyObject_DelItem(\var{mapping}, PyString_FromString(\var{key}))}.

\item File objects now manage their internal string buffer
differently by increasing it exponentially when needed.
This results in the benchmark tests in \file{Lib/test/test_bufio.py}
speeding up from 57 seconds to 1.7 seconds, according to one
measurement.

\item It's now possible to define class and static methods for a C
extension type by setting either the \constant{METH_CLASS} or
\constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
structure.

\item Python now includes a copy of the Expat XML parser's source code,
removing any dependence on a system version or local installation of
Expat.

\end{itemize}

\subsection{Port-Specific Changes}

Support for a port to IBM's OS/2 using the EMX runtime environment was
merged into the main Python source tree. EMX is a POSIX emulation
layer over the OS/2 system APIs. The Python port for EMX tries to
support all the POSIX-like capability exposed by the EMX runtime, and
mostly succeeds; \function{fork()} and \function{fcntl()} are
restricted by the limitations of the underlying emulation layer. The
standard OS/2 port, which uses IBM's Visual Age compiler, also gained
support for case-sensitive import semantics as part of the integration
of the EMX port into CVS. (Contributed by Andrew MacIntyre.)

On MacOS, most toolbox modules have been weaklinked to improve
backward compatibility. This means that modules will no longer fail
to load if a single routine is missing on the current OS version.
Instead, calling the missing routine will raise an exception.
(Contributed by Jack Jansen.)

The RPM spec files, found in the \file{Misc/RPM/} directory in the
Python source distribution, were updated for 2.3. (Contributed by
Sean Reifschneider.)

Python now supports AtheOS (\url{http://www.atheos.cx}) and GNU/Hurd.


%======================================================================
\section{Other Changes and Fixes}

Finally, there are various miscellaneous fixes:

\begin{itemize}

\item The tools used to build the documentation now work under Cygwin
as well as \UNIX.

\item The \code{SET_LINENO} opcode has been removed. Back in the
mists of time, this opcode was needed to produce line numbers in
tracebacks and support trace functions (for, e.g., \module{pdb}).
Since Python 1.5, the line numbers in tracebacks have been computed
using a different mechanism that works with ``python -O''. For Python
2.3 Michael Hudson implemented a similar scheme to determine when to
call the trace function, removing the need for \code{SET_LINENO}
entirely.

Python code will be hard-pressed to notice a difference from this
change, apart from a slight speed-up when Python is run without
\programopt{-O}.

C extensions that access the \member{f_lineno} field of frame objects
should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
This will have the added effect of making the code work as desired
under ``python -O'' in earlier versions of Python.

\end{itemize}


%======================================================================
\section{Porting to Python 2.3}

XXX write this


%======================================================================
\section{Acknowledgements \label{acks}}

The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Michael Chermside, Scott David Daniels, Fred~L. Drake, Jr.,
Michael Hudson, Detlef Lannert, Martin von L\"owis, Andrew MacIntyre,
Lalo Martins, Gustavo Niemeyer, Neal Norwitz, Jason Tishler.

\end{document}