blob: 5d07c3d446e540414376d37cd32560dbff745a71 [file] [log] [blame]
Fred Drake03e10312002-03-26 19:17:43 +00001\documentclass{howto}
Fred Drake693aea22003-02-07 14:52:18 +00002\usepackage{distutils}
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00003% $Id$
4
5\title{What's New in Python 2.3}
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +00006\release{0.11}
Fred Drakeaac8c582003-01-17 22:50:10 +00007\author{A.M.\ Kuchling}
Andrew M. Kuchlingbc5e3cc2002-11-05 00:26:33 +00008\authoraddress{\email{amk@amk.ca}}
Fred Drake03e10312002-03-26 19:17:43 +00009
10\begin{document}
11\maketitle
12\tableofcontents
13
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +000014% To do:
Andrew M. Kuchlingc760c6c2003-07-16 20:12:33 +000015% PYTHONINSPECT
Andrew M. Kuchlingc760c6c2003-07-16 20:12:33 +000016% file.encoding
17% doctest extensions
18% new version of IDLE
19% XML-RPC nil extension
Andrew M. Kuchlingc61ec522002-08-04 01:20:05 +000020% MacOS framework-related changes (section of its own, probably)
Andrew M. Kuchlingf70a0a82002-06-10 13:22:46 +000021
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +000022%\section{Introduction \label{intro}}
23
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +000024{\large This article is a draft, and is currently up to date for
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +000025Python 2.3rc1. Please send any additions, comments or errata to the
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +000026author.}
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +000027
28This article explains the new features in Python 2.3. The tentative
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +000029release date of Python 2.3 is currently scheduled for August 2003.
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +000030
31This article doesn't attempt to provide a complete specification of
32the new features, but instead provides a convenient overview. For
33full details, you should refer to the documentation for Python 2.3,
Fred Drake693aea22003-02-07 14:52:18 +000034such as the \citetitle[../lib/lib.html]{Python Library Reference} and
35the \citetitle[../ref/ref.html]{Python Reference Manual}. If you want
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +000036to understand the complete implementation and design rationale,
37refer to the PEP for a particular new feature.
Fred Drake03e10312002-03-26 19:17:43 +000038
39
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +000040%======================================================================
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000041\section{PEP 218: A Standard Set Datatype}
42
43The new \module{sets} module contains an implementation of a set
44datatype. The \class{Set} class is for mutable sets, sets that can
45have members added and removed. The \class{ImmutableSet} class is for
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +000046sets that can't be modified, and instances of \class{ImmutableSet} can
47therefore be used as dictionary keys. Sets are built on top of
48dictionaries, so the elements within a set must be hashable.
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000049
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +000050Here's a simple example:
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000051
52\begin{verbatim}
53>>> import sets
54>>> S = sets.Set([1,2,3])
55>>> S
56Set([1, 2, 3])
57>>> 1 in S
58True
59>>> 0 in S
60False
61>>> S.add(5)
62>>> S.remove(3)
63>>> S
64Set([1, 2, 5])
Fred Drake5c4cf152002-11-13 14:59:06 +000065>>>
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000066\end{verbatim}
67
68The union and intersection of sets can be computed with the
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +000069\method{union()} and \method{intersection()} methods; an alternative
70notation uses the bitwise operators \code{\&} and \code{|}.
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000071Mutable sets also have in-place versions of these methods,
72\method{union_update()} and \method{intersection_update()}.
73
74\begin{verbatim}
75>>> S1 = sets.Set([1,2,3])
76>>> S2 = sets.Set([4,5,6])
77>>> S1.union(S2)
78Set([1, 2, 3, 4, 5, 6])
79>>> S1 | S2 # Alternative notation
80Set([1, 2, 3, 4, 5, 6])
Fred Drake5c4cf152002-11-13 14:59:06 +000081>>> S1.intersection(S2)
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000082Set([])
83>>> S1 & S2 # Alternative notation
84Set([])
85>>> S1.union_update(S2)
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000086>>> S1
87Set([1, 2, 3, 4, 5, 6])
Fred Drake5c4cf152002-11-13 14:59:06 +000088>>>
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000089\end{verbatim}
90
91It's also possible to take the symmetric difference of two sets. This
92is the set of all elements in the union that aren't in the
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +000093intersection. Another way of putting it is that the symmetric
94difference contains all elements that are in exactly one
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +000095set. Again, there's an alternative notation (\code{\^}), and an
96in-place version with the ungainly name
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +000097\method{symmetric_difference_update()}.
98
99\begin{verbatim}
100>>> S1 = sets.Set([1,2,3,4])
101>>> S2 = sets.Set([3,4,5,6])
102>>> S1.symmetric_difference(S2)
103Set([1, 2, 5, 6])
104>>> S1 ^ S2
105Set([1, 2, 5, 6])
106>>>
107\end{verbatim}
108
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000109There are also \method{issubset()} and \method{issuperset()} methods
Michael W. Hudson065f5fa2003-02-10 19:24:50 +0000110for checking whether one set is a subset or superset of another:
Andrew M. Kuchlingbc465102002-08-20 01:34:06 +0000111
112\begin{verbatim}
113>>> S1 = sets.Set([1,2,3])
114>>> S2 = sets.Set([2,3])
115>>> S2.issubset(S1)
116True
117>>> S1.issubset(S2)
118False
119>>> S1.issuperset(S2)
120True
121>>>
122\end{verbatim}
123
124
125\begin{seealso}
126
127\seepep{218}{Adding a Built-In Set Object Type}{PEP written by Greg V. Wilson.
128Implemented by Greg V. Wilson, Alex Martelli, and GvR.}
129
130\end{seealso}
131
132
133
134%======================================================================
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000135\section{PEP 255: Simple Generators\label{section-generators}}
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000136
137In Python 2.2, generators were added as an optional feature, to be
138enabled by a \code{from __future__ import generators} directive. In
1392.3 generators no longer need to be specially enabled, and are now
140always present; this means that \keyword{yield} is now always a
141keyword. The rest of this section is a copy of the description of
142generators from the ``What's New in Python 2.2'' document; if you read
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000143it back when Python 2.2 came out, you can skip the rest of this section.
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000144
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000145You're doubtless familiar with how function calls work in Python or C.
146When you call a function, it gets a private namespace where its local
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000147variables are created. When the function reaches a \keyword{return}
148statement, the local variables are destroyed and the resulting value
149is returned to the caller. A later call to the same function will get
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000150a fresh new set of local variables. But, what if the local variables
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000151weren't thrown away on exiting a function? What if you could later
152resume the function where it left off? This is what generators
153provide; they can be thought of as resumable functions.
154
155Here's the simplest example of a generator function:
156
157\begin{verbatim}
158def generate_ints(N):
159 for i in range(N):
160 yield i
161\end{verbatim}
162
163A new keyword, \keyword{yield}, was introduced for generators. Any
164function containing a \keyword{yield} statement is a generator
165function; this is detected by Python's bytecode compiler which
Fred Drake5c4cf152002-11-13 14:59:06 +0000166compiles the function specially as a result.
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000167
168When you call a generator function, it doesn't return a single value;
169instead it returns a generator object that supports the iterator
170protocol. On executing the \keyword{yield} statement, the generator
171outputs the value of \code{i}, similar to a \keyword{return}
172statement. The big difference between \keyword{yield} and a
173\keyword{return} statement is that on reaching a \keyword{yield} the
174generator's state of execution is suspended and local variables are
175preserved. On the next call to the generator's \code{.next()} method,
176the function will resume executing immediately after the
177\keyword{yield} statement. (For complicated reasons, the
178\keyword{yield} statement isn't allowed inside the \keyword{try} block
Fred Drakeaac8c582003-01-17 22:50:10 +0000179of a \keyword{try}...\keyword{finally} statement; read \pep{255} for a full
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000180explanation of the interaction between \keyword{yield} and
181exceptions.)
182
Fred Drakeaac8c582003-01-17 22:50:10 +0000183Here's a sample usage of the \function{generate_ints()} generator:
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000184
185\begin{verbatim}
186>>> gen = generate_ints(3)
187>>> gen
188<generator object at 0x8117f90>
189>>> gen.next()
1900
191>>> gen.next()
1921
193>>> gen.next()
1942
195>>> gen.next()
196Traceback (most recent call last):
Andrew M. Kuchling9f6e1042002-06-17 13:40:04 +0000197 File "stdin", line 1, in ?
198 File "stdin", line 2, in generate_ints
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000199StopIteration
200\end{verbatim}
201
202You could equally write \code{for i in generate_ints(5)}, or
203\code{a,b,c = generate_ints(3)}.
204
205Inside a generator function, the \keyword{return} statement can only
206be used without a value, and signals the end of the procession of
207values; afterwards the generator cannot return any further values.
208\keyword{return} with a value, such as \code{return 5}, is a syntax
209error inside a generator function. The end of the generator's results
210can also be indicated by raising \exception{StopIteration} manually,
211or by just letting the flow of execution fall off the bottom of the
212function.
213
214You could achieve the effect of generators manually by writing your
215own class and storing all the local variables of the generator as
216instance variables. For example, returning a list of integers could
217be done by setting \code{self.count} to 0, and having the
218\method{next()} method increment \code{self.count} and return it.
219However, for a moderately complicated generator, writing a
220corresponding class would be much messier.
221\file{Lib/test/test_generators.py} contains a number of more
222interesting examples. The simplest one implements an in-order
223traversal of a tree using generators recursively.
224
225\begin{verbatim}
226# A recursive generator that generates Tree leaves in in-order.
227def inorder(t):
228 if t:
229 for x in inorder(t.left):
230 yield x
231 yield t.label
232 for x in inorder(t.right):
233 yield x
234\end{verbatim}
235
236Two other examples in \file{Lib/test/test_generators.py} produce
237solutions for the N-Queens problem (placing $N$ queens on an $NxN$
238chess board so that no queen threatens another) and the Knight's Tour
239(a route that takes a knight to every square of an $NxN$ chessboard
Fred Drake5c4cf152002-11-13 14:59:06 +0000240without visiting any square twice).
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000241
242The idea of generators comes from other programming languages,
243especially Icon (\url{http://www.cs.arizona.edu/icon/}), where the
244idea of generators is central. In Icon, every
245expression and function call behaves like a generator. One example
246from ``An Overview of the Icon Programming Language'' at
247\url{http://www.cs.arizona.edu/icon/docs/ipd266.htm} gives an idea of
248what this looks like:
249
250\begin{verbatim}
251sentence := "Store it in the neighboring harbor"
252if (i := find("or", sentence)) > 5 then write(i)
253\end{verbatim}
254
255In Icon the \function{find()} function returns the indexes at which the
256substring ``or'' is found: 3, 23, 33. In the \keyword{if} statement,
257\code{i} is first assigned a value of 3, but 3 is less than 5, so the
258comparison fails, and Icon retries it with the second value of 23. 23
259is greater than 5, so the comparison now succeeds, and the code prints
260the value 23 to the screen.
261
262Python doesn't go nearly as far as Icon in adopting generators as a
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000263central concept. Generators are considered part of the core
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +0000264Python language, but learning or using them isn't compulsory; if they
265don't solve any problems that you have, feel free to ignore them.
266One novel feature of Python's interface as compared to
267Icon's is that a generator's state is represented as a concrete object
268(the iterator) that can be passed around to other functions or stored
269in a data structure.
270
271\begin{seealso}
272
273\seepep{255}{Simple Generators}{Written by Neil Schemenauer, Tim
274Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer
275and Tim Peters, with other fixes from the Python Labs crew.}
276
277\end{seealso}
278
279
280%======================================================================
Fred Drake13090e12002-08-22 16:51:08 +0000281\section{PEP 263: Source Code Encodings \label{section-encodings}}
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000282
283Python source files can now be declared as being in different
284character set encodings. Encodings are declared by including a
285specially formatted comment in the first or second line of the source
286file. For example, a UTF-8 file can be declared with:
287
288\begin{verbatim}
289#!/usr/bin/env python
290# -*- coding: UTF-8 -*-
291\end{verbatim}
292
293Without such an encoding declaration, the default encoding used is
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +00002947-bit ASCII. Executing or importing modules that contain string
295literals with 8-bit characters and have no encoding declaration will result
Andrew M. Kuchlingacddabc2003-02-18 00:43:24 +0000296in a \exception{DeprecationWarning} being signalled by Python 2.3; in
2972.4 this will be a syntax error.
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000298
Andrew M. Kuchlingacddabc2003-02-18 00:43:24 +0000299The encoding declaration only affects Unicode string literals, which
300will be converted to Unicode using the specified encoding. Note that
301Python identifiers are still restricted to ASCII characters, so you
302can't have variable names that use characters outside of the usual
303alphanumerics.
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000304
305\begin{seealso}
306
307\seepep{263}{Defining Python Source Code Encodings}{Written by
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +0000308Marc-Andr\'e Lemburg and Martin von~L\"owis; implemented by Suzuki
309Hisao and Martin von~L\"owis.}
Andrew M. Kuchling950725f2002-08-06 01:40:48 +0000310
311\end{seealso}
312
313
314%======================================================================
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000315\section{PEP 277: Unicode file name support for Windows NT}
Andrew M. Kuchling0f345562002-10-04 22:34:11 +0000316
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000317On Windows NT, 2000, and XP, the system stores file names as Unicode
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000318strings. Traditionally, Python has represented file names as byte
319strings, which is inadequate because it renders some file names
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000320inaccessible.
321
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000322Python now allows using arbitrary Unicode strings (within the
323limitations of the file system) for all functions that expect file
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000324names, most notably the \function{open()} built-in function. If a Unicode
325string is passed to \function{os.listdir()}, Python now returns a list
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000326of Unicode strings. A new function, \function{os.getcwdu()}, returns
327the current directory as a Unicode string.
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000328
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000329Byte strings still work as file names, and on Windows Python will
330transparently convert them to Unicode using the \code{mbcs} encoding.
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000331
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000332Other systems also allow Unicode strings as file names but convert
333them to byte strings before passing them to the system, which can
334cause a \exception{UnicodeError} to be raised. Applications can test
335whether arbitrary Unicode strings are supported as file names by
Andrew M. Kuchlingb9ba4e62003-02-03 15:16:15 +0000336checking \member{os.path.supports_unicode_filenames}, a Boolean value.
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000337
Andrew M. Kuchling563389f2003-03-02 02:31:58 +0000338Under MacOS, \function{os.listdir()} may now return Unicode filenames.
339
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000340\begin{seealso}
341
342\seepep{277}{Unicode file name support for Windows NT}{Written by Neil
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +0000343Hodgson; implemented by Neil Hodgson, Martin von~L\"owis, and Mark
Martin v. Löwisbd5e38d2002-10-07 18:52:29 +0000344Hammond.}
345
346\end{seealso}
Andrew M. Kuchling0f345562002-10-04 22:34:11 +0000347
348
349%======================================================================
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000350\section{PEP 278: Universal Newline Support}
351
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000352The three major operating systems used today are Microsoft Windows,
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000353Apple's Macintosh OS, and the various \UNIX\ derivatives. A minor
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +0000354irritation of cross-platform work
355is that these three platforms all use different characters
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000356to mark the ends of lines in text files. \UNIX\ uses the linefeed
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +0000357(ASCII character 10), MacOS uses the carriage return (ASCII
358character 13), and Windows uses a two-character sequence of a
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000359carriage return plus a newline.
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000360
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000361Python's file objects can now support end of line conventions other
362than the one followed by the platform on which Python is running.
Fred Drake5c4cf152002-11-13 14:59:06 +0000363Opening a file with the mode \code{'U'} or \code{'rU'} will open a file
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000364for reading in universal newline mode. All three line ending
Fred Drake5c4cf152002-11-13 14:59:06 +0000365conventions will be translated to a \character{\e n} in the strings
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000366returned by the various file methods such as \method{read()} and
Fred Drake5c4cf152002-11-13 14:59:06 +0000367\method{readline()}.
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000368
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000369Universal newline support is also used when importing modules and when
370executing a file with the \function{execfile()} function. This means
371that Python modules can be shared between all three operating systems
372without needing to convert the line-endings.
373
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +0000374This feature can be disabled when compiling Python by specifying
375the \longprogramopt{without-universal-newlines} switch when running Python's
Fred Drake5c4cf152002-11-13 14:59:06 +0000376\program{configure} script.
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000377
378\begin{seealso}
379
Fred Drake5c4cf152002-11-13 14:59:06 +0000380\seepep{278}{Universal Newline Support}{Written
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000381and implemented by Jack Jansen.}
382
383\end{seealso}
384
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +0000385
386%======================================================================
Andrew M. Kuchling433307b2003-05-13 14:23:54 +0000387\section{PEP 279: enumerate()\label{section-enumerate}}
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +0000388
389A new built-in function, \function{enumerate()}, will make
390certain loops a bit clearer. \code{enumerate(thing)}, where
391\var{thing} is either an iterator or a sequence, returns a iterator
Fred Drake3605ae52003-07-16 03:26:31 +0000392that will return \code{(0, \var{thing}[0])}, \code{(1,
393\var{thing}[1])}, \code{(2, \var{thing}[2])}, and so forth.
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000394
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +0000395A common idiom to change every element of a list looks like this:
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +0000396
397\begin{verbatim}
398for i in range(len(L)):
399 item = L[i]
400 # ... compute some result based on item ...
401 L[i] = result
402\end{verbatim}
403
404This can be rewritten using \function{enumerate()} as:
405
406\begin{verbatim}
407for i, item in enumerate(L):
408 # ... compute some result based on item ...
409 L[i] = result
410\end{verbatim}
411
412
413\begin{seealso}
414
Fred Drake5c4cf152002-11-13 14:59:06 +0000415\seepep{279}{The enumerate() built-in function}{Written
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000416and implemented by Raymond D. Hettinger.}
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +0000417
418\end{seealso}
419
420
Andrew M. Kuchlingf3676512002-04-15 02:27:55 +0000421%======================================================================
Andrew M. Kuchling433307b2003-05-13 14:23:54 +0000422\section{PEP 282: The logging Package}
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000423
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000424A standard package for writing logs, \module{logging}, has been added
425to Python 2.3. It provides a powerful and flexible mechanism for
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000426generating logging output which can then be filtered and processed in
427various ways. A configuration file written in a standard format can
428be used to control the logging behavior of a program. Python
429includes handlers that will write log records to
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000430standard error or to a file or socket, send them to the system log, or
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000431even e-mail them to a particular address; of course, it's also
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000432possible to write your own handler classes.
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000433
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000434The \class{Logger} class is the primary class.
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000435Most application code will deal with one or more \class{Logger}
436objects, each one used by a particular subsystem of the application.
437Each \class{Logger} is identified by a name, and names are organized
438into a hierarchy using \samp{.} as the component separator. For
439example, you might have \class{Logger} instances named \samp{server},
440\samp{server.auth} and \samp{server.network}. The latter two
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000441instances are below \samp{server} in the hierarchy. This means that
442if you turn up the verbosity for \samp{server} or direct \samp{server}
443messages to a different handler, the changes will also apply to
444records logged to \samp{server.auth} and \samp{server.network}.
445There's also a root \class{Logger} that's the parent of all other
446loggers.
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000447
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000448For simple uses, the \module{logging} package contains some
449convenience functions that always use the root log:
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000450
451\begin{verbatim}
452import logging
453
454logging.debug('Debugging information')
455logging.info('Informational message')
Andrew M. Kuchling37495072003-02-19 13:46:18 +0000456logging.warning('Warning:config file %s not found', 'server.conf')
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000457logging.error('Error occurred')
458logging.critical('Critical error -- shutting down')
459\end{verbatim}
460
461This produces the following output:
462
463\begin{verbatim}
Andrew M. Kuchling37495072003-02-19 13:46:18 +0000464WARNING:root:Warning:config file server.conf not found
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000465ERROR:root:Error occurred
466CRITICAL:root:Critical error -- shutting down
467\end{verbatim}
468
469In the default configuration, informational and debugging messages are
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000470suppressed and the output is sent to standard error. You can enable
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000471the display of informational and debugging messages by calling the
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000472\method{setLevel()} method on the root logger.
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000473
Andrew M. Kuchling37495072003-02-19 13:46:18 +0000474Notice the \function{warning()} call's use of string formatting
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000475operators; all of the functions for logging messages take the
476arguments \code{(\var{msg}, \var{arg1}, \var{arg2}, ...)} and log the
477string resulting from \code{\var{msg} \% (\var{arg1}, \var{arg2},
478...)}.
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000479
480There's also an \function{exception()} function that records the most
481recent traceback. Any of the other functions will also record the
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000482traceback if you specify a true value for the keyword argument
Fred Drakeaac8c582003-01-17 22:50:10 +0000483\var{exc_info}.
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000484
485\begin{verbatim}
486def f():
487 try: 1/0
488 except: logging.exception('Problem recorded')
489
490f()
491\end{verbatim}
492
493This produces the following output:
494
495\begin{verbatim}
496ERROR:root:Problem recorded
497Traceback (most recent call last):
498 File "t.py", line 6, in f
499 1/0
500ZeroDivisionError: integer division or modulo by zero
501\end{verbatim}
502
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000503Slightly more advanced programs will use a logger other than the root
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000504logger. The \function{getLogger(\var{name})} function is used to get
505a particular log, creating it if it doesn't exist yet.
Andrew M. Kuchlingb1e4bf92002-12-03 13:35:17 +0000506\function{getLogger(None)} returns the root logger.
507
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000508
509\begin{verbatim}
510log = logging.getLogger('server')
511 ...
512log.info('Listening on port %i', port)
513 ...
514log.critical('Disk full')
515 ...
516\end{verbatim}
517
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000518Log records are usually propagated up the hierarchy, so a message
519logged to \samp{server.auth} is also seen by \samp{server} and
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +0000520\samp{root}, but a \class{Logger} can prevent this by setting its
Fred Drakeaac8c582003-01-17 22:50:10 +0000521\member{propagate} attribute to \constant{False}.
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000522
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000523There are more classes provided by the \module{logging} package that
524can be customized. When a \class{Logger} instance is told to log a
525message, it creates a \class{LogRecord} instance that is sent to any
526number of different \class{Handler} instances. Loggers and handlers
527can also have an attached list of filters, and each filter can cause
528the \class{LogRecord} to be ignored or can modify the record before
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +0000529passing it along. When they're finally output, \class{LogRecord}
530instances are converted to text by a \class{Formatter} class. All of
531these classes can be replaced by your own specially-written classes.
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000532
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +0000533With all of these features the \module{logging} package should provide
534enough flexibility for even the most complicated applications. This
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +0000535is only an incomplete overview of its features, so please see the
536\ulink{package's reference documentation}{../lib/module-logging.html}
537for all of the details. Reading \pep{282} will also be helpful.
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +0000538
539
540\begin{seealso}
541
542\seepep{282}{A Logging System}{Written by Vinay Sajip and Trent Mick;
543implemented by Vinay Sajip.}
544
545\end{seealso}
546
547
548%======================================================================
Andrew M. Kuchling433307b2003-05-13 14:23:54 +0000549\section{PEP 285: A Boolean Type\label{section-bool}}
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000550
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000551A Boolean type was added to Python 2.3. Two new constants were added
552to the \module{__builtin__} module, \constant{True} and
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000553\constant{False}. (\constant{True} and
554\constant{False} constants were added to the built-ins
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000555in Python 2.2.1, but the 2.2.1 versions are simply set to integer values of
Andrew M. Kuchling5a224532003-01-03 16:52:27 +00005561 and 0 and aren't a different type.)
557
558The type object for this new type is named
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000559\class{bool}; the constructor for it takes any Python value and
560converts it to \constant{True} or \constant{False}.
561
562\begin{verbatim}
563>>> bool(1)
564True
565>>> bool(0)
566False
567>>> bool([])
568False
569>>> bool( (1,) )
570True
571\end{verbatim}
572
573Most of the standard library modules and built-in functions have been
574changed to return Booleans.
575
576\begin{verbatim}
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000577>>> obj = []
578>>> hasattr(obj, 'append')
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000579True
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000580>>> isinstance(obj, list)
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000581True
Andrew M. Kuchling517109b2002-05-07 21:01:16 +0000582>>> isinstance(obj, tuple)
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000583False
584\end{verbatim}
585
586Python's Booleans were added with the primary goal of making code
587clearer. For example, if you're reading a function and encounter the
Fred Drake5c4cf152002-11-13 14:59:06 +0000588statement \code{return 1}, you might wonder whether the \code{1}
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000589represents a Boolean truth value, an index, or a
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000590coefficient that multiplies some other quantity. If the statement is
591\code{return True}, however, the meaning of the return value is quite
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000592clear.
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000593
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000594Python's Booleans were \emph{not} added for the sake of strict
595type-checking. A very strict language such as Pascal would also
596prevent you performing arithmetic with Booleans, and would require
597that the expression in an \keyword{if} statement always evaluate to a
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000598Boolean result. Python is not this strict and never will be, as
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000599\pep{285} explicitly says. This means you can still use any
600expression in an \keyword{if} statement, even ones that evaluate to a
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000601list or tuple or some random object. The Boolean type is a
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000602subclass of the \class{int} class so that arithmetic using a Boolean
603still works.
Andrew M. Kuchling821013e2002-05-06 17:46:39 +0000604
605\begin{verbatim}
606>>> True + 1
6072
608>>> False + 1
6091
610>>> False * 75
6110
612>>> True * 75
61375
614\end{verbatim}
615
616To sum up \constant{True} and \constant{False} in a sentence: they're
617alternative ways to spell the integer values 1 and 0, with the single
618difference that \function{str()} and \function{repr()} return the
Fred Drake5c4cf152002-11-13 14:59:06 +0000619strings \code{'True'} and \code{'False'} instead of \code{'1'} and
620\code{'0'}.
Andrew M. Kuchling3a52ff62002-04-03 22:44:47 +0000621
622\begin{seealso}
623
624\seepep{285}{Adding a bool type}{Written and implemented by GvR.}
625
626\end{seealso}
627
Michael W. Hudson5efaf7e2002-06-11 10:55:12 +0000628
Andrew M. Kuchling65b72822002-09-03 00:53:21 +0000629%======================================================================
630\section{PEP 293: Codec Error Handling Callbacks}
631
Martin v. Löwis20eae692002-10-07 19:01:07 +0000632When encoding a Unicode string into a byte string, unencodable
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000633characters may be encountered. So far, Python has allowed specifying
634the error processing as either ``strict'' (raising
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000635\exception{UnicodeError}), ``ignore'' (skipping the character), or
636``replace'' (using a question mark in the output string), with
637``strict'' being the default behavior. It may be desirable to specify
638alternative processing of such errors, such as inserting an XML
639character reference or HTML entity reference into the converted
640string.
Martin v. Löwis20eae692002-10-07 19:01:07 +0000641
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +0000642Python now has a flexible framework to add different processing
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000643strategies. New error handlers can be added with
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000644\function{codecs.register_error}, and codecs then can access the error
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000645handler with \function{codecs.lookup_error}. An equivalent C API has
646been added for codecs written in C. The error handler gets the
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000647necessary state information such as the string being converted, the
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000648position in the string where the error was detected, and the target
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000649encoding. The handler can then either raise an exception or return a
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000650replacement string.
Martin v. Löwis20eae692002-10-07 19:01:07 +0000651
652Two additional error handlers have been implemented using this
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000653framework: ``backslashreplace'' uses Python backslash quoting to
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +0000654represent unencodable characters and ``xmlcharrefreplace'' emits
Martin v. Löwis20eae692002-10-07 19:01:07 +0000655XML character references.
Andrew M. Kuchling65b72822002-09-03 00:53:21 +0000656
657\begin{seealso}
658
Fred Drake5c4cf152002-11-13 14:59:06 +0000659\seepep{293}{Codec Error Handling Callbacks}{Written and implemented by
Andrew M. Kuchling0a6fa962002-10-09 12:11:10 +0000660Walter D\"orwald.}
Andrew M. Kuchling65b72822002-09-03 00:53:21 +0000661
662\end{seealso}
663
664
665%======================================================================
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000666\section{PEP 273: Importing Modules from Zip Archives}
667
668The new \module{zipimport} module adds support for importing
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000669modules from a ZIP-format archive. You don't need to import the
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000670module explicitly; it will be automatically imported if a ZIP
671archive's filename is added to \code{sys.path}. For example:
672
673\begin{verbatim}
674amk@nyman:~/src/python$ unzip -l /tmp/example.zip
675Archive: /tmp/example.zip
676 Length Date Time Name
677 -------- ---- ---- ----
678 8467 11-26-02 22:30 jwzthreading.py
679 -------- -------
680 8467 1 file
681amk@nyman:~/src/python$ ./python
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000682Python 2.3 (#1, Aug 1 2003, 19:54:32)
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000683>>> import sys
684>>> sys.path.insert(0, '/tmp/example.zip') # Add .zip file to front of path
685>>> import jwzthreading
686>>> jwzthreading.__file__
687'/tmp/example.zip/jwzthreading.py'
688>>>
689\end{verbatim}
690
691An entry in \code{sys.path} can now be the filename of a ZIP archive.
692The ZIP archive can contain any kind of files, but only files named
Fred Drakeaac8c582003-01-17 22:50:10 +0000693\file{*.py}, \file{*.pyc}, or \file{*.pyo} can be imported. If an
694archive only contains \file{*.py} files, Python will not attempt to
695modify the archive by adding the corresponding \file{*.pyc} file, meaning
696that if a ZIP archive doesn't contain \file{*.pyc} files, importing may be
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000697rather slow.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000698
699A path within the archive can also be specified to only import from a
700subdirectory; for example, the path \file{/tmp/example.zip/lib/}
701would only import from the \file{lib/} subdirectory within the
702archive.
703
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000704\begin{seealso}
705
706\seepep{273}{Import Modules from Zip Archives}{Written by James C. Ahlstrom,
707who also provided an implementation.
708Python 2.3 follows the specification in \pep{273},
Andrew M. Kuchlingae3bbf52002-12-31 14:03:45 +0000709but uses an implementation written by Just van~Rossum
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000710that uses the import hooks described in \pep{302}.
711See section~\ref{section-pep302} for a description of the new import hooks.
712}
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000713
714\end{seealso}
715
716%======================================================================
Fred Drake693aea22003-02-07 14:52:18 +0000717\section{PEP 301: Package Index and Metadata for
718Distutils\label{section-pep301}}
Andrew M. Kuchling87cebbf2003-01-03 16:24:28 +0000719
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000720Support for the long-requested Python catalog makes its first
721appearance in 2.3.
722
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000723The heart of the catalog is the new Distutils \command{register} command.
Fred Drake693aea22003-02-07 14:52:18 +0000724Running \code{python setup.py register} will collect the metadata
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000725describing a package, such as its name, version, maintainer,
Andrew M. Kuchlingc61402b2003-02-26 19:00:52 +0000726description, \&c., and send it to a central catalog server. The
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000727resulting catalog is available from \url{http://www.python.org/pypi}.
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000728
729To make the catalog a bit more useful, a new optional
Fred Drake693aea22003-02-07 14:52:18 +0000730\var{classifiers} keyword argument has been added to the Distutils
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000731\function{setup()} function. A list of
Fred Drake693aea22003-02-07 14:52:18 +0000732\ulink{Trove}{http://catb.org/\textasciitilde esr/trove/}-style
733strings can be supplied to help classify the software.
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000734
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +0000735Here's an example \file{setup.py} with classifiers, written to be compatible
736with older versions of the Distutils:
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000737
738\begin{verbatim}
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +0000739from distutils import core
Fred Drake693aea22003-02-07 14:52:18 +0000740kw = {'name': "Quixote",
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +0000741 'version': "0.5.1",
742 'description': "A highly Pythonic Web application framework",
Fred Drake693aea22003-02-07 14:52:18 +0000743 # ...
744 }
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +0000745
Andrew M. Kuchlinga6b1c752003-04-09 17:26:38 +0000746if (hasattr(core, 'setup_keywords') and
747 'classifiers' in core.setup_keywords):
Fred Drake693aea22003-02-07 14:52:18 +0000748 kw['classifiers'] = \
749 ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
750 'Environment :: No Input/Output (Daemon)',
751 'Intended Audience :: Developers'],
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +0000752
Fred Drake693aea22003-02-07 14:52:18 +0000753core.setup(**kw)
Andrew M. Kuchling5a224532003-01-03 16:52:27 +0000754\end{verbatim}
755
756The full list of classifiers can be obtained by running
757\code{python setup.py register --list-classifiers}.
Andrew M. Kuchling87cebbf2003-01-03 16:24:28 +0000758
759\begin{seealso}
760
Fred Drake693aea22003-02-07 14:52:18 +0000761\seepep{301}{Package Index and Metadata for Distutils}{Written and
762implemented by Richard Jones.}
Andrew M. Kuchling87cebbf2003-01-03 16:24:28 +0000763
764\end{seealso}
765
766
767%======================================================================
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000768\section{PEP 302: New Import Hooks \label{section-pep302}}
769
770While it's been possible to write custom import hooks ever since the
771\module{ihooks} module was introduced in Python 1.3, no one has ever
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000772been really happy with it because writing new import hooks is
773difficult and messy. There have been various proposed alternatives
774such as the \module{imputil} and \module{iu} modules, but none of them
775has ever gained much acceptance, and none of them were easily usable
776from \C{} code.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000777
778\pep{302} borrows ideas from its predecessors, especially from
779Gordon McMillan's \module{iu} module. Three new items
780are added to the \module{sys} module:
781
782\begin{itemize}
Andrew M. Kuchlingd5ac8d02003-01-02 21:33:15 +0000783 \item \code{sys.path_hooks} is a list of callable objects; most
Fred Drakeaac8c582003-01-17 22:50:10 +0000784 often they'll be classes. Each callable takes a string containing a
785 path and either returns an importer object that will handle imports
786 from this path or raises an \exception{ImportError} exception if it
787 can't handle this path.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000788
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000789 \item \code{sys.path_importer_cache} caches importer objects for
Fred Drakeaac8c582003-01-17 22:50:10 +0000790 each path, so \code{sys.path_hooks} will only need to be traversed
791 once for each path.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000792
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000793 \item \code{sys.meta_path} is a list of importer objects that will
794 be traversed before \code{sys.path} is checked. This list is
795 initially empty, but user code can add objects to it. Additional
796 built-in and frozen modules can be imported by an object added to
797 this list.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000798
799\end{itemize}
800
801Importer objects must have a single method,
802\method{find_module(\var{fullname}, \var{path}=None)}. \var{fullname}
803will be a module or package name, e.g. \samp{string} or
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000804\samp{distutils.core}. \method{find_module()} must return a loader object
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000805that has a single method, \method{load_module(\var{fullname})}, that
806creates and returns the corresponding module object.
807
808Pseudo-code for Python's new import logic, therefore, looks something
809like this (simplified a bit; see \pep{302} for the full details):
810
811\begin{verbatim}
812for mp in sys.meta_path:
813 loader = mp(fullname)
814 if loader is not None:
Andrew M. Kuchlingd5ac8d02003-01-02 21:33:15 +0000815 <module> = loader.load_module(fullname)
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000816
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000817for path in sys.path:
818 for hook in sys.path_hooks:
Andrew M. Kuchlingd5ac8d02003-01-02 21:33:15 +0000819 try:
820 importer = hook(path)
821 except ImportError:
822 # ImportError, so try the other path hooks
823 pass
824 else:
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000825 loader = importer.find_module(fullname)
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000826 <module> = loader.load_module(fullname)
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000827
828# Not found!
829raise ImportError
830\end{verbatim}
831
832\begin{seealso}
833
834\seepep{302}{New Import Hooks}{Written by Just van~Rossum and Paul Moore.
Andrew M. Kuchlingae3bbf52002-12-31 14:03:45 +0000835Implemented by Just van~Rossum.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +0000836}
837
838\end{seealso}
839
840
841%======================================================================
Andrew M. Kuchlinga978e102003-03-21 18:10:12 +0000842\section{PEP 305: Comma-separated Files \label{section-pep305}}
843
844Comma-separated files are a format frequently used for exporting data
845from databases and spreadsheets. Python 2.3 adds a parser for
846comma-separated files.
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000847
848Comma-separated format is deceptively simple at first glance:
Andrew M. Kuchlinga978e102003-03-21 18:10:12 +0000849
850\begin{verbatim}
851Costs,150,200,3.95
852\end{verbatim}
853
854Read a line and call \code{line.split(',')}: what could be simpler?
855But toss in string data that can contain commas, and things get more
856complicated:
857
858\begin{verbatim}
859"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
860\end{verbatim}
861
862A big ugly regular expression can parse this, but using the new
863\module{csv} package is much simpler:
864
865\begin{verbatim}
Andrew M. Kuchlingba887bb2003-04-13 21:13:02 +0000866import csv
Andrew M. Kuchlinga978e102003-03-21 18:10:12 +0000867
868input = open('datafile', 'rb')
869reader = csv.reader(input)
870for line in reader:
871 print line
872\end{verbatim}
873
874The \function{reader} function takes a number of different options.
875The field separator isn't limited to the comma and can be changed to
876any character, and so can the quoting and line-ending characters.
877
878Different dialects of comma-separated files can be defined and
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000879registered; currently there are two dialects, both used by Microsoft Excel.
Andrew M. Kuchlinga978e102003-03-21 18:10:12 +0000880A separate \class{csv.writer} class will generate comma-separated files
881from a succession of tuples or lists, quoting strings that contain the
882delimiter.
883
884\begin{seealso}
885
886\seepep{305}{CSV File API}{Written and implemented
887by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.
888}
889
890\end{seealso}
891
892%======================================================================
Andrew M. Kuchlinga092ba12003-03-21 18:32:43 +0000893\section{PEP 307: Pickle Enhancements \label{section-pep305}}
894
895The \module{pickle} and \module{cPickle} modules received some
896attention during the 2.3 development cycle. In 2.2, new-style classes
Andrew M. Kuchlinga6b1c752003-04-09 17:26:38 +0000897could be pickled without difficulty, but they weren't pickled very
Andrew M. Kuchlinga092ba12003-03-21 18:32:43 +0000898compactly; \pep{307} quotes a trivial example where a new-style class
899results in a pickled string three times longer than that for a classic
900class.
901
902The solution was to invent a new pickle protocol. The
903\function{pickle.dumps()} function has supported a text-or-binary flag
904for a long time. In 2.3, this flag is redefined from a Boolean to an
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000905integer: 0 is the old text-mode pickle format, 1 is the old binary
906format, and now 2 is a new 2.3-specific format. A new constant,
Andrew M. Kuchlinga092ba12003-03-21 18:32:43 +0000907\constant{pickle.HIGHEST_PROTOCOL}, can be used to select the fanciest
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000908protocol available.
Andrew M. Kuchlinga092ba12003-03-21 18:32:43 +0000909
910Unpickling is no longer considered a safe operation. 2.2's
911\module{pickle} provided hooks for trying to prevent unsafe classes
912from being unpickled (specifically, a
913\member{__safe_for_unpickling__} attribute), but none of this code
914was ever audited and therefore it's all been ripped out in 2.3. You
915should not unpickle untrusted data in any version of Python.
916
917To reduce the pickling overhead for new-style classes, a new interface
918for customizing pickling was added using three special methods:
919\method{__getstate__}, \method{__setstate__}, and
920\method{__getnewargs__}. Consult \pep{307} for the full semantics
921of these methods.
922
923As a way to compress pickles yet further, it's now possible to use
924integer codes instead of long strings to identify pickled classes.
925The Python Software Foundation will maintain a list of standardized
926codes; there's also a range of codes for private use. Currently no
927codes have been specified.
928
929\begin{seealso}
930
931\seepep{307}{Extensions to the pickle protocol}{Written and implemented
932by Guido van Rossum and Tim Peters.}
933
934\end{seealso}
935
936%======================================================================
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000937\section{Extended Slices\label{section-slices}}
Michael W. Hudson5efaf7e2002-06-11 10:55:12 +0000938
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000939Ever since Python 1.4, the slicing syntax has supported an optional
940third ``step'' or ``stride'' argument. For example, these are all
941legal Python syntax: \code{L[1:10:2]}, \code{L[:-1:1]},
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000942\code{L[::-1]}. This was added to Python at the request of
943the developers of Numerical Python, which uses the third argument
944extensively. However, Python's built-in list, tuple, and string
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000945sequence types have never supported this feature, raising a
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000946\exception{TypeError} if you tried it. Michael Hudson contributed a
947patch to fix this shortcoming.
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000948
949For example, you can now easily extract the elements of a list that
950have even indexes:
Fred Drakedf872a22002-07-03 12:02:01 +0000951
952\begin{verbatim}
953>>> L = range(10)
954>>> L[::2]
955[0, 2, 4, 6, 8]
956\end{verbatim}
957
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000958Negative values also work to make a copy of the same list in reverse
959order:
Fred Drakedf872a22002-07-03 12:02:01 +0000960
961\begin{verbatim}
962>>> L[::-1]
963[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
964\end{verbatim}
Andrew M. Kuchling3a52ff62002-04-03 22:44:47 +0000965
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000966This also works for tuples, arrays, and strings:
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +0000967
968\begin{verbatim}
969>>> s='abcd'
970>>> s[::2]
971'ac'
972>>> s[::-1]
973'dcba'
974\end{verbatim}
975
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000976If you have a mutable sequence such as a list or an array you can
Michael W. Hudson4da01ed2002-07-19 15:48:56 +0000977assign to or delete an extended slice, but there are some differences
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000978between assignment to extended and regular slices. Assignment to a
979regular slice can be used to change the length of the sequence:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +0000980
981\begin{verbatim}
982>>> a = range(3)
983>>> a
984[0, 1, 2]
985>>> a[1:3] = [4, 5, 6]
986>>> a
987[0, 4, 5, 6]
988\end{verbatim}
989
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000990Extended slices aren't this flexible. When assigning to an extended
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +0000991slice, the list on the right hand side of the statement must contain
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +0000992the same number of items as the slice it is replacing:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +0000993
994\begin{verbatim}
995>>> a = range(4)
996>>> a
997[0, 1, 2, 3]
998>>> a[::2]
999[0, 2]
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001000>>> a[::2] = [0, -1]
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001001>>> a
1002[0, 1, -1, 3]
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001003>>> a[::2] = [0,1,2]
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001004Traceback (most recent call last):
1005 File "<stdin>", line 1, in ?
Raymond Hettingeree1bded2003-01-17 16:20:23 +00001006ValueError: attempt to assign sequence of size 3 to extended slice of size 2
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001007\end{verbatim}
1008
1009Deletion is more straightforward:
1010
1011\begin{verbatim}
1012>>> a = range(4)
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001013>>> a
1014[0, 1, 2, 3]
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001015>>> a[::2]
1016[0, 2]
1017>>> del a[::2]
1018>>> a
1019[1, 3]
1020\end{verbatim}
1021
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001022One can also now pass slice objects to the
1023\method{__getitem__} methods of the built-in sequences:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001024
1025\begin{verbatim}
1026>>> range(10).__getitem__(slice(0, 5, 2))
1027[0, 2, 4]
1028\end{verbatim}
1029
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001030Or use slice objects directly in subscripts:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001031
1032\begin{verbatim}
1033>>> range(10)[slice(0, 5, 2)]
1034[0, 2, 4]
1035\end{verbatim}
1036
Andrew M. Kuchlingb6f79592002-11-29 19:43:45 +00001037To simplify implementing sequences that support extended slicing,
1038slice objects now have a method \method{indices(\var{length})} which,
Fred Drakeaac8c582003-01-17 22:50:10 +00001039given the length of a sequence, returns a \code{(\var{start},
1040\var{stop}, \var{step})} tuple that can be passed directly to
1041\function{range()}.
Andrew M. Kuchlingb6f79592002-11-29 19:43:45 +00001042\method{indices()} handles omitted and out-of-bounds indices in a
1043manner consistent with regular slices (and this innocuous phrase hides
1044a welter of confusing details!). The method is intended to be used
1045like this:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001046
1047\begin{verbatim}
1048class FakeSeq:
1049 ...
1050 def calc_item(self, i):
1051 ...
1052 def __getitem__(self, item):
1053 if isinstance(item, slice):
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001054 indices = item.indices(len(self))
1055 return FakeSeq([self.calc_item(i) in range(*indices)])
Fred Drake5c4cf152002-11-13 14:59:06 +00001056 else:
Michael W. Hudson4da01ed2002-07-19 15:48:56 +00001057 return self.calc_item(i)
1058\end{verbatim}
1059
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001060From this example you can also see that the built-in \class{slice}
Andrew M. Kuchling90e9a792002-08-15 00:40:21 +00001061object is now the type object for the slice type, and is no longer a
1062function. This is consistent with Python 2.2, where \class{int},
1063\class{str}, etc., underwent the same change.
1064
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001065
Andrew M. Kuchling3a52ff62002-04-03 22:44:47 +00001066%======================================================================
Fred Drakedf872a22002-07-03 12:02:01 +00001067\section{Other Language Changes}
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001068
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001069Here are all of the changes that Python 2.3 makes to the core Python
1070language.
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001071
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001072\begin{itemize}
1073\item The \keyword{yield} statement is now always a keyword, as
1074described in section~\ref{section-generators} of this document.
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001075
Fred Drake5c4cf152002-11-13 14:59:06 +00001076\item A new built-in function \function{enumerate()}
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001077was added, as described in section~\ref{section-enumerate} of this
1078document.
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001079
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001080\item Two new constants, \constant{True} and \constant{False} were
1081added along with the built-in \class{bool} type, as described in
1082section~\ref{section-bool} of this document.
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001083
Andrew M. Kuchling495172c2002-11-20 13:50:15 +00001084\item The \function{int()} type constructor will now return a long
1085integer instead of raising an \exception{OverflowError} when a string
1086or floating-point number is too large to fit into an integer. This
1087can lead to the paradoxical result that
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001088\code{isinstance(int(\var{expression}), int)} is false, but that seems
1089unlikely to cause problems in practice.
Andrew M. Kuchling495172c2002-11-20 13:50:15 +00001090
Fred Drake5c4cf152002-11-13 14:59:06 +00001091\item Built-in types now support the extended slicing syntax,
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001092as described in section~\ref{section-slices} of this document.
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001093
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001094\item A new built-in function, \function{sum(\var{iterable}, \var{start}=0)},
1095adds up the numeric items in the iterable object and returns their sum.
1096\function{sum()} only accepts numbers, meaning that you can't use it
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +00001097to concatenate a bunch of strings. (Contributed by Alex
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001098Martelli.)
1099
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +00001100\item \code{list.insert(\var{pos}, \var{value})} used to
1101insert \var{value} at the front of the list when \var{pos} was
1102negative. The behaviour has now been changed to be consistent with
1103slice indexing, so when \var{pos} is -1 the value will be inserted
1104before the last element, and so forth.
1105
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +00001106\item \code{list.index(\var{value})}, which searches for \var{value}
1107within the list and returns its index, now takes optional
1108\var{start} and \var{stop} arguments to limit the search to
1109only part of the list.
1110
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001111\item Dictionaries have a new method, \method{pop(\var{key}\optional{,
1112\var{default}})}, that returns the value corresponding to \var{key}
1113and removes that key/value pair from the dictionary. If the requested
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001114key isn't present in the dictionary, \var{default} is returned if it's
1115specified and \exception{KeyError} raised if it isn't.
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001116
1117\begin{verbatim}
1118>>> d = {1:2}
1119>>> d
1120{1: 2}
1121>>> d.pop(4)
1122Traceback (most recent call last):
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +00001123 File "stdin", line 1, in ?
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001124KeyError: 4
1125>>> d.pop(1)
11262
1127>>> d.pop(1)
1128Traceback (most recent call last):
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +00001129 File "stdin", line 1, in ?
Raymond Hettingeree1bded2003-01-17 16:20:23 +00001130KeyError: 'pop(): dictionary is empty'
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001131>>> d
1132{}
1133>>>
1134\end{verbatim}
1135
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00001136There's also a new class method,
1137\method{dict.fromkeys(\var{iterable}, \var{value})}, that
1138creates a dictionary with keys taken from the supplied iterator
1139\var{iterable} and all values set to \var{value}, defaulting to
1140\code{None}.
1141
1142(Patches contributed by Raymond Hettinger.)
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001143
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001144Also, the \function{dict()} constructor now accepts keyword arguments to
Raymond Hettinger45bda572002-12-14 20:20:45 +00001145simplify creating small dictionaries:
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001146
1147\begin{verbatim}
1148>>> dict(red=1, blue=2, green=3, black=4)
1149{'blue': 2, 'black': 4, 'green': 3, 'red': 1}
1150\end{verbatim}
1151
Andrew M. Kuchlingae3bbf52002-12-31 14:03:45 +00001152(Contributed by Just van~Rossum.)
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001153
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00001154\item The \keyword{assert} statement no longer checks the \code{__debug__}
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001155flag, so you can no longer disable assertions by assigning to \code{__debug__}.
Fred Drake5c4cf152002-11-13 14:59:06 +00001156Running Python with the \programopt{-O} switch will still generate
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001157code that doesn't execute any assertions.
1158
1159\item Most type objects are now callable, so you can use them
1160to create new objects such as functions, classes, and modules. (This
1161means that the \module{new} module can be deprecated in a future
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001162Python version, because you can now use the type objects available in
1163the \module{types} module.)
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001164% XXX should new.py use PendingDeprecationWarning?
1165For example, you can create a new module object with the following code:
1166
1167\begin{verbatim}
1168>>> import types
1169>>> m = types.ModuleType('abc','docstring')
1170>>> m
1171<module 'abc' (built-in)>
1172>>> m.__doc__
1173'docstring'
1174\end{verbatim}
1175
Fred Drake5c4cf152002-11-13 14:59:06 +00001176\item
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001177A new warning, \exception{PendingDeprecationWarning} was added to
1178indicate features which are in the process of being
1179deprecated. The warning will \emph{not} be printed by default. To
1180check for use of features that will be deprecated in the future,
1181supply \programopt{-Walways::PendingDeprecationWarning::} on the
1182command line or use \function{warnings.filterwarnings()}.
1183
Andrew M. Kuchlingc1dd1742003-01-13 13:59:22 +00001184\item The process of deprecating string-based exceptions, as
1185in \code{raise "Error occurred"}, has begun. Raising a string will
1186now trigger \exception{PendingDeprecationWarning}.
1187
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001188\item Using \code{None} as a variable name will now result in a
1189\exception{SyntaxWarning} warning. In a future version of Python,
1190\code{None} may finally become a keyword.
1191
Andrew M. Kuchlingb60ea3f2002-11-15 14:37:10 +00001192\item The method resolution order used by new-style classes has
1193changed, though you'll only notice the difference if you have a really
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +00001194complicated inheritance hierarchy. Classic classes are unaffected by
1195this change. Python 2.2 originally used a topological sort of a
Andrew M. Kuchlingb60ea3f2002-11-15 14:37:10 +00001196class's ancestors, but 2.3 now uses the C3 algorithm as described in
Andrew M. Kuchling6f429c32002-11-19 13:09:00 +00001197the paper \ulink{``A Monotonic Superclass Linearization for
1198Dylan''}{http://www.webcom.com/haahr/dylan/linearization-oopsla96.html}.
Andrew M. Kuchlingc1dd1742003-01-13 13:59:22 +00001199To understand the motivation for this change,
1200read Michele Simionato's article
Fred Drake693aea22003-02-07 14:52:18 +00001201\ulink{``Python 2.3 Method Resolution Order''}
Andrew M. Kuchlingb8a39052003-02-07 20:22:33 +00001202 {http://www.python.org/2.3/mro.html}, or
Andrew M. Kuchlingc1dd1742003-01-13 13:59:22 +00001203read the thread on python-dev starting with the message at
Andrew M. Kuchlingb60ea3f2002-11-15 14:37:10 +00001204\url{http://mail.python.org/pipermail/python-dev/2002-October/029035.html}.
1205Samuele Pedroni first pointed out the problem and also implemented the
1206fix by coding the C3 algorithm.
1207
Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +00001208\item Python runs multithreaded programs by switching between threads
1209after executing N bytecodes. The default value for N has been
1210increased from 10 to 100 bytecodes, speeding up single-threaded
1211applications by reducing the switching overhead. Some multithreaded
1212applications may suffer slower response time, but that's easily fixed
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001213by setting the limit back to a lower number using
Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +00001214\function{sys.setcheckinterval(\var{N})}.
Andrew M. Kuchlingc760c6c2003-07-16 20:12:33 +00001215The limit can be retrieved with the new
1216\function{sys.getcheckinterval()} function.
Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +00001217
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001218\item One minor but far-reaching change is that the names of extension
1219types defined by the modules included with Python now contain the
Fred Drake5c4cf152002-11-13 14:59:06 +00001220module and a \character{.} in front of the type name. For example, in
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001221Python 2.2, if you created a socket and printed its
1222\member{__class__}, you'd get this output:
1223
1224\begin{verbatim}
1225>>> s = socket.socket()
1226>>> s.__class__
1227<type 'socket'>
1228\end{verbatim}
1229
1230In 2.3, you get this:
1231\begin{verbatim}
1232>>> s.__class__
1233<type '_socket.socket'>
1234\end{verbatim}
1235
Michael W. Hudson96bc3b42002-11-26 14:48:23 +00001236\item One of the noted incompatibilities between old- and new-style
1237 classes has been removed: you can now assign to the
1238 \member{__name__} and \member{__bases__} attributes of new-style
1239 classes. There are some restrictions on what can be assigned to
1240 \member{__bases__} along the lines of those relating to assigning to
1241 an instance's \member{__class__} attribute.
1242
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001243\end{itemize}
1244
1245
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00001246%======================================================================
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001247\subsection{String Changes}
1248
1249\begin{itemize}
1250
Fred Drakeaac8c582003-01-17 22:50:10 +00001251\item The \keyword{in} operator now works differently for strings.
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001252Previously, when evaluating \code{\var{X} in \var{Y}} where \var{X}
1253and \var{Y} are strings, \var{X} could only be a single character.
1254That's now changed; \var{X} can be a string of any length, and
1255\code{\var{X} in \var{Y}} will return \constant{True} if \var{X} is a
1256substring of \var{Y}. If \var{X} is the empty string, the result is
1257always \constant{True}.
1258
1259\begin{verbatim}
1260>>> 'ab' in 'abcd'
1261True
1262>>> 'ad' in 'abcd'
1263False
1264>>> '' in 'abcd'
1265True
1266\end{verbatim}
1267
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001268Note that this doesn't tell you where the substring starts; if you
Andrew M. Kuchlingaa9b39f2003-07-16 20:37:26 +00001269need that information, use the \method{find()} string method.
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001270
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001271\item The \method{strip()}, \method{lstrip()}, and \method{rstrip()}
1272string methods now have an optional argument for specifying the
1273characters to strip. The default is still to remove all whitespace
1274characters:
1275
1276\begin{verbatim}
1277>>> ' abc '.strip()
1278'abc'
1279>>> '><><abc<><><>'.strip('<>')
1280'abc'
1281>>> '><><abc<><><>\n'.strip('<>')
1282'abc<><><>\n'
1283>>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
1284u'\u4001abc'
1285>>>
1286\end{verbatim}
1287
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001288(Suggested by Simon Brunning and implemented by Walter D\"orwald.)
Andrew M. Kuchling346386f2002-07-12 20:24:42 +00001289
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001290\item The \method{startswith()} and \method{endswith()}
1291string methods now accept negative numbers for the start and end
1292parameters.
1293
1294\item Another new string method is \method{zfill()}, originally a
1295function in the \module{string} module. \method{zfill()} pads a
1296numeric string with zeros on the left until it's the specified width.
1297Note that the \code{\%} operator is still more flexible and powerful
1298than \method{zfill()}.
1299
1300\begin{verbatim}
1301>>> '45'.zfill(4)
1302'0045'
1303>>> '12345'.zfill(4)
1304'12345'
1305>>> 'goofy'.zfill(6)
1306'0goofy'
1307\end{verbatim}
1308
Andrew M. Kuchling346386f2002-07-12 20:24:42 +00001309(Contributed by Walter D\"orwald.)
1310
Fred Drake5c4cf152002-11-13 14:59:06 +00001311\item A new type object, \class{basestring}, has been added.
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001312 Both 8-bit strings and Unicode strings inherit from this type, so
1313 \code{isinstance(obj, basestring)} will return \constant{True} for
1314 either kind of string. It's a completely abstract type, so you
1315 can't create \class{basestring} instances.
1316
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001317\item Interned strings are no longer immortal, and will now be
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001318garbage-collected in the usual way when the only reference to them is
1319from the internal dictionary of interned strings. (Implemented by
1320Oren Tirosh.)
1321
1322\end{itemize}
1323
1324
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00001325%======================================================================
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001326\subsection{Optimizations}
1327
1328\begin{itemize}
1329
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001330\item The creation of new-style class instances has been made much
1331faster; they're now faster than classic classes!
1332
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001333\item The \method{sort()} method of list objects has been extensively
1334rewritten by Tim Peters, and the implementation is significantly
1335faster.
1336
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001337\item Multiplication of large long integers is now much faster thanks
1338to an implementation of Karatsuba multiplication, an algorithm that
1339scales better than the O(n*n) required for the grade-school
1340multiplication algorithm. (Original patch by Christopher A. Craig,
1341and significantly reworked by Tim Peters.)
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001342
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001343\item The \code{SET_LINENO} opcode is now gone. This may provide a
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001344small speed increase, depending on your compiler's idiosyncrasies.
1345See section~\ref{section-other} for a longer explanation.
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001346(Removed by Michael Hudson.)
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001347
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001348\item \function{xrange()} objects now have their own iterator, making
1349\code{for i in xrange(n)} slightly faster than
1350\code{for i in range(n)}. (Patch by Raymond Hettinger.)
1351
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001352\item A number of small rearrangements have been made in various
1353hotspots to improve performance, inlining a function here, removing
1354some code there. (Implemented mostly by GvR, but lots of people have
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001355contributed single changes.)
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00001356
1357\end{itemize}
Neal Norwitzd68f5172002-05-29 15:54:55 +00001358
Andrew M. Kuchling6974aa92002-08-20 00:54:36 +00001359
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00001360%======================================================================
Andrew M. Kuchlingef893fe2003-01-06 20:04:17 +00001361\section{New, Improved, and Deprecated Modules}
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001362
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001363As usual, Python's standard library received a number of enhancements and
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001364bug fixes. Here's a partial list of the most notable changes, sorted
1365alphabetically by module name. Consult the
1366\file{Misc/NEWS} file in the source tree for a more
1367complete list of changes, or look through the CVS logs for all the
1368details.
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001369
1370\begin{itemize}
1371
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001372\item The \module{array} module now supports arrays of Unicode
Fred Drake5c4cf152002-11-13 14:59:06 +00001373characters using the \character{u} format character. Arrays also now
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001374support using the \code{+=} assignment operator to add another array's
1375contents, and the \code{*=} assignment operator to repeat an array.
1376(Contributed by Jason Orendorff.)
1377
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001378\item The \module{bsddb} module has been replaced by version 4.1.1
Andrew M. Kuchling669249e2002-11-19 13:05:33 +00001379of the \ulink{PyBSDDB}{http://pybsddb.sourceforge.net} package,
1380providing a more complete interface to the transactional features of
1381the BerkeleyDB library.
1382The old version of the module has been renamed to
1383\module{bsddb185} and is no longer built automatically; you'll
1384have to edit \file{Modules/Setup} to enable it. Note that the new
1385\module{bsddb} package is intended to be compatible with the
1386old module, so be sure to file bugs if you discover any
Skip Montanaro959c7722003-03-07 15:45:15 +00001387incompatibilities. When upgrading to Python 2.3, if you also change
1388the underlying BerkeleyDB library, you will almost certainly have to
1389convert your database files to the new version. You can do this
1390fairly easily with the new scripts \file{db2pickle.py} and
1391\file{pickle2db.py} which you will find in the distribution's
1392Tools/scripts directory. If you've already been using the PyBSDDB
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001393package and importing it as \module{bsddb3}, you will have to change your
Skip Montanaro959c7722003-03-07 15:45:15 +00001394\code{import} statements.
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001395
1396\item The new \module{bz2} module is an interface to the bz2 data
1397compression library. bz2 usually produces output that's smaller than
1398the compressed output from the \module{zlib} module, meaning that it
1399compresses data more highly. (Contributed by Gustavo Niemeyer.)
Andrew M. Kuchling669249e2002-11-19 13:05:33 +00001400
Fred Drake5c4cf152002-11-13 14:59:06 +00001401\item The Distutils \class{Extension} class now supports
1402an extra constructor argument named \var{depends} for listing
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001403additional source files that an extension depends on. This lets
1404Distutils recompile the module if any of the dependency files are
Fred Drake5c4cf152002-11-13 14:59:06 +00001405modified. For example, if \file{sampmodule.c} includes the header
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001406file \file{sample.h}, you would create the \class{Extension} object like
1407this:
1408
1409\begin{verbatim}
1410ext = Extension("samp",
1411 sources=["sampmodule.c"],
1412 depends=["sample.h"])
1413\end{verbatim}
1414
1415Modifying \file{sample.h} would then cause the module to be recompiled.
1416(Contributed by Jeremy Hylton.)
1417
Andrew M. Kuchlingdc3f7e12002-11-04 20:05:10 +00001418\item Other minor changes to Distutils:
1419it now checks for the \envvar{CC}, \envvar{CFLAGS}, \envvar{CPP},
1420\envvar{LDFLAGS}, and \envvar{CPPFLAGS} environment variables, using
1421them to override the settings in Python's configuration (contributed
Andrew M. Kuchlinga31bb372003-01-27 16:36:34 +00001422by Robert Weber).
Andrew M. Kuchlingdc3f7e12002-11-04 20:05:10 +00001423
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001424\item The new \function{gc.get_referents(\var{object})} function returns a
1425list of all the objects referenced by \var{object}.
1426
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001427\item The \module{getopt} module gained a new function,
1428\function{gnu_getopt()}, that supports the same arguments as the existing
Fred Drake5c4cf152002-11-13 14:59:06 +00001429\function{getopt()} function but uses GNU-style scanning mode.
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001430The existing \function{getopt()} stops processing options as soon as a
1431non-option argument is encountered, but in GNU-style mode processing
1432continues, meaning that options and arguments can be mixed. For
1433example:
1434
1435\begin{verbatim}
1436>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1437([('-f', 'filename')], ['output', '-v'])
1438>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
1439([('-f', 'filename'), ('-v', '')], ['output'])
1440\end{verbatim}
1441
1442(Contributed by Peter \AA{strand}.)
1443
1444\item The \module{grp}, \module{pwd}, and \module{resource} modules
Fred Drake5c4cf152002-11-13 14:59:06 +00001445now return enhanced tuples:
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001446
1447\begin{verbatim}
1448>>> import grp
1449>>> g = grp.getgrnam('amk')
1450>>> g.gr_name, g.gr_gid
1451('amk', 500)
1452\end{verbatim}
1453
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001454\item The \module{gzip} module can now handle files exceeding 2~Gb.
1455
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001456\item The new \module{heapq} module contains an implementation of a
1457heap queue algorithm. A heap is an array-like data structure that
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001458keeps items in a partially sorted order such that, for every index
1459\var{k}, \code{heap[\var{k}] <= heap[2*\var{k}+1]} and
1460\code{heap[\var{k}] <= heap[2*\var{k}+2]}. This makes it quick to
1461remove the smallest item, and inserting a new item while maintaining
1462the heap property is O(lg~n). (See
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001463\url{http://www.nist.gov/dads/HTML/priorityque.html} for more
1464information about the priority queue data structure.)
1465
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001466The \module{heapq} module provides \function{heappush()} and
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001467\function{heappop()} functions for adding and removing items while
1468maintaining the heap property on top of some other mutable Python
1469sequence type. For example:
1470
1471\begin{verbatim}
1472>>> import heapq
1473>>> heap = []
1474>>> for item in [3, 7, 5, 11, 1]:
1475... heapq.heappush(heap, item)
1476...
1477>>> heap
1478[1, 3, 5, 11, 7]
1479>>> heapq.heappop(heap)
14801
1481>>> heapq.heappop(heap)
14823
1483>>> heap
1484[5, 7, 11]
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00001485\end{verbatim}
1486
1487(Contributed by Kevin O'Connor.)
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001488
Andrew M. Kuchling87cebbf2003-01-03 16:24:28 +00001489\item The \module{imaplib} module now supports IMAP over SSL.
1490(Contributed by Piers Lauder and Tino Lange.)
1491
Andrew M. Kuchling41c3e002003-03-02 02:13:52 +00001492\item The \module{itertools} contains a number of useful functions for
1493use with iterators, inspired by various functions provided by the ML
1494and Haskell languages. For example,
1495\code{itertools.ifilter(predicate, iterator)} returns all elements in
1496the iterator for which the function \function{predicate()} returns
Andrew M. Kuchling563389f2003-03-02 02:31:58 +00001497\constant{True}, and \code{itertools.repeat(obj, \var{N})} returns
Andrew M. Kuchling41c3e002003-03-02 02:13:52 +00001498\code{obj} \var{N} times. There are a number of other functions in
1499the module; see the \ulink{package's reference
1500documentation}{../lib/module-itertools.html} for details.
Raymond Hettinger5284b442003-03-09 07:19:38 +00001501(Contributed by Raymond Hettinger.)
Fred Drakecade7132003-02-19 16:08:08 +00001502
Fred Drake5c4cf152002-11-13 14:59:06 +00001503\item Two new functions in the \module{math} module,
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001504\function{degrees(\var{rads})} and \function{radians(\var{degs})},
Fred Drake5c4cf152002-11-13 14:59:06 +00001505convert between radians and degrees. Other functions in the
Andrew M. Kuchling8e5b53b2002-12-15 20:17:38 +00001506\module{math} module such as \function{math.sin()} and
1507\function{math.cos()} have always required input values measured in
1508radians. Also, an optional \var{base} argument was added to
1509\function{math.log()} to make it easier to compute logarithms for
1510bases other than \code{e} and \code{10}. (Contributed by Raymond
1511Hettinger.)
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001512
Andrew M. Kuchlingae3bbf52002-12-31 14:03:45 +00001513\item Several new functions (\function{getpgid()}, \function{killpg()},
1514\function{lchown()}, \function{loadavg()}, \function{major()}, \function{makedev()},
1515\function{minor()}, and \function{mknod()}) were added to the
Andrew M. Kuchlingc309cca2002-10-10 16:04:08 +00001516\module{posix} module that underlies the \module{os} module.
Andrew M. Kuchlingae3bbf52002-12-31 14:03:45 +00001517(Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001518
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001519\item In the \module{os} module, the \function{*stat()} family of functions can now report
1520fractions of a second in a timestamp. Such time stamps are
1521represented as floats, similar to \function{time.time()}.
1522
1523During testing, it was found that some applications will break if time
1524stamps are floats. For compatibility, when using the tuple interface
1525of the \class{stat_result} time stamps will be represented as integers.
1526When using named fields (a feature first introduced in Python 2.2),
1527time stamps are still represented as integers, unless
1528\function{os.stat_float_times()} is invoked to enable float return
1529values:
1530
1531\begin{verbatim}
1532>>> os.stat("/tmp").st_mtime
15331034791200
1534>>> os.stat_float_times(True)
1535>>> os.stat("/tmp").st_mtime
15361034791200.6335014
1537\end{verbatim}
1538
1539In Python 2.4, the default will change to always returning floats.
1540
1541Application developers should enable this feature only if all their
1542libraries work properly when confronted with floating point time
1543stamps, or if they use the tuple API. If used, the feature should be
1544activated on an application level instead of trying to enable it on a
1545per-use basis.
1546
Andrew M. Kuchling53262572002-12-01 14:00:21 +00001547\item The old and never-documented \module{linuxaudiodev} module has
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001548been deprecated, and a new version named \module{ossaudiodev} has been
1549added. The module was renamed because the OSS sound drivers can be
1550used on platforms other than Linux, and the interface has also been
1551tidied and brought up to date in various ways. (Contributed by Greg
Greg Wardaa1d3aa2003-01-03 18:03:21 +00001552Ward and Nicholas FitzRoy-Dale.)
Andrew M. Kuchling53262572002-12-01 14:00:21 +00001553
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001554\item The new \module{platform} module contains a number of functions
1555that try to determine various properties of the platform you're
1556running on. There are functions for getting the architecture, CPU
1557type, the Windows OS version, and even the Linux distribution version.
1558(Contributed by Marc-Andr\'e Lemburg.)
1559
Fred Drake5c4cf152002-11-13 14:59:06 +00001560\item The parser objects provided by the \module{pyexpat} module
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001561can now optionally buffer character data, resulting in fewer calls to
1562your character data handler and therefore faster performance. Setting
Fred Drake5c4cf152002-11-13 14:59:06 +00001563the parser object's \member{buffer_text} attribute to \constant{True}
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001564will enable buffering.
1565
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001566\item The \function{sample(\var{population}, \var{k})} function was
1567added to the \module{random} module. \var{population} is a sequence
Fred Drakeaac8c582003-01-17 22:50:10 +00001568or \class{xrange} object containing the elements of a population, and
1569\function{sample()}
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001570chooses \var{k} elements from the population without replacing chosen
1571elements. \var{k} can be any value up to \code{len(\var{population})}.
1572For example:
1573
1574\begin{verbatim}
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001575>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
Michael W. Hudsoncfd38842002-12-17 16:15:34 +00001576>>> random.sample(days, 3) # Choose 3 elements
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001577['St', 'Sn', 'Th']
Michael W. Hudsoncfd38842002-12-17 16:15:34 +00001578>>> random.sample(days, 7) # Choose 7 elements
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001579['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
Michael W. Hudsoncfd38842002-12-17 16:15:34 +00001580>>> random.sample(days, 7) # Choose 7 again
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001581['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
Michael W. Hudsoncfd38842002-12-17 16:15:34 +00001582>>> random.sample(days, 8) # Can't choose eight
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001583Traceback (most recent call last):
Andrew M. Kuchling28f2f882002-11-14 14:14:16 +00001584 File "<stdin>", line 1, in ?
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001585 File "random.py", line 414, in sample
1586 raise ValueError, "sample larger than population"
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001587ValueError: sample larger than population
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001588>>> random.sample(xrange(1,10000,2), 10) # Choose ten odd nos. under 10000
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001589[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001590\end{verbatim}
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001591
1592The \module{random} module now uses a new algorithm, the Mersenne
1593Twister, implemented in C. It's faster and more extensively studied
1594than the previous algorithm.
1595
1596(All changes contributed by Raymond Hettinger.)
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00001597
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001598\item The \module{readline} module also gained a number of new
1599functions: \function{get_history_item()},
1600\function{get_current_history_length()}, and \function{redisplay()}.
1601
Andrew M. Kuchlingef893fe2003-01-06 20:04:17 +00001602\item The \module{rexec} and \module{Bastion} modules have been
1603declared dead, and attempts to import them will fail with a
1604\exception{RuntimeError}. New-style classes provide new ways to break
1605out of the restricted execution environment provided by
1606\module{rexec}, and no one has interest in fixing them or time to do
1607so. If you have applications using \module{rexec}, rewrite them to
1608use something else.
1609
1610(Sticking with Python 2.2 or 2.1 will not make your applications any
Andrew M. Kuchling13b4c412003-04-24 13:23:43 +00001611safer because there are known bugs in the \module{rexec} module in
Andrew M. Kuchling035272b2003-04-24 16:38:20 +00001612those versions. To repeat: if you're using \module{rexec}, stop using
Andrew M. Kuchlingef893fe2003-01-06 20:04:17 +00001613it immediately.)
1614
Andrew M. Kuchling13b4c412003-04-24 13:23:43 +00001615\item The \module{rotor} module has been deprecated because the
1616 algorithm it uses for encryption is not believed to be secure. If
1617 you need encryption, use one of the several AES Python modules
1618 that are available separately.
1619
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001620\item The \module{shutil} module gained a \function{move(\var{src},
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001621\var{dest})} function that recursively moves a file or directory to a new
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001622location.
1623
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001624\item Support for more advanced POSIX signal handling was added
Michael W. Hudson43ed43b2003-03-13 13:56:53 +00001625to the \module{signal} but then removed again as it proved impossible
1626to make it work reliably across platforms.
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001627
1628\item The \module{socket} module now supports timeouts. You
1629can call the \method{settimeout(\var{t})} method on a socket object to
1630set a timeout of \var{t} seconds. Subsequent socket operations that
1631take longer than \var{t} seconds to complete will abort and raise a
Andrew M. Kuchlingc760c6c2003-07-16 20:12:33 +00001632\exception{socket.timeout} exception.
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001633
1634The original timeout implementation was by Tim O'Malley. Michael
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001635Gilfix integrated it into the Python \module{socket} module and
1636shepherded it through a lengthy review. After the code was checked
1637in, Guido van~Rossum rewrote parts of it. (This is a good example of
1638a collaborative development process in action.)
Andrew M. Kuchlinga982eb12002-07-22 18:57:36 +00001639
Mark Hammond8af50bc2002-12-03 06:13:35 +00001640\item On Windows, the \module{socket} module now ships with Secure
Michael W. Hudson065f5fa2003-02-10 19:24:50 +00001641Sockets Layer (SSL) support.
Mark Hammond8af50bc2002-12-03 06:13:35 +00001642
Andrew M. Kuchling563389f2003-03-02 02:31:58 +00001643\item The value of the C \constant{PYTHON_API_VERSION} macro is now
1644exposed at the Python level as \code{sys.api_version}. The current
1645exception can be cleared by calling the new \function{sys.exc_clear()}
1646function.
Andrew M. Kuchlingdcfd8252002-09-13 22:21:42 +00001647
Andrew M. Kuchling674b0bf2003-01-07 00:07:19 +00001648\item The new \module{tarfile} module
Neal Norwitz55d555f2003-01-08 05:27:42 +00001649allows reading from and writing to \program{tar}-format archive files.
Andrew M. Kuchling674b0bf2003-01-07 00:07:19 +00001650(Contributed by Lars Gust\"abel.)
1651
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00001652\item The new \module{textwrap} module contains functions for wrapping
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001653strings containing paragraphs of text. The \function{wrap(\var{text},
1654\var{width})} function takes a string and returns a list containing
1655the text split into lines of no more than the chosen width. The
1656\function{fill(\var{text}, \var{width})} function returns a single
1657string, reformatted to fit into lines no longer than the chosen width.
1658(As you can guess, \function{fill()} is built on top of
1659\function{wrap()}. For example:
1660
1661\begin{verbatim}
1662>>> import textwrap
1663>>> paragraph = "Not a whit, we defy augury: ... more text ..."
1664>>> textwrap.wrap(paragraph, 60)
Fred Drake5c4cf152002-11-13 14:59:06 +00001665["Not a whit, we defy augury: there's a special providence in",
1666 "the fall of a sparrow. If it be now, 'tis not to come; if it",
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001667 ...]
1668>>> print textwrap.fill(paragraph, 35)
1669Not a whit, we defy augury: there's
1670a special providence in the fall of
1671a sparrow. If it be now, 'tis not
1672to come; if it be not to come, it
1673will be now; if it be not now, yet
1674it will come: the readiness is all.
Fred Drake5c4cf152002-11-13 14:59:06 +00001675>>>
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001676\end{verbatim}
1677
1678The module also contains a \class{TextWrapper} class that actually
Fred Drake5c4cf152002-11-13 14:59:06 +00001679implements the text wrapping strategy. Both the
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001680\class{TextWrapper} class and the \function{wrap()} and
1681\function{fill()} functions support a number of additional keyword
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001682arguments for fine-tuning the formatting; consult the \ulink{module's
1683documentation}{../lib/module-textwrap.html} for details.
Andrew M. Kuchlingd003a2a2002-06-26 13:23:55 +00001684(Contributed by Greg Ward.)
1685
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001686\item The \module{thread} and \module{threading} modules now have
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001687companion modules, \module{dummy_thread} and \module{dummy_threading},
1688that provide a do-nothing implementation of the \module{thread}
1689module's interface for platforms where threads are not supported. The
1690intention is to simplify thread-aware modules (ones that \emph{don't}
1691rely on threads to run) by putting the following code at the top:
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001692
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001693\begin{verbatim}
1694try:
1695 import threading as _threading
1696except ImportError:
1697 import dummy_threading as _threading
1698\end{verbatim}
1699
1700Code can then call functions and use classes in \module{_threading}
1701whether or not threads are supported, avoiding an \keyword{if}
1702statement and making the code slightly clearer. This module will not
1703magically make multithreaded code run without threads; code that waits
1704for another thread to return or to do something will simply hang
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001705forever. (In this example, \module{_threading} is used as the module
1706name to make it clear that the module being used is not necessarily
1707the actual \module{threading} module.)
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00001708
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001709\item The \module{time} module's \function{strptime()} function has
Fred Drake5c4cf152002-11-13 14:59:06 +00001710long been an annoyance because it uses the platform C library's
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001711\function{strptime()} implementation, and different platforms
1712sometimes have odd bugs. Brett Cannon contributed a portable
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001713implementation that's written in pure Python and should behave
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001714identically on all platforms.
1715
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001716\item The new \module{timeit} module helps measure how long snippets
1717of Python code take to execute. The \file{timeit.py} file can be run
1718directly from the command line, or the module's \class{Timer} class
1719can be imported and used directly. Here's a short example that
1720figures out whether it's faster to convert an 8-bit string to Unicode
1721by appending an empty Unicode string to it or by using the
1722\function{unicode()} function:
1723
1724\begin{verbatim}
1725import timeit
1726
1727timer1 = timeit.Timer('unicode("abc")')
1728timer2 = timeit.Timer('"abc" + u""')
1729
1730# Run three trials
1731print timer1.repeat(repeat=3, number=100000)
1732print timer2.repeat(repeat=3, number=100000)
1733
1734# On my laptop this outputs:
1735# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
1736# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
1737\end{verbatim}
1738
1739
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001740\item The \module{UserDict} module has a new \class{DictMixin} class which
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001741defines all dictionary methods for classes that already have a minimum
1742mapping interface. This greatly simplifies writing classes that need
1743to be substitutable for dictionaries, such as the classes in
1744the \module{shelve} module.
1745
1746Adding the mixin as a superclass provides the full dictionary
1747interface whenever the class defines \method{__getitem__},
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001748\method{__setitem__}, \method{__delitem__}, and \method{keys}.
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001749For example:
1750
1751\begin{verbatim}
1752>>> import UserDict
1753>>> class SeqDict(UserDict.DictMixin):
1754 """Dictionary lookalike implemented with lists."""
1755 def __init__(self):
1756 self.keylist = []
1757 self.valuelist = []
1758 def __getitem__(self, key):
1759 try:
1760 i = self.keylist.index(key)
1761 except ValueError:
1762 raise KeyError
1763 return self.valuelist[i]
1764 def __setitem__(self, key, value):
1765 try:
1766 i = self.keylist.index(key)
1767 self.valuelist[i] = value
1768 except ValueError:
1769 self.keylist.append(key)
1770 self.valuelist.append(value)
1771 def __delitem__(self, key):
1772 try:
1773 i = self.keylist.index(key)
1774 except ValueError:
1775 raise KeyError
1776 self.keylist.pop(i)
1777 self.valuelist.pop(i)
1778 def keys(self):
1779 return list(self.keylist)
1780
1781>>> s = SeqDict()
1782>>> dir(s) # See that other dictionary methods are implemented
1783['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__',
1784 '__init__', '__iter__', '__len__', '__module__', '__repr__',
1785 '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems',
1786 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem',
1787 'setdefault', 'update', 'valuelist', 'values']
Neal Norwitzc7d8c682002-12-24 14:51:43 +00001788\end{verbatim}
Andrew M. Kuchling449a87d2002-12-11 15:03:51 +00001789
1790(Contributed by Raymond Hettinger.)
1791
Raymond Hettinger8ccf4d72003-07-10 15:48:33 +00001792\item The \module{Tix} module has received various bug fixes and
Andrew M. Kuchlingef893fe2003-01-06 20:04:17 +00001793updates for the current version of the Tix package.
1794
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001795\item The \module{Tkinter} module now works with a thread-enabled
1796version of Tcl. Tcl's threading model requires that widgets only be
1797accessed from the thread in which they're created; accesses from
1798another thread can cause Tcl to panic. For certain Tcl interfaces,
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001799\module{Tkinter} will now automatically avoid this
1800when a widget is accessed from a different thread by marshalling a
1801command, passing it to the correct thread, and waiting for the
1802results. Other interfaces can't be handled automatically but
1803\module{Tkinter} will now raise an exception on such an access so that
1804at least you can find out about the problem. See
Fred Drakeb876bcc2003-04-30 15:03:46 +00001805\url{http://mail.python.org/pipermail/python-dev/2002-December/031107.html} %
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001806for a more detailed explanation of this change. (Implemented by
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +00001807Martin von~L\"owis.)
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001808
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00001809\item Calling Tcl methods through \module{_tkinter} no longer
1810returns only strings. Instead, if Tcl returns other objects those
1811objects are converted to their Python equivalent, if one exists, or
1812wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
Raymond Hettinger45bda572002-12-14 20:20:45 +00001813exists. This behavior can be controlled through the
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00001814\method{wantobjects()} method of \class{tkapp} objects.
Martin v. Löwis39b48522002-11-26 09:47:25 +00001815
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00001816When using \module{_tkinter} through the \module{Tkinter} module (as
1817most Tkinter applications will), this feature is always activated. It
1818should not cause compatibility problems, since Tkinter would always
1819convert string results to Python types where possible.
Martin v. Löwis39b48522002-11-26 09:47:25 +00001820
Raymond Hettinger45bda572002-12-14 20:20:45 +00001821If any incompatibilities are found, the old behavior can be restored
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00001822by setting the \member{wantobjects} variable in the \module{Tkinter}
1823module to false before creating the first \class{tkapp} object.
Martin v. Löwis39b48522002-11-26 09:47:25 +00001824
1825\begin{verbatim}
1826import Tkinter
Martin v. Löwis8c8aa5d2002-11-26 21:39:48 +00001827Tkinter.wantobjects = 0
Martin v. Löwis39b48522002-11-26 09:47:25 +00001828\end{verbatim}
1829
Andrew M. Kuchling6c50df22002-12-13 12:53:16 +00001830Any breakage caused by this change should be reported as a bug.
Martin v. Löwis39b48522002-11-26 09:47:25 +00001831
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001832\item The DOM implementation
1833in \module{xml.dom.minidom} can now generate XML output in a
1834particular encoding by providing an optional encoding argument to
1835the \method{toxml()} and \method{toprettyxml()} methods of DOM nodes.
1836
1837\item The new \module{DocXMLRPCServer} module allows writing
1838self-documenting XML-RPC servers. Run it in demo mode (as a program)
1839to see it in action. Pointing the Web browser to the RPC server
1840produces pydoc-style documentation; pointing xmlrpclib to the
1841server allows invoking the actual methods.
1842(Contributed by Brian Quinlan.)
1843
Martin v. Löwis2548c732003-04-18 10:39:54 +00001844\item Support for internationalized domain names (RFCs 3454, 3490,
18453491, and 3492) has been added. The ``idna'' encoding can be used
1846to convert between a Unicode domain name and the ASCII-compatible
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001847encoding (ACE) of that name.
Martin v. Löwis2548c732003-04-18 10:39:54 +00001848
Martin v. Löwisfaf71ea2003-04-18 21:48:56 +00001849\begin{alltt}
Fred Drake15b3dba2003-07-16 04:00:14 +00001850>{}>{}> u"www.Alliancefran\c caise.nu".encode("idna")
Martin v. Löwis2548c732003-04-18 10:39:54 +00001851'www.xn--alliancefranaise-npb.nu'
Martin v. Löwisfaf71ea2003-04-18 21:48:56 +00001852\end{alltt}
Martin v. Löwis2548c732003-04-18 10:39:54 +00001853
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001854The \module{socket} module has also been extended to transparently
1855convert Unicode hostnames to the ACE version before passing them to
1856the C library. Modules that deal with hostnames such as
1857\module{httplib} and \module{ftplib}) also support Unicode host names;
1858\module{httplib} also sends HTTP \samp{Host} headers using the ACE
1859version of the domain name. \module{urllib} supports Unicode URLs
1860with non-ASCII host names as long as the \code{path} part of the URL
1861is ASCII only.
Martin v. Löwis2548c732003-04-18 10:39:54 +00001862
1863To implement this change, the module \module{stringprep}, the tool
Andrew M. Kuchlinge36b6902003-04-19 15:38:47 +00001864\code{mkstringprep} and the \code{punycode} encoding have been added.
Martin v. Löwis281b2c62003-04-18 21:04:39 +00001865
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00001866\end{itemize}
1867
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00001868
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00001869%======================================================================
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001870\subsection{Date/Time Type}
1871
1872Date and time types suitable for expressing timestamps were added as
1873the \module{datetime} module. The types don't support different
1874calendars or many fancy features, and just stick to the basics of
1875representing time.
1876
1877The three primary types are: \class{date}, representing a day, month,
1878and year; \class{time}, consisting of hour, minute, and second; and
1879\class{datetime}, which contains all the attributes of both
Andrew M. Kuchlingc71bb972003-03-21 17:23:07 +00001880\class{date} and \class{time}. There's also a
1881\class{timedelta} class representing differences between two points
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001882in time, and time zone logic is implemented by classes inheriting from
1883the abstract \class{tzinfo} class.
1884
1885You can create instances of \class{date} and \class{time} by either
1886supplying keyword arguments to the appropriate constructor,
1887e.g. \code{datetime.date(year=1972, month=10, day=15)}, or by using
Andrew M. Kuchlingc71bb972003-03-21 17:23:07 +00001888one of a number of class methods. For example, the \method{date.today()}
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001889class method returns the current local date.
1890
1891Once created, instances of the date/time classes are all immutable.
1892There are a number of methods for producing formatted strings from
1893objects:
1894
1895\begin{verbatim}
1896>>> import datetime
1897>>> now = datetime.datetime.now()
1898>>> now.isoformat()
1899'2002-12-30T21:27:03.994956'
1900>>> now.ctime() # Only available on date, datetime
1901'Mon Dec 30 21:27:03 2002'
Raymond Hettingeree1bded2003-01-17 16:20:23 +00001902>>> now.strftime('%Y %d %b')
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001903'2002 30 Dec'
1904\end{verbatim}
1905
1906The \method{replace()} method allows modifying one or more fields
1907of a \class{date} or \class{datetime} instance:
1908
1909\begin{verbatim}
1910>>> d = datetime.datetime.now()
1911>>> d
1912datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
1913>>> d.replace(year=2001, hour = 12)
1914datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
1915>>>
1916\end{verbatim}
1917
1918Instances can be compared, hashed, and converted to strings (the
1919result is the same as that of \method{isoformat()}). \class{date} and
1920\class{datetime} instances can be subtracted from each other, and
Andrew M. Kuchlingc71bb972003-03-21 17:23:07 +00001921added to \class{timedelta} instances. The largest missing feature is
1922that there's no support for parsing strings and getting back a
1923\class{date} or \class{datetime}.
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001924
1925For more information, refer to the \ulink{module's reference
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001926documentation}{../lib/module-datetime.html}.
Andrew M. Kuchlinga974b392003-01-13 19:09:03 +00001927(Contributed by Tim Peters.)
1928
1929
1930%======================================================================
Andrew M. Kuchling8d177092003-05-13 14:26:54 +00001931\subsection{The optparse Module}
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00001932
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001933The \module{getopt} module provides simple parsing of command-line
1934arguments. The new \module{optparse} module (originally named Optik)
1935provides more elaborate command-line parsing that follows the Unix
1936conventions, automatically creates the output for \longprogramopt{help},
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00001937and can perform different actions for different options.
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001938
1939You start by creating an instance of \class{OptionParser} and telling
1940it what your program's options are.
1941
1942\begin{verbatim}
Andrew M. Kuchling7ee9b512003-02-18 00:48:23 +00001943import sys
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001944from optparse import OptionParser
1945
1946op = OptionParser()
1947op.add_option('-i', '--input',
1948 action='store', type='string', dest='input',
1949 help='set input filename')
1950op.add_option('-l', '--length',
1951 action='store', type='int', dest='length',
1952 help='set maximum length of output')
1953\end{verbatim}
1954
1955Parsing a command line is then done by calling the \method{parse_args()}
1956method.
1957
1958\begin{verbatim}
Andrew M. Kuchling7ee9b512003-02-18 00:48:23 +00001959import optparse
1960
1961options, args = optparse.parse_args(sys.argv[1:])
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001962print options
1963print args
1964\end{verbatim}
1965
1966This returns an object containing all of the option values,
1967and a list of strings containing the remaining arguments.
1968
1969Invoking the script with the various arguments now works as you'd
1970expect it to. Note that the length argument is automatically
1971converted to an integer.
1972
1973\begin{verbatim}
1974$ ./python opt.py -i data arg1
1975<Values at 0x400cad4c: {'input': 'data', 'length': None}>
1976['arg1']
1977$ ./python opt.py --input=data --length=4
1978<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
Andrew M. Kuchling7ee9b512003-02-18 00:48:23 +00001979[]
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001980$
1981\end{verbatim}
1982
1983The help message is automatically generated for you:
1984
1985\begin{verbatim}
1986$ ./python opt.py --help
1987usage: opt.py [options]
1988
1989options:
1990 -h, --help show this help message and exit
1991 -iINPUT, --input=INPUT
1992 set input filename
1993 -lLENGTH, --length=LENGTH
1994 set maximum length of output
1995$
1996\end{verbatim}
Andrew M. Kuchling669249e2002-11-19 13:05:33 +00001997% $ prevent Emacs tex-mode from getting confused
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00001998
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00001999See the \ulink{module's documentation}{../lib/module-optparse.html}
2000for more details.
2001
Andrew M. Kuchling24d5a522002-11-14 23:40:42 +00002002Optik was written by Greg Ward, with suggestions from the readers of
2003the Getopt SIG.
2004
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00002005
2006%======================================================================
Andrew M. Kuchling8d177092003-05-13 14:26:54 +00002007\section{Pymalloc: A Specialized Object Allocator\label{section-pymalloc}}
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002008
Andrew M. Kuchlingc71bb972003-03-21 17:23:07 +00002009Pymalloc, a specialized object allocator written by Vladimir
2010Marangozov, was a feature added to Python 2.1. Pymalloc is intended
2011to be faster than the system \cfunction{malloc()} and to have less
2012memory overhead for allocation patterns typical of Python programs.
2013The allocator uses C's \cfunction{malloc()} function to get large
2014pools of memory and then fulfills smaller memory requests from these
2015pools.
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002016
2017In 2.1 and 2.2, pymalloc was an experimental feature and wasn't
Andrew M. Kuchlingc71bb972003-03-21 17:23:07 +00002018enabled by default; you had to explicitly enable it when compiling
2019Python by providing the
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002020\longprogramopt{with-pymalloc} option to the \program{configure}
2021script. In 2.3, pymalloc has had further enhancements and is now
2022enabled by default; you'll have to supply
2023\longprogramopt{without-pymalloc} to disable it.
2024
2025This change is transparent to code written in Python; however,
2026pymalloc may expose bugs in C extensions. Authors of C extension
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002027modules should test their code with pymalloc enabled,
2028because some incorrect code may cause core dumps at runtime.
2029
2030There's one particularly common error that causes problems. There are
2031a number of memory allocation functions in Python's C API that have
2032previously just been aliases for the C library's \cfunction{malloc()}
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002033and \cfunction{free()}, meaning that if you accidentally called
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002034mismatched functions the error wouldn't be noticeable. When the
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002035object allocator is enabled, these functions aren't aliases of
2036\cfunction{malloc()} and \cfunction{free()} any more, and calling the
2037wrong function to free memory may get you a core dump. For example,
2038if memory was allocated using \cfunction{PyObject_Malloc()}, it has to
2039be freed using \cfunction{PyObject_Free()}, not \cfunction{free()}. A
2040few modules included with Python fell afoul of this and had to be
2041fixed; doubtless there are more third-party modules that will have the
2042same problem.
2043
2044As part of this change, the confusing multiple interfaces for
2045allocating memory have been consolidated down into two API families.
2046Memory allocated with one family must not be manipulated with
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002047functions from the other family. There is one family for allocating
2048chunks of memory, and another family of functions specifically for
2049allocating Python objects.
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002050
2051\begin{itemize}
2052 \item To allocate and free an undistinguished chunk of memory use
2053 the ``raw memory'' family: \cfunction{PyMem_Malloc()},
2054 \cfunction{PyMem_Realloc()}, and \cfunction{PyMem_Free()}.
2055
2056 \item The ``object memory'' family is the interface to the pymalloc
2057 facility described above and is biased towards a large number of
2058 ``small'' allocations: \cfunction{PyObject_Malloc},
2059 \cfunction{PyObject_Realloc}, and \cfunction{PyObject_Free}.
2060
2061 \item To allocate and free Python objects, use the ``object'' family
2062 \cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()}, and
2063 \cfunction{PyObject_Del()}.
2064\end{itemize}
2065
2066Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides
2067debugging features to catch memory overwrites and doubled frees in
2068both extension modules and in the interpreter itself. To enable this
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002069support, compile a debugging version of the Python interpreter by
2070running \program{configure} with \longprogramopt{with-pydebug}.
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002071
2072To aid extension writers, a header file \file{Misc/pymemcompat.h} is
2073distributed with the source to Python 2.3 that allows Python
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002074extensions to use the 2.3 interfaces to memory allocation while
2075compiling against any version of Python since 1.5.2. You would copy
2076the file from Python's source distribution and bundle it with the
2077source of your extension.
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002078
2079\begin{seealso}
2080
2081\seeurl{http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/obmalloc.c}
2082{For the full details of the pymalloc implementation, see
2083the comments at the top of the file \file{Objects/obmalloc.c} in the
2084Python source code. The above link points to the file within the
2085SourceForge CVS browser.}
2086
2087\end{seealso}
2088
2089
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00002090% ======================================================================
2091\section{Build and C API Changes}
2092
Andrew M. Kuchling3c305d92002-07-22 18:50:11 +00002093Changes to Python's build process and to the C API include:
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00002094
2095\begin{itemize}
2096
Andrew M. Kuchlingef5d06b2002-07-22 19:21:06 +00002097\item The C-level interface to the garbage collector has been changed,
2098to make it easier to write extension types that support garbage
2099collection, and to make it easier to debug misuses of the functions.
2100Various functions have slightly different semantics, so a bunch of
2101functions had to be renamed. Extensions that use the old API will
2102still compile but will \emph{not} participate in garbage collection,
2103so updating them for 2.3 should be considered fairly high priority.
2104
2105To upgrade an extension module to the new API, perform the following
2106steps:
2107
2108\begin{itemize}
2109
2110\item Rename \cfunction{Py_TPFLAGS_GC} to \cfunction{PyTPFLAGS_HAVE_GC}.
2111
2112\item Use \cfunction{PyObject_GC_New} or \cfunction{PyObject_GC_NewVar} to
2113allocate objects, and \cfunction{PyObject_GC_Del} to deallocate them.
2114
2115\item Rename \cfunction{PyObject_GC_Init} to \cfunction{PyObject_GC_Track} and
2116\cfunction{PyObject_GC_Fini} to \cfunction{PyObject_GC_UnTrack}.
2117
2118\item Remove \cfunction{PyGC_HEAD_SIZE} from object size calculations.
2119
2120\item Remove calls to \cfunction{PyObject_AS_GC} and \cfunction{PyObject_FROM_GC}.
2121
2122\end{itemize}
2123
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002124\item The cycle detection implementation used by the garbage collection
2125has proven to be stable, so it's now being made mandatory; you can no
2126longer compile Python without it, and the
2127\longprogramopt{with-cycle-gc} switch to \program{configure} has been removed.
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00002128
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002129\item Python can now optionally be built as a shared library
2130(\file{libpython2.3.so}) by supplying \longprogramopt{enable-shared}
Fred Drake5c4cf152002-11-13 14:59:06 +00002131when running Python's \program{configure} script. (Contributed by Ondrej
Andrew M. Kuchlingfad2f592002-05-10 21:00:05 +00002132Palkovsky.)
Andrew M. Kuchlingf4dd65d2002-04-01 19:28:09 +00002133
Michael W. Hudsondd32a912002-08-15 14:59:02 +00002134\item The \csimplemacro{DL_EXPORT} and \csimplemacro{DL_IMPORT} macros
2135are now deprecated. Initialization functions for Python extension
2136modules should now be declared using the new macro
Andrew M. Kuchling3c305d92002-07-22 18:50:11 +00002137\csimplemacro{PyMODINIT_FUNC}, while the Python core will generally
2138use the \csimplemacro{PyAPI_FUNC} and \csimplemacro{PyAPI_DATA}
2139macros.
Neal Norwitzbba23a82002-07-22 13:18:59 +00002140
Fred Drake5c4cf152002-11-13 14:59:06 +00002141\item The interpreter can be compiled without any docstrings for
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00002142the built-in functions and modules by supplying
Fred Drake5c4cf152002-11-13 14:59:06 +00002143\longprogramopt{without-doc-strings} to the \program{configure} script.
Andrew M. Kuchlinge995d162002-07-11 20:09:50 +00002144This makes the Python executable about 10\% smaller, but will also
2145mean that you can't get help for Python's built-ins. (Contributed by
2146Gustavo Niemeyer.)
2147
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002148\item The \cfunction{PyArg_NoArgs()} macro is now deprecated, and code
Andrew M. Kuchling7845e7c2002-07-11 19:27:46 +00002149that uses it should be changed. For Python 2.2 and later, the method
2150definition table can specify the
Fred Drake5c4cf152002-11-13 14:59:06 +00002151\constant{METH_NOARGS} flag, signalling that there are no arguments, and
Andrew M. Kuchling7845e7c2002-07-11 19:27:46 +00002152the argument checking can then be removed. If compatibility with
2153pre-2.2 versions of Python is important, the code could use
Fred Drakeaac8c582003-01-17 22:50:10 +00002154\code{PyArg_ParseTuple(\var{args}, "")} instead, but this will be slower
Andrew M. Kuchling7845e7c2002-07-11 19:27:46 +00002155than using \constant{METH_NOARGS}.
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00002156
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002157\item A new function, \cfunction{PyObject_DelItemString(\var{mapping},
2158char *\var{key})} was added
Fred Drake5c4cf152002-11-13 14:59:06 +00002159as shorthand for
Raymond Hettingera685f522003-07-12 04:42:30 +00002160\code{PyObject_DelItem(\var{mapping}, PyString_New(\var{key}))}.
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00002161
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00002162\item The \method{xreadlines()} method of file objects, introduced in
2163Python 2.1, is no longer necessary because files now behave as their
2164own iterator. \method{xreadlines()} was originally introduced as a
2165faster way to loop over all the lines in a file, but now you can
2166simply write \code{for line in file_obj}.
2167
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002168\item File objects now manage their internal string buffer
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002169differently, increasing it exponentially when needed. This results in
2170the benchmark tests in \file{Lib/test/test_bufio.py} speeding up
2171considerably (from 57 seconds to 1.7 seconds, according to one
2172measurement).
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002173
Andrew M. Kuchling72b58e02002-05-29 17:30:34 +00002174\item It's now possible to define class and static methods for a C
2175extension type by setting either the \constant{METH_CLASS} or
2176\constant{METH_STATIC} flags in a method's \ctype{PyMethodDef}
2177structure.
Andrew M. Kuchling45afd542002-04-02 14:25:25 +00002178
Andrew M. Kuchling346386f2002-07-12 20:24:42 +00002179\item Python now includes a copy of the Expat XML parser's source code,
2180removing any dependence on a system version or local installation of
Fred Drake5c4cf152002-11-13 14:59:06 +00002181Expat.
Andrew M. Kuchling346386f2002-07-12 20:24:42 +00002182
Michael W. Hudson3e245d82003-02-11 14:19:56 +00002183\item If you dynamically allocate type objects in your extension, you
Neal Norwitzada859c2003-02-11 14:30:39 +00002184should be aware of a change in the rules relating to the
Michael W. Hudson3e245d82003-02-11 14:19:56 +00002185\member{__module__} and \member{__name__} attributes. In summary,
2186you will want to ensure the type's dictionary contains a
2187\code{'__module__'} key; making the module name the part of the type
2188name leading up to the final period will no longer have the desired
2189effect. For more detail, read the API reference documentation or the
2190source.
2191
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00002192\end{itemize}
2193
Andrew M. Kuchling366c10c2002-11-14 23:07:57 +00002194
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00002195%======================================================================
Andrew M. Kuchling821013e2002-05-06 17:46:39 +00002196\subsection{Port-Specific Changes}
2197
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00002198Support for a port to IBM's OS/2 using the EMX runtime environment was
2199merged into the main Python source tree. EMX is a POSIX emulation
2200layer over the OS/2 system APIs. The Python port for EMX tries to
2201support all the POSIX-like capability exposed by the EMX runtime, and
2202mostly succeeds; \function{fork()} and \function{fcntl()} are
2203restricted by the limitations of the underlying emulation layer. The
2204standard OS/2 port, which uses IBM's Visual Age compiler, also gained
2205support for case-sensitive import semantics as part of the integration
2206of the EMX port into CVS. (Contributed by Andrew MacIntyre.)
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00002207
Andrew M. Kuchling72b58e02002-05-29 17:30:34 +00002208On MacOS, most toolbox modules have been weaklinked to improve
2209backward compatibility. This means that modules will no longer fail
2210to load if a single routine is missing on the curent OS version.
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00002211Instead calling the missing routine will raise an exception.
2212(Contributed by Jack Jansen.)
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00002213
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00002214The RPM spec files, found in the \file{Misc/RPM/} directory in the
2215Python source distribution, were updated for 2.3. (Contributed by
2216Sean Reifschneider.)
Fred Drake03e10312002-03-26 19:17:43 +00002217
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002218Other new platforms now supported by Python include AtheOS
Fred Drake693aea22003-02-07 14:52:18 +00002219(\url{http://www.atheos.cx/}), GNU/Hurd, and OpenVMS.
Andrew M. Kuchling20e5abc2002-07-11 20:50:34 +00002220
Fred Drake03e10312002-03-26 19:17:43 +00002221
2222%======================================================================
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002223\section{Other Changes and Fixes \label{section-other}}
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002224
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00002225As usual, there were a bunch of other improvements and bugfixes
2226scattered throughout the source tree. A search through the CVS change
Andrew M. Kuchling2cd77312003-07-16 14:44:12 +00002227logs finds there were 523 patches applied and 514 bugs fixed between
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00002228Python 2.2 and 2.3. Both figures are likely to be underestimates.
2229
2230Some of the more notable changes are:
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002231
2232\begin{itemize}
2233
Fred Drake54fe3fd2002-11-26 22:07:35 +00002234\item The \file{regrtest.py} script now provides a way to allow ``all
2235resources except \var{foo}.'' A resource name passed to the
2236\programopt{-u} option can now be prefixed with a hyphen
2237(\character{-}) to mean ``remove this resource.'' For example, the
2238option `\code{\programopt{-u}all,-bsddb}' could be used to enable the
2239use of all resources except \code{bsddb}.
2240
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002241\item The tools used to build the documentation now work under Cygwin
2242as well as \UNIX.
2243
Michael W. Hudsondd32a912002-08-15 14:59:02 +00002244\item The \code{SET_LINENO} opcode has been removed. Back in the
2245mists of time, this opcode was needed to produce line numbers in
2246tracebacks and support trace functions (for, e.g., \module{pdb}).
2247Since Python 1.5, the line numbers in tracebacks have been computed
2248using a different mechanism that works with ``python -O''. For Python
22492.3 Michael Hudson implemented a similar scheme to determine when to
2250call the trace function, removing the need for \code{SET_LINENO}
2251entirely.
2252
Andrew M. Kuchling7a82b8c2002-11-04 20:17:24 +00002253It would be difficult to detect any resulting difference from Python
2254code, apart from a slight speed up when Python is run without
Michael W. Hudsondd32a912002-08-15 14:59:02 +00002255\programopt{-O}.
2256
2257C extensions that access the \member{f_lineno} field of frame objects
2258should instead call \code{PyCode_Addr2Line(f->f_code, f->f_lasti)}.
2259This will have the added effect of making the code work as desired
2260under ``python -O'' in earlier versions of Python.
2261
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002262A nifty new feature is that trace functions can now assign to the
2263\member{f_lineno} attribute of frame objects, changing the line that
2264will be executed next. A \samp{jump} command has been added to the
2265\module{pdb} debugger taking advantage of this new feature.
2266(Implemented by Richie Hindle.)
Andrew M. Kuchling974ab9d2002-12-31 01:20:30 +00002267
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002268\end{itemize}
2269
Andrew M. Kuchling187b1d82002-05-29 19:20:57 +00002270
Andrew M. Kuchling517109b2002-05-07 21:01:16 +00002271%======================================================================
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00002272\section{Porting to Python 2.3}
2273
Andrew M. Kuchlingf15fb292002-12-31 18:34:54 +00002274This section lists previously described changes that may require
2275changes to your code:
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002276
2277\begin{itemize}
2278
2279\item \keyword{yield} is now always a keyword; if it's used as a
2280variable name in your code, a different name must be chosen.
2281
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002282\item For strings \var{X} and \var{Y}, \code{\var{X} in \var{Y}} now works
2283if \var{X} is more than one character long.
2284
Andrew M. Kuchling495172c2002-11-20 13:50:15 +00002285\item The \function{int()} type constructor will now return a long
2286integer instead of raising an \exception{OverflowError} when a string
2287or floating-point number is too large to fit into an integer.
2288
Andrew M. Kuchlingacddabc2003-02-18 00:43:24 +00002289\item If you have Unicode strings that contain 8-bit characters, you
2290must declare the file's encoding (UTF-8, Latin-1, or whatever) by
2291adding a comment to the top of the file. See
2292section~\ref{section-encodings} for more information.
2293
Andrew M. Kuchlingb492fa92002-11-27 19:11:10 +00002294\item Calling Tcl methods through \module{_tkinter} no longer
2295returns only strings. Instead, if Tcl returns other objects those
2296objects are converted to their Python equivalent, if one exists, or
2297wrapped with a \class{_tkinter.Tcl_Obj} object if no Python equivalent
2298exists.
2299
Andrew M. Kuchling80fd7852003-02-06 15:14:04 +00002300\item Large octal and hex literals such as
Andrew M. Kuchling72df65a2003-02-10 15:08:16 +00002301\code{0xffffffff} now trigger a \exception{FutureWarning}. Currently
Andrew M. Kuchling80fd7852003-02-06 15:14:04 +00002302they're stored as 32-bit numbers and result in a negative value, but
Andrew M. Kuchling72df65a2003-02-10 15:08:16 +00002303in Python 2.4 they'll become positive long integers.
2304
2305There are a few ways to fix this warning. If you really need a
2306positive number, just add an \samp{L} to the end of the literal. If
2307you're trying to get a 32-bit integer with low bits set and have
2308previously used an expression such as \code{~(1 << 31)}, it's probably
2309clearest to start with all bits set and clear the desired upper bits.
2310For example, to clear just the top bit (bit 31), you could write
2311\code{0xffffffffL {\&}{\textasciitilde}(1L<<31)}.
Andrew M. Kuchling80fd7852003-02-06 15:14:04 +00002312
Andrew M. Kuchling495172c2002-11-20 13:50:15 +00002313\item You can no longer disable assertions by assigning to \code{__debug__}.
2314
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002315\item The Distutils \function{setup()} function has gained various new
Fred Drake5c4cf152002-11-13 14:59:06 +00002316keyword arguments such as \var{depends}. Old versions of the
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002317Distutils will abort if passed unknown keywords. The fix is to check
2318for the presence of the new \function{get_distutil_options()} function
2319in your \file{setup.py} if you want to only support the new keywords
2320with a version of the Distutils that supports them:
2321
2322\begin{verbatim}
2323from distutils import core
2324
2325kw = {'sources': 'foo.c', ...}
2326if hasattr(core, 'get_distutil_options'):
2327 kw['depends'] = ['foo.h']
Fred Drake5c4cf152002-11-13 14:59:06 +00002328ext = Extension(**kw)
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002329\end{verbatim}
2330
Andrew M. Kuchling495172c2002-11-20 13:50:15 +00002331\item Using \code{None} as a variable name will now result in a
2332\exception{SyntaxWarning} warning.
2333
2334\item Names of extension types defined by the modules included with
2335Python now contain the module and a \character{.} in front of the type
2336name.
2337
Andrew M. Kuchling8a61f492002-11-13 13:24:41 +00002338\end{itemize}
Andrew M. Kuchling950725f2002-08-06 01:40:48 +00002339
2340
2341%======================================================================
Fred Drake03e10312002-03-26 19:17:43 +00002342\section{Acknowledgements \label{acks}}
2343
Andrew M. Kuchling03594bb2002-03-27 02:29:48 +00002344The author would like to thank the following people for offering
2345suggestions, corrections and assistance with various drafts of this
Andrew M. Kuchlingd39078b2003-04-13 21:44:28 +00002346article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside,
2347Andrew Dalke, Scott David Daniels, Fred~L. Drake, Jr., Kelly Gerber,
2348Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert,
Andrew M. Kuchlingfcf6b3e2003-05-07 17:00:35 +00002349Martin von~L\"owis, Andrew MacIntyre, Lalo Martins, Chad Netzer,
2350Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco
2351Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler,
2352Just van~Rossum.
Fred Drake03e10312002-03-26 19:17:43 +00002353
2354\end{document}