\documentclass{howto}

\title{What's New in Python 2.0}
\release{0.04}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

A new release of Python, version 2.0, is expected some time this
summer. Beta versions are already available from
\url{http://www.pythonlabs.com/tech/python2.html}. This article
covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require
rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit number used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, and so on. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

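
As a minimal sketch of \method{encode()}, the following encodes one
Unicode string in three encodings. It is written to run on a current
Python 3 interpreter, which is an assumption on our part: there the
results are \code{bytes} objects rather than the 8-bit strings of 2.0.
The sample text is purely illustrative.

```python
# Sketch only: assumes a Python 3 interpreter, where u'...' literals are
# still accepted and encode() returns a bytes object.
text = u'caf\u00e9'                      # 4 characters, the last is U+00E9

utf8 = text.encode('utf-8')              # U+00E9 becomes two bytes
latin1 = text.encode('iso-8859-1')       # U+00E9 becomes one byte

try:
    text.encode('ascii')                 # U+00E9 has no ASCII representation
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

print(repr(utf8), repr(latin1), ascii_ok)
```

The same character occupies a different number of bytes depending on
the encoding, which is why the encoding has to be named explicitly
whenever the default ASCII encoding is not sufficient.
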
New built-in functions have been added, and existing built-ins have
been modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string}, \optional{\var{encoding},}
\optional{\var{errors}} ) } creates a Unicode string from an 8-bit
string. \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\end{itemize}

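
The three \var{errors} settings can be compared side by side. The
sketch below assumes a current Python 3 interpreter, where
\function{unicode()} no longer exists; \code{bytes.decode()} takes the
same \var{encoding} and \var{errors} arguments and is used here as a
stand-in. The sample bytes are made up for illustration.

```python
# Stand-in for unicode(string, encoding, errors) on a Python 3 interpreter.
raw = b'abc\xffdef'                         # the byte 0xFF is not valid ASCII

replaced = raw.decode('ascii', 'replace')   # bad byte becomes U+FFFD
ignored = raw.decode('ascii', 'ignore')     # bad byte is silently dropped

try:
    raw.decode('ascii', 'strict')
    strict_failed = False
except UnicodeError:
    strict_failed = True                    # 'strict' raises an exception

code_point = ord(u'\u0660')                 # ord() of a 1-character Unicode string

print(repr(replaced), repr(ignored), strict_failed, code_point)
```
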
A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
an Arabic number.

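
The two lookups just described can be tried directly. This is a short
sketch assuming a current Python 3 interpreter, where the
\module{unicodedata} module still offers both functions; the extra
characters queried are our own choice.

```python
import unicodedata

upper_a = unicodedata.category(u'A')                  # 'L' = letter, 'u' = uppercase
lower_a = unicodedata.category(u'a')                  # 'L' = letter, 'l' = lowercase
arabic_zero = unicodedata.category(u'\u0660')         # 'N' = number, 'd' = decimal digit
arabic_bidi = unicodedata.bidirectional(u'\u0660')    # bidirectional class of U+0660

print(upper_a, lower_a, arabic_zero, arabic_bidi)
```
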
The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you how much of the Unicode string was converted.

\item \var{decode_func} is the mirror of \var{encode_func},
taking an 8-bit string and
returning a 2-tuple \code{(\var{ustring}, \var{length})}, where \var{ustring} is the resulting Unicode string
and \var{length} tells you how much of the 8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \code{\var{stream_reader}(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods. These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \code{\var{stream_writer}(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}

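
The four entries returned by \function{codecs.lookup()} can also be
exercised without touching the filesystem. The sketch below assumes a
current Python 3 interpreter and substitutes an in-memory
\code{io.BytesIO} buffer for \file{/tmp/output}; both choices are ours
and are not part of the 2.0 interface described above.

```python
import codecs
import io

# The result of codecs.lookup() still unpacks into the four entries above.
(utf8_encode, utf8_decode,
 utf8_streamreader, utf8_streamwriter) = codecs.lookup('utf-8')

unistr = u'\u0660\u2000ab'

# encode_func and decode_func each return a (result, length) pair.
encoded, chars_consumed = utf8_encode(unistr)
decoded, bytes_consumed = utf8_decode(encoded)

# An in-memory buffer stands in for a file opened in binary mode.
buffer = io.BytesIO()
writer = utf8_streamwriter(buffer)
writer.write(unistr)                 # Unicode in, UTF-8 bytes out

buffer.seek(0)
reader = utf8_streamreader(buffer)
round_tripped = reader.read()        # UTF-8 bytes in, Unicode out

print(repr(encoded), chars_consumed, bytes_consumed, round_tripped == unistr)
```

Note that the two lengths differ: the encoder reports how many
characters it consumed, while the decoder reports how many bytes.
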
Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python is installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really work on Unix and leave Windows
and MacOS unsupported. Software users faced wildly differing
installation instructions.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the \module{distutils} package offers many places to override defaults
-- separating the build from the install, building or installing in
non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only .py
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
    define_macros = [('XML_NS', None)],
    include_dirs = [ 'extensions/expat/xmltok',
                     'extensions/expat/xmlparse' ],
    sources = [ 'extensions/pyexpat.c',
                'extensions/expat/xmltok/xmltok.c',
                'extensions/expat/xmltok/xmlrole.c',
              ]
    )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult, and a ``bdist_rpm'' command has
already been contributed to create an RPM distribution for the
software. Commands to create Windows installer programs, Debian
packages, and Solaris .pkg files have been discussed and are in
various stages of development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
\section{String Methods}

Until now string-manipulation functionality was in the \module{string}
Python module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, April Fools' jokes notwithstanding, is
that Python strings are immutable. Thus, the string methods return new
strings, and do not modify the string on which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
== t}, while \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}
for any non-empty \code{t}.

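
The slice equivalences can be checked mechanically. The following is a
small sketch with made-up sample strings; it also shows why an empty
suffix is a special case, since \code{s[-0:]} is the whole string
rather than an empty one.

```python
s = 'hostname'

# (candidate, method result, slice result) for a few prefixes and suffixes
prefix_checks = [(t, s.startswith(t), s[:len(t)] == t)
                 for t in ('host', 'name', '')]
suffix_checks = [(t, s.endswith(t), s[-len(t):] == t)
                 for t in ('name', 'host')]

# With an empty suffix the slice s[-0:] is all of s, so the two forms disagree.
empty_suffix = (s.endswith(''), s[-len(''):] == '')

print(prefix_checks)
print(suffix_checks)
print(empty_suffix)
```
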
%One other method which deserves special mention is \method{join}. The
%\method{join} method of a string receives one parameter, a sequence of
%strings, and is equivalent to the \function{string.join} function from
%the old \module{string} module, with the arguments reversed. In other
%words, \code{s.join(seq)} is equivalent to the old
%\code{string.join(seq, s)}.

% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, often fixing initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided. This section lists the changes in Python 2.0
that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
causes a \exception{TypeError} exception to be raised, with the
message: 'append requires exactly 1 argument; 2 given'. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.

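
A short sketch of the stricter behaviour follows. It is written for a
current Python 3 interpreter (an assumption; the \code{except ... as}
spelling is not valid in 2.0), and the exact wording of the error
message differs between versions, so the sketch only records that a
\exception{TypeError} was raised.

```python
L = []
L.append((1, 2))            # correct: one argument, which happens to be a tuple

try:
    L.append(1, 2)          # two arguments: no longer packed into a tuple
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(L, error_message)
```
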
The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
an IP address, but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 2.0alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple
argument form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2Gb; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
various new places where previously only integers were accepted, such
as in the \method{seek()} method of file objects.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which assumes the 'L' is
there, and does \code{str(longval)[:-1]} will now lose the final
digit.

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses
the \code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.

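
The two precisions can be reproduced by applying the format strings
directly. This sketch deliberately avoids calling \function{repr()}
itself, on the assumption that it is run on a current Python 3
interpreter, where \function{repr()} of a float uses a different
algorithm and prints \code{8.1}.

```python
value = 8.1

seventeen_digits = '%.17g' % value    # the precision described for repr() in 2.0
twelve_digits = '%.12g' % value       # the precision described for str()

print(seventeen_digits, twelve_digits)
```

Seventeen significant digits are enough to expose the binary rounding
error, while twelve digits round it away.
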
The \code{-X} command-line option, which turned all standard
exceptions into strings instead of classes, has been removed; the
standard exceptions will now always be classes. The
\module{exceptions} module containing the standard exceptions was
translated from Python to a built-in C module, written by Barry Warsaw
and Fredrik Lundh.

% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{instance}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem. When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.

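
The self-referencing instance from earlier in this section makes a
convenient test case for the collector. The sketch below assumes a
current Python 3 interpreter, where cycle collection is always
compiled in, and uses the \module{weakref} module (not part of 2.0) to
observe the object without keeping it alive; \class{SomeClass} is a
placeholder name.

```python
import gc
import weakref

class SomeClass(object):
    pass

gc.disable()                        # keep the periodic collector out of the way

instance = SomeClass()
instance.myself = instance          # the simplest possible cycle
probe = weakref.ref(instance)       # observes the object without owning it

del instance
alive_after_del = probe() is not None       # reference counting alone leaks it

found = gc.collect()                # run the cycle detector explicitly
alive_after_collect = probe() is not None

gc.enable()
print(alive_after_del, found, alive_after_collect)
```

\function{gc.collect()} returns the number of unreachable objects it
found, so a non-zero result here confirms that the cycle was detected.
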
Why isn't cycle detection enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.


% ======================================================================
\section{Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

A change to syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you do this with the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. Thanks to a patch from
Greg Ewing, 2.0 adds \code{f(*\var{args}, **\var{kw})} as a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}

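
To make the symmetry concrete, here is a small sketch of both sides
of the new syntax. The function \function{describe()} and its
arguments are hypothetical names chosen for the illustration.

```python
def f(*args, **kw):
    # args is a tuple of positional args, kw is a dictionary of keyword args
    return args, kw

args = (1, 2)
kw = {'sep': '-', 'end': '!'}
result = f(*args, **kw)             # same effect as the older apply(f, args, kw)

def describe(name, count, unit='items'):
    # An ordinary function: the unpacked values bind to its named parameters.
    return '%s: %d %s' % (name, count, unit)

described = describe(*('disk', 3), **{'unit': 'files'})

print(result, described)
```
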
A new format style is available when using the \code{\%} operator.
'\%r' will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.

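
The relationship between the two format styles and the corresponding
built-ins can be spelled out in a few lines; this is only a sketch
using the same sample string as above.

```python
value = 'abc'

formatted = '%r %s' % (value, value)          # repr() first, then str()
by_hand = repr(value) + ' ' + str(value)      # the same thing written out

print(formatted)
```
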
The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.

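
The three cases above can be checked with a short sketch. Only
\function{int()} is shown, on the assumption that the code is run on
a current Python 3 interpreter, where \function{long()} has been
merged into \function{int()}.

```python
decimal_value = int('123', 10)      # base 10: 1*100 + 2*10 + 3
hex_value = int('123', 16)          # base 16: 1*256 + 2*16 + 3

try:
    int(123, 16)                    # an explicit base requires a string argument
    rejected = False
except TypeError:
    rejected = True

print(decimal_value, hex_value, rejected)
```
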
Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.

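
Both ways of answering \keyword{in} can be seen in the sketch below.
The two classes are hypothetical examples: one defines
\method{__contains__}, the other only supports indexing and therefore
relies on the index-by-index search.

```python
class EvenNumbers(object):
    """Hypothetical container: every even integer counts as a member."""
    def __contains__(self, obj):
        return isinstance(obj, int) and obj % 2 == 0

class Countdown(object):
    """No __contains__: 'in' tries index 0, 1, 2, ... until IndexError."""
    def __init__(self, start):
        self.start = start
    def __getitem__(self, index):
        if index > self.start:
            raise IndexError(index)
        return self.start - index

evens = EvenNumbers()
countdown = Countdown(5)            # yields 5, 4, 3, 2, 1, 0

custom_hit = 10 ** 100 in evens     # answered without any iteration
custom_miss = 3 in evens
fallback_hit = 3 in countdown       # found at index 2
fallback_miss = 7 in countdown      # search ends at the IndexError

print(custom_hit, custom_miss, fallback_hit, fallback_miss)
```

A \method{__contains__} method is the only practical choice for
containers such as \class{EvenNumbers}, which cannot be enumerated
index by index.
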
Earlier versions of Python used a recursive algorithm for deleting
objects. Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

The comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic.
\footnote{See the thread ``trashcan and PR\#7'' in the April 2000 archives of the python-dev mailing list for the discussion leading up to this implementation, and some useful relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32 bit.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more information.

Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000512An attempt has been made to alleviate one of Python's warts, the
513often-confusing \exception{NameError} exception when code refers to a
514local variable before the variable has been assigned a value. For
515example, the following code raises an exception on the \keyword{print}
Andrew M. Kuchling730067e2000-06-30 01:44:05 +0000516statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
517exception is raised, while 2.0 raises a new
Andrew M. Kuchling662d76e2000-06-25 14:32:48 +0000518\exception{UnboundLocalError} exception.
519\exception{UnboundLocalError} is a subclass of \exception{NameError},
520so any existing code that expects \exception{NameError} to be raised
521should still work.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000522
523\begin{verbatim}
524def f():
525 print "i=",i
526 i = i + 1
527f()
528\end{verbatim}
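The compatibility claim can be checked with a short sketch: a handler
written for \exception{NameError} also catches the more specific
exception. The function body is a variation on the example above, and
the \code{except ... as} spelling is the syntax of later Python
versions rather than that of 2.0.

```python
def f():
    # `i` is assigned later in this function, so it is a local variable;
    # reading it first fails because no value has been bound yet.
    value = i + 1
    i = value
    return i


try:
    f()
except NameError as exc:
    # A handler for NameError also catches its subclass UnboundLocalError.
    caught = exc

print(type(caught).__name__)
print(issubclass(UnboundLocalError, NameError))
```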

A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in a hypothetical 2.0.1beta1,
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
\code{""} for a final release.
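Because \code{sys.version_info} is a tuple, version checks can use
ordinary tuple comparison instead of parsing the \code{sys.version}
string. The sketch below assumes a made-up minimum version,
\code{REQUIRED}, and runs on any interpreter that provides
\code{sys.version_info}.

```python
import sys

# Hypothetical minimum version for some application.
REQUIRED = (2, 0)

# The five fields described above can be unpacked directly.
major, minor, micro, level, serial = sys.version_info

# Tuples compare element by element, so comparing the first two
# fields is enough for an "at least this version" check.
new_enough = sys.version_info[:2] >= REQUIRED

print("Python %d.%d.%d (level %r, serial %d)" % (major, minor, micro, level, serial))
print("meets requirement: %s" % new_enough)
```

Comparing only the leading fields avoids depending on the exact
\var{level} string, which differs between pre-releases and final
releases.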

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules or embedding a Python interpreter
in a larger application. If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
2.0. On Windows, attempting to import a third-party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this. (Here's Mark Hammond's explanation of the
reasons for the crash. The 1.5 module is linked against
\file{Python15.dll}. When \file{Python.exe}, linked against
\file{Python16.dll}, starts up, it initializes the Python data
structures in \file{Python16.dll}. When Python then imports the
module \file{foo.pyd} linked against \file{Python15.dll}, it
immediately tries to call the functions in that DLL. As Python has
not been initialized in that DLL, the program immediately crashes.)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.
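The difference between the two spellings shows up as soon as
subclasses are involved. In the sketch below, ordinary Python classes
stand in for ExtensionClasses, since ExtensionClass is a separate
package; the class names are invented for illustration.

```python
class Base:
    """Hypothetical stand-in for an extension class."""


class Derived(Base):
    """Hypothetical stand-in for a subclass of it."""


obj = Derived()

exact_match = type(obj) == Base          # compares against the exact type only
is_instance = isinstance(obj, Base)      # also accepts instances of subclasses
is_subclass = issubclass(Derived, Base)  # asks the same question about classes

print(exact_match, is_instance, is_subclass)
```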

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganised by Greg Stein. \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}. For documentation, read
the comments in \file{Include/mymalloc.h} and
\file{Include/objimpl.h}. For the lengthy discussions during which
the interface was hammered out, see the Web archives of the `patches'
and `python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads. Therefore, Python's POSIX threading support now works
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too. Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster. A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
changes, the difference is only 10\%. These improvements were
contributed by Yakov Markovitch.

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}. Consult the CVS logs for the
exact patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket. When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds a function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket. The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.

The \module{httplib} module has been rewritten by Greg Stein to
support HTTP/1.1. Backward compatibility with the 1.5 version of
\module{httplib} is provided, though using HTTP/1.1 features such as
pipelining will require rewriting code to use a different set of
interfaces.

The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
8.3, and support for the older 7.x versions has been dropped. The
\module{Tkinter} module also supports displaying Unicode strings in Tk
widgets.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and mouse support. This means the module
is no longer compatible with operating systems that only have BSD
curses, but there don't seem to be any currently maintained OSes that
fall into this category.

As mentioned in the earlier discussion of 2.0's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed. SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by Hewlett
Packard, supports matching against both 8-bit strings and Unicode
strings.

% ======================================================================
\section{New modules}

A number of new modules were added. We'll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a
particular module.

\begin{itemize}

\item{\module{atexit}:}
For registering functions to be called before the Python interpreter exits.
Code that currently sets
\code{sys.exitfunc} directly should be changed to
use the \module{atexit} module instead, importing \module{atexit}
and calling \function{atexit.register()} with
the function to be called on exit.
(Contributed by Skip Montanaro.)

\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.

\item{\module{filecmp}:} Supersedes the old \module{cmp} and
\module{dircmp} modules, which have now become deprecated.
(Contributed by Gordon MacMillan and Moshe Zadka.)

\item{\module{linuxaudio}:} Support for the \file{/dev/audio} device on Linux,
a twin to the existing \module{sunaudiodev} module.
(Contributed by Peter Bosch.)

\item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix. A file's contents can be mapped directly into
memory, at which point it behaves like a mutable string, so its
contents can be read and modified. Mapped files can even be passed to
functions that expect ordinary strings, such as the \module{re}
module. (Contributed by Sam Rushing, with some extensions by
A.M. Kuchling.)

\item{\module{PyExpat}:} An interface to the Expat XML parser.
(Contributed by Paul Prescod.)

\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
used for writing Web spiders that politely avoid certain areas of a
Web site. The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
the fetchability of a given URL. (Contributed by Skip Montanaro.)

\item{\module{tabnanny}:} A module/script that
checks Python source code for ambiguous indentation.
(Contributed by Tim Peters.)

\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.

\item{\module{winreg} and \module{_winreg}:} An interface to the
Windows registry. \module{winreg} has been part of PythonWin since
1995, but now has been added to the core distribution, and enhanced to
support Unicode. \module{_winreg} is a low-level wrapper of the
Windows registry functions, contributed by Bill Tutt and Mark Hammond,
while \module{winreg} is a higher-level, more object-oriented API on top of
\module{_winreg}, designed by Thomas Heller and implemented by Paul Prescod.

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives. These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\item{\module{imputil}:} A module that provides a simpler way for
writing customised import hooks, in comparison to the existing
\module{ihooks} module. (Implemented by Greg Stein, with much
discussion on python-dev along the way.)

\end{itemize}
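To illustrate the \module{atexit} entry in the list above, here is a
minimal sketch of registering cleanup functions. The function
\function{close_resources()} and the \code{log} list are hypothetical
names used only for this example.

```python
import atexit

log = []  # hypothetical record of what the exit handlers did


def close_resources(name):
    # Called by the interpreter during a normal shutdown.
    log.append("closed " + name)


# Extra positional arguments are stored and passed to the function
# when it is eventually called at exit.
atexit.register(close_resources, "database")
atexit.register(close_resources, "logfile")
```

Several functions can be registered independently, which is the main
advantage over assigning a single callable to \code{sys.exitfunc}.
Registered functions run in the reverse of the order in which they
were registered.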

% ======================================================================
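As a further sketch for the list of new modules, this shows a round
trip through the \module{zipfile} module. It assumes a current Python
version and uses an in-memory buffer in place of an archive on disk;
the member names and contents are made up.

```python
import io
import zipfile

buffer = io.BytesIO()  # in-memory stand-in for an archive file on disk

# Write two members into a new archive.
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("README.txt", "hello from a ZIP archive\n")
    archive.writestr("data/numbers.txt", "1 2 3\n")

# Reopen the same bytes for reading and inspect the contents.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    text = archive.read("README.txt").decode("ascii")

print(names)
print(text)
```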
\section{IDLE Improvements}

IDLE is the official Python cross-platform IDE, written using Tkinter.
Python 2.0 includes IDLE 0.6, which adds a number of new features and
improvements. A partial list:

\begin{itemize}
\item UI improvements and optimizations,
especially in the area of syntax highlighting and auto-indentation.

\item The class browser now shows more information, such as the
top-level functions in a module.

\item Tab width is now a user-settable option. When opening an existing Python
file, IDLE automatically detects the indentation conventions, and adapts.

\item There is now support for calling browsers on various platforms,
used to open the Python documentation in a browser.

\item IDLE now has a command line, which is largely similar to
the vanilla Python interpreter.

\item Call tips were added in many places.

\item IDLE can now be installed as a package.

\item In the editor window, there is now a line/column bar at the bottom.

\item Three new keystroke commands: Check module (Alt-F5), Import
module (F5) and Run script (Ctrl-F5).

\end{itemize}

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing. The \module{stdwin}
module is gone; it was for a platform-independent windowing toolkit
that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
these modules.
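A minimal sketch of that workaround follows. The location of
\file{lib-old} is an assumption here; the actual path depends on where
Python is installed on a given system.

```python
import os
import sys

# Hypothetical location; adjust to match the local installation.
lib_old = os.path.join(sys.prefix, "lib", "python2.0", "lib-old")

# Append rather than prepend, so that current standard library
# modules still take priority over the old ones.
if lib_old not in sys.path:
    sys.path.append(lib_old)
```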

\section{Acknowledgements}

The authors would like to thank the following people for offering
suggestions on drafts of this article: Fredrik Lundh, Skip
Montanaro, Vladimir Marangozov, Guido van Rossum, and Neil Schemenauer.

\end{document}