\documentclass{howto}

\title{What's New in Python 2.0}
\release{0.04}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

A new release of Python, version 2.0, will appear some time this
summer. Beta versions are already available from
\url{http://www.pythonlabs.com/tech/python2.html}. This article
covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require
rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit number used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, or whatever. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.
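
As a rough sketch of \method{encode()}, assuming the \code{'utf-8'}
and \code{'iso-8859-1'} codecs are available in your build, the same
Unicode string can come out at different lengths depending on the
encoding chosen. The variable names here are purely illustrative.

\begin{verbatim}
# Illustrative names only; none of these are part of any API.
text = u'caf\u00e9'                       # 4 characters, ending in U+00E9

utf8_bytes = text.encode('utf-8')         # U+00E9 takes 2 bytes in UTF-8
latin1_bytes = text.encode('iso-8859-1')  # U+00E9 takes 1 byte in Latin-1

lengths = (len(text), len(utf8_bytes), len(latin1_bytes))
\end{verbatim}

The encoding is passed explicitly in both calls, so the sketch doesn't
depend on whatever default encoding your installation uses.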

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or
Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string}, \optional{\var{encoding},}
\optional{\var{errors}} ) } creates a Unicode string from an 8-bit
string. \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\end{itemize}

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
an Arabic number.
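
The two lookups just described can be tried directly. This is only a
small sketch; the lowercase comparison and the variable names are
additions for illustration.

\begin{verbatim}
import unicodedata

# General category: 'L' means letter, 'u' means uppercase.
cat_upper = unicodedata.category(u'A')

# A lowercase letter, added here for comparison.
cat_lower = unicodedata.category(u'a')

# Bidirectional class of U+0660.
bidi = unicodedata.bidirectional(u'\u0660')
\end{verbatim}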

The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you
how much of the Unicode string was converted.

\item \var{decode_func} is the mirror of \var{encode_func},
taking an 8-bit string and
returning a 2-tuple \code{(\var{ustring}, \var{length})}, where
\var{ustring} is the resulting Unicode string and \var{length} tells
you how much of the 8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \var{stream_reader(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods. These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \var{stream_writer(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}
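
The encode and decode functions returned by
\function{codecs.lookup()} can also be called directly, without going
through a stream. The following is a minimal sketch of that, using the
2-tuple return values described in the list above; the sample string
and variable names are made up for the example.

\begin{verbatim}
import codecs

# Unpack the 4-element tuple returned by lookup().
(encode_func, decode_func,
 stream_reader, stream_writer) = codecs.lookup('utf-8')

original = u'\u0660ab'

# Unicode string -> (8-bit string, amount of input converted)
encoded, converted = encode_func(original)

# 8-bit string -> (Unicode string, amount of input consumed)
decoded, consumed = decode_func(encoded)

round_trip_ok = (decoded == original)
\end{verbatim}

Note that the two lengths count different things: \code{converted}
counts Unicode characters, while \code{consumed} counts bytes of the
8-bit string, so they need not be equal.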

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python is installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really work on Unix and leave Windows
and MacOS unsupported. Software users faced wildly differing
installation instructions.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the distutils package offers many places to override defaults
-- separating the build from the install, building or installing in
non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only .py
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
     define_macros = [('XML_NS', None)],
     include_dirs = [ 'extensions/expat/xmltok',
                      'extensions/expat/xmlparse' ],
     sources = [ 'extensions/pyexpat.c',
                 'extensions/expat/xmltok/xmltok.c',
                 'extensions/expat/xmltok/xmlrole.c',
               ]
       )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult, and a ``bdist_rpm'' command has
already been contributed to create an RPM distribution for the
software. Commands to create Windows installer programs, Debian
packages, and Solaris .pkg files have been discussed and are in
various stages of development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
\section{String Methods}

Until now string-manipulation functionality was in the \module{string}
Python module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, April Fools' jokes notwithstanding, is
that Python strings are immutable. Thus, the string methods return new
strings, and do not modify the string on which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
== t}, while \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}.
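
The following sketch checks those equivalences on a sample string and
also shows that a string method leaves its string untouched. The
sample values are invented for the example, and the slice form for
\method{endswith()} assumes \code{t} is non-empty, since
\code{s[-0:]} is the whole string.

\begin{verbatim}
s = 'spam.py'
t = '.py'          # non-empty, so the slice comparison applies

# Each pair holds the method result and the equivalent slice test.
starts = (s.startswith('spam'), s[:len('spam')] == 'spam')
ends = (s.endswith(t), s[-len(t):] == t)

# Methods return new strings; s itself is not modified.
shouted = s.upper()
unchanged = (s == 'spam.py')
\end{verbatim}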

%One other method which deserves special mention is \method{join}. The
%\method{join} method of a string receives one parameter, a sequence of
%strings, and is equivalent to the \function{string.join} function from
%the old \module{string} module, with the arguments reversed. In other
%words, \code{s.join(seq)} is equivalent to the old
%\code{string.join(seq, s)}.

% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, often fixing initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided. This section lists the changes in Python 2.0
that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
causes a \exception{TypeError} exception to be raised, with the
message: 'append requires exactly 1 argument; 2 given'. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.
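
A short sketch of the fix, with the old spelling wrapped in a
\keyword{try} statement so the difference is visible; the flag
variable is just for illustration.

\begin{verbatim}
L = []

# Correct: pass the pair as a single tuple argument.
L.append((1, 2))

# The old two-argument spelling is now rejected.
try:
    L.append(1, 2)
    rejected = 0
except TypeError:
    rejected = 1
\end{verbatim}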

The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
an IP address, but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 2.0alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple
argument form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2Gb; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
various new places where previously only integers were accepted, such
as in the \method{seek()} method of file objects.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which assumes the 'L' is
there, and does \code{str(longval)[:-1]} will now lose the final
digit.

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses the
\code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.
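
To see the effect of the two precisions without relying on
\function{repr()} or \function{str()} themselves, you can apply the
same format strings with the \code{\%} operator. This is only a
sketch, and it assumes the usual IEEE 754 double-precision floats.

\begin{verbatim}
x = 8.1

# 17 significant digits exposes the binary approximation of 8.1.
long_form = '%.17g' % x

# 12 significant digits rounds it back to what you typed.
short_form = '%.12g' % x
\end{verbatim}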

The \code{-X} command-line option, which turned all standard
exceptions into strings instead of classes, has been removed; the
standard exceptions will now always be classes. The
\module{exceptions} module containing the standard exceptions was
translated from Python to a built-in C module, written by Barry Warsaw
and Fredrik Lundh.

% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and reacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{'instance'}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem. When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.
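
As a sketch of how this might look from Python code, assuming an
interpreter built with cycle collection enabled: the function name
\function{gc.collect()} comes from the \module{gc} module as released
and isn't spelled out in the text above, and the \class{Node} class is
invented for the example.

\begin{verbatim}
import gc

class Node:
    # Hypothetical class, used only to build a cycle.
    pass

gc.collect()          # clear out any garbage that already exists

node = Node()
node.myself = node    # a one-object reference cycle
del node              # unreachable now, but its refcount is still 1

# Run a collection; the result is the number of unreachable
# objects the collector found.
found = gc.collect()
\end{verbatim}

The exact count may vary between versions, since the instance's
attribute dictionary can be counted as well, so code shouldn't depend
on a particular number.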

Why isn't cycle detection enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

% ======================================================================
\section{Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

A change to syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you do this with the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. Thanks to a patch from
Greg Ewing, 2.0 adds \code{f(*\var{args}, **\var{kw})} as a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
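
Here is a small sketch of the calling side. The function
\function{describe()} and its arguments are made up for the example;
it simply hands back what it received.

\begin{verbatim}
def describe(*args, **kw):
    # Hypothetical function: return the arguments it was given.
    return (args, kw)

args = (1, 2)
kw = {'sep': '-'}

# Same effect as the older apply(describe, args, kw)
result = describe(*args, **kw)
\end{verbatim}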

A new format style is available when using the \code{\%} operator.
'\%r' will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.
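
A brief sketch comparing the two styles side by side; the variable
names are illustrative.

\begin{verbatim}
value = 'abc'

with_repr = '%r' % (value,)     # repr(): quotes are included
with_str = '%s' % (value,)      # str(): just the characters
both = '%r %s' % (value, value)
\end{verbatim}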

The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.
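
The same calls as a runnable sketch, with a base-2 conversion added
for illustration and the non-string case caught so the script keeps
going.

\begin{verbatim}
decimal = int('123', 10)
hexadecimal = int('123', 16)
binary = int('101', 2)        # extra example, not from the text above

# A non-string first argument can't be combined with a base.
try:
    int(123, 16)
    non_string_rejected = 0
except TypeError:
    non_string_rejected = 1
\end{verbatim}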

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.
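
As a minimal sketch of \method{__contains__}, here is a made-up class
that claims to contain every even integer. Such a collection couldn't
be expressed by trying indexes one at a time, since the search would
never end for an odd number.

\begin{verbatim}
class EvenNumbers:
    # Hypothetical class: conceptually holds all even integers.
    def __contains__(self, obj):
        return obj % 2 == 0

evens = EvenNumbers()

has_four = 4 in evens     # calls evens.__contains__(4)
has_five = 5 in evens     # calls evens.__contains__(5)
\end{verbatim}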

Earlier versions of Python used a recursive algorithm for deleting
objects. Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

The comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic.
\footnote{See the thread ``trashcan and PR\#7'' in the April 2000 archives of the python-dev mailing list for the discussion leading up to this implementation, and some useful relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32 bit.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more information.

Andrew M. Kuchling5b8311e2000-05-31 03:28:42 +0000511
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000512An attempt has been made to alleviate one of Python's warts, the
513often-confusing \exception{NameError} exception when code refers to a
514local variable before the variable has been assigned a value. For
515example, the following code raises an exception on the \keyword{print}
Andrew M. Kuchling730067e2000-06-30 01:44:05 +0000516statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
517exception is raised, while 2.0 raises a new
Andrew M. Kuchling662d76e2000-06-25 14:32:48 +0000518\exception{UnboundLocalError} exception.
519\exception{UnboundLocalError} is a subclass of \exception{NameError},
520so any existing code that expects \exception{NameError} to be raised
521should still work.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000522
523\begin{verbatim}
524def f():
525 print "i=",i
526 i = i + 1
527f()
528\end{verbatim}
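The compatibility claim can be checked with a short sketch along the
following lines. The function and variable names are illustrative
only, and the snippet avoids the \keyword{print} statement so that it
should also run on later interpreters.

\begin{verbatim}
def f():
    # 'i' is local to f() because it is assigned further down,
    # so reading it here fails before any value exists.
    value = i
    i = value + 1

try:
    f()
    handled = 0
except NameError:
    # An older-style handler written for NameError still catches
    # the more specific UnboundLocalError.
    handled = 1

is_subclass = issubclass(UnboundLocalError, NameError)
\end{verbatim}

After this runs, \code{handled} is 1 and \code{is_subclass} is true.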

A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in 2.0a2 \code{sys.version_info} is
\code{(2, 0, 0, 'alpha', 2)}. \var{level} is a string such as
\code{"alpha"}, \code{"beta"}, or \code{"final"} for a final release.
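Because the value is an ordinary tuple, version checks can be written
as plain comparisons. The following is a minimal sketch; the variable
names are illustrative.

\begin{verbatim}
import sys

# Unpack the five fields described above.
major, minor, micro, level, serial = sys.version_info

# Tuples compare element by element, so a minimum-version check
# needs only a single comparison.
if sys.version_info >= (2, 0):
    new_enough = 1
else:
    new_enough = 0

summary = "%d.%d.%d %s %d" % (major, minor, micro, level, serial)
\end{verbatim}

Comparing against a short tuple such as \code{(2, 0)} avoids parsing
the free-form \code{sys.version} string.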

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules, or embedding a Python interpreter
in a larger application. If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
2.0. On Windows, attempting to import a third-party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this. (Here's Mark Hammond's explanation of the
reasons for the crash. The 1.5 module is linked against
\file{Python15.dll}. When \file{Python.exe}, linked against
\file{Python16.dll}, starts up, it initializes the Python data
structures in \file{Python16.dll}. When Python then imports the
module \file{foo.pyd} linked against \file{Python15.dll}, it
immediately tries to call the functions in that DLL. As Python has
not been initialized in that DLL, the program immediately crashes.)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganised by Greg Stein. \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}. For documentation, read
the comments in \file{Include/mymalloc.h} and
\file{Include/objimpl.h}. For the lengthy discussions during which
the interface was hammered out, see the Web archives of the 'patches'
and 'python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads. Therefore, Python's POSIX threading support now works
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too. Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster. A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
changes, the difference is only 10\%. These improvements were
contributed by Yakov Markovitch.

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}. Consult the CVS logs for the
exact patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket. When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds an additional function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket. The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.

The \module{httplib} module has been rewritten by Greg Stein to
support HTTP/1.1. Backward compatibility with the 1.5 version of
\module{httplib} is provided, though using HTTP/1.1 features such as
pipelining will require rewriting code to use a different set of
interfaces.

The \module{Tkinter} module now supports Tcl/Tk versions 8.1, 8.2, and
8.3, and support for the older 7.x versions has been dropped. The
\module{Tkinter} module also supports displaying Unicode strings in Tk
widgets.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and mouse support. This means the module
is no longer compatible with operating systems that only have BSD
curses, but there don't seem to be any currently maintained OSes that
fall into this category.

As mentioned in the earlier discussion of 2.0's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed. SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by Hewlett
Packard, supports matching against both 8-bit strings and Unicode
strings.
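A small sketch of matching against a Unicode string follows. The
sample text and variable names are made up for illustration; the
module interface used is the ordinary \module{re} one.

\begin{verbatim}
import re

# Both the pattern and the subject are Unicode strings, written
# with the u"..." literal syntax.
pattern = re.compile(u"caf\u00e9")
match = pattern.search(u"Un caf\u00e9, s'il vous pla\u00eet")

if match is not None:
    start = match.start()    # offset of the match in the subject
else:
    start = -1
\end{verbatim}

The calls are the same ones used for 8-bit strings; only the type of
the pattern and of the subject string differs.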

% ======================================================================
\section{New modules}

A number of new modules were added. We'll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a
particular module.

\begin{itemize}

\item{\module{atexit}:}
For registering functions to be called before the Python interpreter exits.
Code that currently sets
\code{sys.exitfunc} directly should be changed to
use the \module{atexit} module instead, importing \module{atexit}
and calling \function{atexit.register()} with
the function to be called on exit.
(Contributed by Skip Montanaro.)

\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.

\item{\module{filecmp}:} Supersedes the old \module{cmp} and
\module{dircmp} modules, which have now become deprecated.
(Contributed by Gordon MacMillan and Moshe Zadka.)

\item{\module{linuxaudio}:} Support for the \file{/dev/audio} device on Linux,
a twin to the existing \module{sunaudiodev} module.
(Contributed by Peter Bosch.)

\item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix. A file's contents can be mapped directly into
memory, at which point it behaves like a mutable string, so its
contents can be read and modified. Mapped files can even be passed to
functions that expect ordinary strings, such as the \module{re}
module. (Contributed by Sam Rushing, with some extensions by
A.M. Kuchling.)

\item{\module{PyExpat}:} An interface to the Expat XML parser.
(Contributed by Paul Prescod.)

\item{\module{robotparser}:} Parses a \file{robots.txt} file; useful
for writing Web spiders that politely avoid certain areas of a
Web site. The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
the fetchability of a given URL. (Contributed by Skip Montanaro.)

\item{\module{tabnanny}:} A module/script that
checks Python source code for ambiguous indentation.
(Contributed by Tim Peters.)

\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.

\item{\module{winreg} and \module{_winreg}:} An interface to the
Windows registry. \module{winreg} has been part of PythonWin since
1995, but now has been added to the core distribution, and enhanced to
support Unicode. \module{_winreg} is a low-level wrapper of the
Windows registry functions, contributed by Bill Tutt and Mark Hammond,
while \module{winreg} is a higher-level, more object-oriented API on top of
\module{_winreg}, designed by Thomas Heller and implemented by Paul Prescod.

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives. These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\item{\module{imputil}:} A module that provides a simpler way for
writing customised import hooks, in comparison to the existing
\module{ihooks} module. (Implemented by Greg Stein, with much
discussion on python-dev along the way.)

\end{itemize}
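To give a feel for two of the modules listed above, here is a hedged
sketch of \module{atexit} and \module{mmap} in use. It is written
against a current Python interpreter rather than 2.0 itself (the
\code{b"..."} literals and \function{tempfile.mkstemp()} are
assumptions that postdate this release), and the function, file, and
variable names are illustrative only.

\begin{verbatim}
import atexit
import mmap
import os
import re
import tempfile

# --- atexit: register a function instead of setting sys.exitfunc ---
def goodbye():
    # Hypothetical cleanup hook, run when the interpreter exits.
    print("interpreter is shutting down")

atexit.register(goodbye)

# --- mmap: treat a file's contents like a mutable string ---
fd, path = tempfile.mkstemp()
os.write(fd, b"hello world\n")
os.close(fd)

f = open(path, "r+b")
m = mmap.mmap(f.fileno(), 0)         # length 0 maps the whole file
m[0:5] = b"HELLO"                    # modify the contents in place
match = re.search(b"w[a-z]+", m)     # pass the map directly to re
word = match.group(0)
first_line = m[:11]                  # slicing reads the mapped data

m.close()
f.close()
os.remove(path)
\end{verbatim}

The slice assignment must replace exactly as many bytes as it
overwrites, since a mapping cannot change size through assignment.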

% ======================================================================
\section{IDLE Improvements}

IDLE is the official Python cross-platform IDE, written using Tkinter.
Python 2.0 includes IDLE 0.6, which adds a number of new features and
improvements. A partial list:

\begin{itemize}
\item UI improvements and optimizations,
especially in the area of syntax highlighting and auto-indentation.

\item The class browser now shows more information, such as the
top-level functions in a module.

\item Tab width is now a user-settable option. When opening an existing Python
file, IDLE automatically detects the indentation conventions, and adapts.

\item There is now support for calling browsers on various platforms,
used to open the Python documentation in a browser.

\item IDLE now has a command line, which is largely similar to
the vanilla Python interpreter.

\item Call tips were added in many places.

\item IDLE can now be installed as a package.

\item In the editor window, there is now a line/column bar at the bottom.

\item Three new keystroke commands: Check module (Alt-F5), Import
module (F5) and Run script (Ctrl-F5).

\end{itemize}

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing. The \module{stdwin}
module is gone; it was for a platform-independent windowing toolkit
that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
these modules.
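One possible way to add the directory is sketched below. It assumes
that \file{lib-old} sits directly inside the standard library
directory, and locates that directory through the \module{os}
module's own file; adjust the path if your installation differs.

\begin{verbatim}
import os
import sys

# Assumption: lib-old is a subdirectory of the directory that
# holds the standard library modules such as os.py.
lib_dir = os.path.dirname(os.__file__)
lib_old = os.path.join(lib_dir, "lib-old")

# Append rather than prepend, so that current modules still take
# precedence over the old ones.
if lib_old not in sys.path:
    sys.path.append(lib_old)
\end{verbatim}

Appending keeps the old modules as a fallback only, which limits the
chance of an obsolete module shadowing a maintained one.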

\section{Acknowledgements}

The authors would like to thank the following people for offering
suggestions on drafts of this article: Fredrik Lundh, Skip
Montanaro, Vladimir Marangozov, Guido van Rossum, Neil Schemenauer.

\end{document}