\documentclass{howto}

% $Id$

\title{What's New in Python 2.0}
\release{0.05}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

A new version of Python, 2.0, will be released some time this
summer. Beta versions are already available from
\url{http://www.pythonlabs.com/products/python2.0/}. This article
covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require
rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{What About Python 1.6?}

Python 1.6 can be thought of as the Contractual Obligations Python
release. After the core development team left CNRI in May 2000, CNRI
requested that a 1.6 release be created, containing all the work on
Python that had been performed at CNRI. Python 1.6 therefore
represents the state of the CVS tree as of May 2000, with the most
significant new feature being Unicode support. Development continued
after May, of course, so the 1.6 tree received a few fixes to ensure
that it's forward-compatible with Python 2.0. 1.6 is therefore part
of Python's evolution, and not a side branch.

So, should you take much interest in Python 1.6? Probably not. The
1.6final and 2.0beta1 releases were made on the same day (September 5,
2000), the plan being to finalize Python 2.0 within a month or so. If
you have applications to maintain, there seems little point in
breaking things by moving to 1.6, fixing them, and then having another
round of breakage within a month by moving to 2.0; you're better off
just going straight to 2.0. Most of the really interesting features
described in this document are only in 2.0, because a lot of work was
done between May and September.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit number used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, or whatever. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.
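
As a small illustration of \method{encode()}, the following sketch
converts one Unicode string into two different encodings. The sample
string and the variable names are made up for this example, and both
encodings are assumed to be available in your installation:

\begin{verbatim}
# A 4-character Unicode string; \u00e9 is e with an acute accent.
word = u'caf\u00e9'

# UTF-8 needs two bytes for \u00e9, ISO-8859-1 only one.
as_utf8 = word.encode('utf-8')
as_latin1 = word.encode('iso-8859-1')

utf8_length = len(as_utf8)       # 5
latin1_length = len(as_latin1)   # 4
\end{verbatim}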

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string} \optional{, \var{encoding}}
\optional{, \var{errors}} ) } creates a Unicode string from an 8-bit
string. \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\item The \keyword{exec} statement, and various built-ins such as
\code{eval()}, \code{getattr()}, and \code{setattr()} will also
accept Unicode strings as well as regular strings. (It's possible
that the process of fixing this missed some built-ins; if you find a
built-in function that accepts strings but doesn't accept Unicode
strings at all, please report it as a bug.)

\end{itemize}

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
an Arabic number.
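
The two lookups just described can be tried directly. This is only a
minimal sketch; the variable names are arbitrary, and the third lookup
is an extra case added here for comparison:

\begin{verbatim}
import unicodedata

category = unicodedata.category(u'A')             # 'Lu'
direction = unicodedata.bidirectional(u'\u0660')  # 'AN'

# A lowercase letter falls into the 'Ll' category instead.
lower_category = unicodedata.category(u'a')       # 'Ll'
\end{verbatim}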

The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you
how much of the Unicode string was converted.

\item \var{decode_func} is the opposite of \var{encode_func}, taking
an 8-bit string and returning a 2-tuple \code{(\var{ustring},
\var{length})}, consisting of the resulting Unicode string
\var{ustring} and the integer \var{length} telling how much of the
8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \code{\var{stream_reader}(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods. These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \code{\var{stream_writer}(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}
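
The encoding and decoding functions returned by
\function{codecs.lookup()} can also be called directly, without going
through a stream. The following is a sketch of that usage with a
made-up sample string; the comments on the two lengths assume the
UTF-8 codec, which needs two bytes for U+0660:

\begin{verbatim}
import codecs

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

unistr = u'\u0660ab'

# Encoding returns the 8-bit string and the number of
# Unicode characters that were converted (3 here).
(encoded, chars_converted) = UTF8_encode(unistr)

# Decoding returns the Unicode string and the number of
# bytes of the 8-bit string that were consumed (4 here).
(decoded, bytes_consumed) = UTF8_decode(encoded)
\end{verbatim}

Decoding the encoded string gives back a Unicode string equal to the
original one.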

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{List Comprehensions}

Lists are a workhorse data type in Python, and many programs
manipulate a list at some point. Two common operations on lists are
to loop over them, and either pick out the elements that meet a
certain criterion, or apply some function to each element. For
example, given a list of strings, you might want to pull out all the
strings containing a given substring, or strip off trailing whitespace
from each line.

The existing \function{map()} and \function{filter()} functions can be
used for this purpose, but they require a function as one of their
arguments. This is fine if there's an existing built-in function that
can be passed directly, but if there isn't, you have to create a
little function to do the required work, and Python's scoping rules
make the result ugly if the little function needs additional
information. Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring. You
could write the following to do it:

\begin{verbatim}
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter( lambda s, substring=S:
                     string.find(s, substring) != -1,
                  L)
\end{verbatim}

Because of Python's scoping rules, a default argument is used so that
the anonymous function created by the \keyword{lambda} expression knows
what substring is being searched for. List comprehensions make this
cleaner:

\begin{verbatim}
sublist = [ s for s in L if string.find(s, S) != -1 ]
\end{verbatim}

List comprehensions have the form:

\begin{verbatim}
[ expression for expr1 in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]
\end{verbatim}

The \keyword{for}...\keyword{in} clauses contain the sequences to be
iterated over. The sequences do not have to be the same length,
because they are \emph{not} iterated over in parallel, but
from left to right; this is explained more clearly in the following
paragraphs. The elements of the generated list will be the successive
values of \var{expression}. The final \keyword{if} clause is
optional; if present, \var{expression} is only evaluated and added to
the result if \var{condition} is true.

To make the semantics very clear, a list comprehension is equivalent
to the following Python code:

\begin{verbatim}
for expr1 in sequence1:
    for expr2 in sequence2:
    ...
        for exprN in sequenceN:
             if (condition):
                  # Append the value of
                  # the expression to the
                  # resulting list.
\end{verbatim}

This means that when there are multiple \keyword{for}...\keyword{in}
clauses and no \keyword{if} clause, the length of the resulting list
will be equal to the product of the lengths of all the sequences. If
you have two lists of length 3, the output list is 9 elements long:

\begin{verbatim}
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [ (x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
('c', 2), ('c', 3)]
\end{verbatim}
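
The equivalence with nested loops can be checked on a concrete case.
The sketch below reuses the two sequences from the example above and
adds an arbitrary condition, chosen only for illustration, so that the
result is shorter than the full product:

\begin{verbatim}
seq1 = 'abc'
seq2 = (1, 2, 3)

# List comprehension with two for clauses and a condition.
pairs = [ (x, y) for x in seq1 for y in seq2 if y != 2 ]

# The equivalent nested loops.
expected = []
for x in seq1:
    for y in seq2:
        if y != 2:
            expected.append( (x, y) )

# Both lists hold the same 6 tuples, in the same order.
same = (pairs == expected)
\end{verbatim}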

To avoid introducing an ambiguity into Python's grammar, if
\var{expression} is creating a tuple, it must be surrounded with
parentheses. The first list comprehension below is a syntax error,
while the second one is correct:

\begin{verbatim}
# Syntax error
[ x,y for x in seq1 for y in seq2]
# Correct
[ (x,y) for x in seq1 for y in seq2]
\end{verbatim}

The idea of list comprehensions originally comes from the functional
programming language Haskell (\url{http://www.haskell.org}). Greg
Ewing argued most effectively for adding them to Python and wrote the
initial list comprehension patch, which was then discussed for a
seemingly endless time on the python-dev mailing list and kept
up-to-date by Skip Montanaro.

% ======================================================================
\section{Augmented Assignment}

Augmented assignment operators, another long-requested feature, have
been added to Python 2.0. Augmented assignment operators include
\code{+=}, \code{-=}, \code{*=}, and so forth. For example, the
statement \code{a += 2} increments the value of the variable
\code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.

The full list of supported assignment operators is \code{+=},
\code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
\code{|=}, \verb|^=|, \code{>>=}, and \code{<<=}. Python classes can
override the augmented assignment operators by defining methods named
\method{__iadd__}, \method{__isub__}, etc. For example, the following
\class{Number} class stores a number and supports using \code{+=} to create a
new instance with an incremented value.

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
n += 3
print n.value
\end{verbatim}

The \method{__iadd__} special method is called with the value of the
increment, and should return a new instance with an appropriately
modified value; this return value is bound as the new value of the
variable on the left-hand side.
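
The rebinding can be made visible by keeping a second name for the
original instance. This sketch reuses the \class{Number} class from
above; the name \code{original} is introduced only for the
illustration:

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
original = n        # a second name for the same instance
n += 3              # rebinds n to the instance __iadd__ returned

new_value = n.value          # 8
old_value = original.value   # still 5
\end{verbatim}

Because this \method{__iadd__} returns a new instance rather than
modifying \code{self}, the object that \code{original} refers to is
left untouched.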

Augmented assignment operators were first introduced in the C
programming language, and most C-derived languages, such as
\program{awk}, C++, Java, Perl, and PHP also support them. The augmented
assignment patch was implemented by Thomas Wouters.

% ======================================================================
\section{String Methods}

Until now, string-manipulation functionality was in the \module{string}
module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, a noteworthy April Fools' joke
notwithstanding, is that Python strings are immutable. Thus, the
string methods return new strings, and do not modify the string on
which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
== t}, while \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}.

One other method which deserves special mention is \method{join()}. The
\method{join()} method of a string receives one parameter, a sequence of
strings, and is equivalent to the \function{string.join()} function from
the old \module{string} module, with the arguments reversed. In other
words, \code{s.join(seq)} is equivalent to the old
\code{string.join(seq, s)}.
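
The three methods just described can be sketched as follows; the
sample strings are arbitrary and chosen only for illustration:

\begin{verbatim}
s = 'python-dev'
t = 'python'

# startswith() gives the same answer as comparing a slice.
starts = s.startswith(t)
same_start = (s[:len(t)] == t)

# Likewise for endswith().
ends = s.endswith('dev')
same_end = (s[-len('dev'):] == 'dev')

# The separator is the string whose join() method is called.
path = '/'.join(['usr', 'local', 'lib'])    # 'usr/local/lib'
\end{verbatim}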

% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{instance}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem. When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.
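
As a rough sketch of how the \module{gc} module can be used, the
following code builds the self-referencing instance from the earlier
example and then requests a collection explicitly. It assumes an
interpreter built with the cycle collector; \class{SomeClass} is an
empty placeholder class, and the exact number reported may vary:

\begin{verbatim}
import gc

class SomeClass:
    pass

gc.collect()              # clear out any garbage already pending

instance = SomeClass()
instance.myself = instance
del instance              # the cycle is now inaccessible

# collect() runs a collection and returns the number of
# unreachable objects that it found.
found = gc.collect()
\end{verbatim}

A non-zero value of \code{found} indicates that the collector noticed
the inaccessible cycle.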

Why isn't cycle detection enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

% ======================================================================
\section{Other Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

\subsection{Minor Language Changes}

A new syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you'd use the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. \function{apply()}
is the same in 2.0, but thanks to a patch from
Greg Ewing, \code{f(*\var{args}, **\var{kw})} is a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
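
On the calling side, the new syntax might be used as in the following
sketch. The \function{describe()} function and its parameters are
hypothetical and exist only for this illustration:

\begin{verbatim}
def describe(name, greeting='Hello', punctuation='.'):
    return greeting + ', ' + name + punctuation

args = ('world',)
kw = {'punctuation': '!'}

# Equivalent to describe('world', punctuation='!'),
# and to the older apply(describe, args, kw).
result = describe(*args, **kw)    # 'Hello, world!'
\end{verbatim}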

The \keyword{print} statement can now have its output directed to a
file-like object by following the \keyword{print} with
\verb|>> file|, similar to the redirection operator in Unix shells.
Previously you'd either have to use the \method{write()} method of the
file-like object, which lacks the convenience and simplicity of
\keyword{print}, or you could assign a new value to
\code{sys.stdout} and then restore the old value. For sending output to standard error,
it's much easier to write this:

\begin{verbatim}
print >> sys.stderr, "Warning: action field not supplied"
\end{verbatim}

Modules can now be renamed on importing them, using the syntax
\code{import \var{module} as \var{name}} or \code{from \var{module}
import \var{name} as \var{othername}}. The patch was submitted by
Thomas Wouters.

A new format style is available when using the \code{\%} operator;
'\%r' will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.
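
A class using \method{__contains__} might look like the following
sketch. The \class{EvenNumbers} class is hypothetical; it stands for a
collection that could never be searched index by index, since it has
no end:

\begin{verbatim}
class EvenNumbers:
    # Stands for the set of all even integers.
    def __contains__(self, n):
        return n % 2 == 0

evens = EvenNumbers()
four_is_even = 4 in evens     # true
seven_is_even = 7 in evens    # false
\end{verbatim}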

Earlier versions of Python used a recursive algorithm for deleting
objects. Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

the comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic. \footnote{See the thread ``trashcan
and PR\#7'' in the April 2000 archives of the python-dev mailing list
for the discussion leading up to this implementation, and some useful
relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32-bit on Itanium.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more
information.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value. For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
exception is raised, while 2.0 raises a new
\exception{UnboundLocalError} exception.
\exception{UnboundLocalError} is a subclass of \exception{NameError},
so any existing code that expects \exception{NameError} to be raised
should still work.

\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}
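
The compatibility claim can be checked with a handler written for the
old exception. This is a minimal sketch; the \code{caught} flag is
introduced only for the illustration, and the \keyword{print}
statement is left out of \function{f()} for brevity:

\begin{verbatim}
def f():
    i = i + 1      # i is local, but not yet assigned

caught = 0
try:
    f()
except NameError:
    # UnboundLocalError ends up here too, being a subclass.
    caught = 1

is_subclass = issubclass(UnboundLocalError, NameError)
\end{verbatim}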

Two new exceptions, \exception{TabError} and
\exception{IndentationError}, have been introduced. They're both
subclasses of \exception{SyntaxError}, and are raised when Python code
is found to be improperly indented.

\subsection{Changes to Built-in Functions}

A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
added. \function{zip()} returns a list of tuples where each tuple
contains the i-th element from each of the argument sequences. The
difference between \function{zip()} and \code{map(None, \var{seq1},
\var{seq2})} is that \function{map()} pads the sequences with
\code{None} if the sequences aren't all of the same length, while
\function{zip()} truncates the returned list to the length of the
shortest argument sequence.
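
The truncating behaviour can be seen in a short sketch with two
made-up lists of different lengths. The call to \function{list()} is
redundant when \function{zip()} already returns a list, as described
above, and is included only as a precaution:

\begin{verbatim}
letters = ['a', 'b', 'c']
numbers = [1, 2]

# The result stops at the shorter sequence; 'c' is dropped.
pairs = list(zip(letters, numbers))    # [('a', 1), ('b', 2)]
\end{verbatim}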

The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.
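
A short sketch of the base parameter in use; the variable names are
chosen for illustration, and the exact wording of the error message is
deliberately not checked.

\begin{verbatim}
decimal = int('123', 10)   # 123
hexval = int('123', 16)    # 1*256 + 2*16 + 3 = 291
binary = int('101', 2)     # 4 + 0 + 1 = 5

try:
    int(123, 16)           # first argument is not a string
    rejected = 0
except TypeError:
    rejected = 1
\end{verbatim}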

A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in a hypothetical 2.0.1beta1,
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
\code{"final"} for a final release.
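
Since the value is an ordinary tuple, version checks reduce to tuple
comparisons. The helper name \function{atleast()} below is an
assumption made for this sketch, not part of the \module{sys} module.

\begin{verbatim}
import sys

def atleast(major, minor):
    # Tuples compare element by element, so the first two fields
    # are enough for an "at least this version" test.
    return sys.version_info[:2] >= (major, minor)

hasnew = atleast(2, 0)
level = sys.version_info[3]   # 'alpha', 'beta', ... or 'final'
\end{verbatim}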

Dictionaries have an odd new method, \method{setdefault(\var{key},
\var{default})}, which behaves similarly to the existing
\method{get()} method. However, if the key is missing,
\method{setdefault()} both returns the value of \var{default} as
\method{get()} would do, and also inserts it into the dictionary as
the value for \var{key}. Thus, the following lines of code:

\begin{verbatim}
if dict.has_key( key ): return dict[key]
else:
    dict[key] = []
    return dict[key]
\end{verbatim}

can be reduced to a single \code{return dict.setdefault(key, [])} statement.
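
A common use is collecting values under a key without first checking
whether the key exists. This is a small sketch; the function
\function{groupbylength()} and its sample data are invented for
illustration.

\begin{verbatim}
def groupbylength(words):
    # Hypothetical helper: maps each word length to the words seen.
    groups = {}
    for word in words:
        # Inserts an empty list the first time a length is seen,
        # and returns the stored list in either case.
        groups.setdefault(len(word), []).append(word)
    return groups

groups = groupbylength(['ham', 'spam', 'eggs', 'jam'])
\end{verbatim}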

The interpreter sets a maximum recursion depth in order to catch
runaway recursion before filling the C stack and causing a core dump
or GPF. Previously this limit was fixed when you compiled Python,
but in 2.0 the maximum recursion depth can be read and modified using
\function{sys.getrecursionlimit()} and \function{sys.setrecursionlimit()}.
The default value is 1000, and a rough maximum value for a given
platform can be found by running a new script,
\file{Misc/find_recursionlimit.py}.
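
A sketch of reading the limit, raising it temporarily, and restoring
it afterwards. The function \function{depth()} and the increment of 500
are arbitrary choices for the example, not recommended values.

\begin{verbatim}
import sys

def depth(n):
    # Recurses n levels deep and returns the depth reached.
    if n == 0:
        return 0
    return 1 + depth(n - 1)

old = sys.getrecursionlimit()
sys.setrecursionlimit(old + 500)   # raise the ceiling temporarily
try:
    raised = sys.getrecursionlimit()
    reached = depth(200)           # well within either limit
finally:
    sys.setrecursionlimit(old)     # always restore the old value
\end{verbatim}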

% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, usually because they fix initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided. This section lists the changes in Python 2.0
that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
causes a \exception{TypeError} exception to be raised, with the
message: 'append requires exactly 1 argument; 2 given'. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.
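
The stricter behaviour and the fix can be sketched as follows; the
flag \code{accepted} is only there to record which branch ran.

\begin{verbatim}
L = []
try:
    L.append(1, 2)       # two arguments: rejected in 2.0
    accepted = 1
except TypeError:
    accepted = 0

L.append((1, 2))         # one argument that happens to be a tuple
\end{verbatim}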

The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
an IP address, but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 2.0alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple
argument form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.

The \code{\e x} escape in string literals now takes exactly 2 hex
digits. Previously it would consume all the hex digits following the
'x' and take the lowest 8 bits of the result, so \code{\e x123456} was
equivalent to \code{\e x56}.
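
Under the new rule the literal from the paragraph above is a
five-character string. A small sketch, with variable names chosen for
illustration only:

\begin{verbatim}
s = '\x123456'
# Read as '\x12' followed by the four characters '3456'.
first = ord(s[0])    # 0x12, i.e. 18
rest = s[1:]         # '3456'
\end{verbatim}

Code that relied on the old behaviour should spell out the intended
character with two digits, here \code{\e x56}.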

The \exception{AttributeError} exception has a more friendly error message,
whose text will be something like \code{'Spam' instance has no attribute 'eggs'}.
Previously the error message was just the missing attribute name \code{eggs}, and
code written to take advantage of this fact will break in 2.0.
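
One way to avoid depending on the message text is to note that the
caller already knows which attribute it asked for. The following is
only a sketch of that idea; the class \class{Spam} and the helper
\function{lookup()} are hypothetical.

\begin{verbatim}
class Spam:
    pass

def lookup(obj, name, default=None):
    # The attribute name is already known here, so the text of the
    # exception never needs to be inspected.
    try:
        return getattr(obj, name)
    except AttributeError:
        return default

missing = lookup(Spam(), 'eggs', 'no eggs')
\end{verbatim}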

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2GB; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces 'abcabcabc', and
\code{(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
various contexts where previously only integers were accepted, such
as in the \method{seek()} method of file objects, and in the formats
supported by the \verb|%| operator (\verb|%d|, \verb|%i|, \verb|%x|,
etc.). For example, \code{"\%d" \% 2L**64} will produce the string
\samp{18446744073709551616}.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which does
\code{str(longval)[:-1]} and assumes the 'L' is there will now lose
the final digit.
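
Code that must produce bare digits on both old and new versions can
test for the suffix instead of assuming it. This is a sketch only; the
helper name \function{digits()} is made up.

\begin{verbatim}
def digits(n):
    # Hypothetical helper: strip the suffix only when it is present,
    # instead of blindly chopping off the last character.
    s = repr(n)
    if s[-1:] == 'L':
        s = s[:-1]
    return s

text = digits(12345)   # in 2.0, digits(12345L) gives the same string
\end{verbatim}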

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses
the \code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.
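
The two precisions can be tried directly with the \verb|%| operator,
assuming it passes these formats through in the same way as C's
\function{sprintf()}. The variable names are illustrative.

\begin{verbatim}
x = 8.1
short = '%.12g' % x   # the precision str() uses
full = '%.17g' % x    # the precision repr() uses in 2.0
\end{verbatim}

Seventeen significant digits are enough to read the same floating
point value back in, which is what the longer form buys.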

The \code{-X} command-line option, which turned all standard
exceptions into strings instead of classes, has been removed; the
standard exceptions will now always be classes. The
\module{exceptions} module containing the standard exceptions was
translated from Python to a built-in C module, written by Barry Warsaw
and Fredrik Lundh.

% Commented out for now -- I don't think anyone will care.
%The pattern and match objects provided by SRE are C types, not Python
%class instances as in 1.5. This means you can no longer inherit from
%\class{RegexObject} or \class{MatchObject}, but that shouldn't be much
%of a problem since no one should have been doing that in the first
%place.

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules or embedding a Python interpreter
in a larger application. If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
2.0. On Windows, attempting to import a third party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this. (Here's Mark Hammond's explanation of the
reasons for the crash. The 1.5 module is linked against
\file{Python15.dll}. When \file{Python.exe}, linked against
\file{Python16.dll}, starts up, it initializes the Python data
structures in \file{Python16.dll}. When Python then imports the
module \file{foo.pyd} linked against \file{Python15.dll}, it
immediately tries to call the functions in that DLL. As Python has
not been initialized in that DLL, the program immediately crashes.)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganised by Greg Stein. \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files. Another cleanup: there were also a
number of \file{my*.h} files in the \file{Include/} directory that held
various portability hacks; they've been merged into a single file,
\file{Include/pyport.h}.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}. For documentation, read
the comments in \file{Include/pymem.h} and
\file{Include/objimpl.h}. For the lengthy discussions during which
the interface was hammered out, see the Web archives of the 'patches'
and 'python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads. Therefore, Python's POSIX threading support now works
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too. Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster. A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
changes, the difference is only 10\%. These improvements were
contributed by Yakov Markovitch.

Python 2.0's source now uses only ANSI C prototypes, so compiling Python now
requires an ANSI C compiler, and can no longer be done using a compiler that
only supports K\&R C.

Previously the Python virtual machine used 16-bit numbers in its
bytecode, limiting the size of source files. In particular, this
affected the maximum size of literal lists and dictionaries in Python
source; occasionally people who are generating Python code would run
into this limit. A patch by Charles G. Waldman raises the limit from
$2^{16}$ to $2^{32}$.

Three new convenience functions intended for adding constants to a
module's dictionary at module initialization time were added:
\function{PyModule_AddObject()}, \function{PyModule_AddIntConstant()},
and \function{PyModule_AddStringConstant()}. Each of these functions
takes a module object, a null-terminated C string containing the name
to be added, and a third argument for the value to be assigned to the
name. This third argument is, respectively, a Python object, a C
long, or a C string.

A wrapper API was added for Unix-style signal handlers.
\function{PyOS_getsig()} gets a signal handler and
\function{PyOS_setsig()} will set a new handler.

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python was installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really work on Unix and leave Windows
and MacOS unsupported. Python users faced wildly differing
installation instructions which varied between different extension
packages, which made administering a Python installation something of a
chore.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the \module{distutils} package offers many places to override
defaults -- separating the build from the install, building or
installing in non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only .py
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
     define_macros = [('XML_NS', None)],
     include_dirs = [ 'extensions/expat/xmltok',
                      'extensions/expat/xmlparse' ],
     sources = [ 'extensions/pyexpat.c',
                 'extensions/expat/xmltok/xmltok.c',
                 'extensions/expat/xmltok/xmlrole.c',
               ]
       )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult; ``bdist_rpm'' and
``bdist_wininst'' commands have already been contributed to create an
RPM distribution and a Windows installer for the software,
respectively. Commands to create other distribution formats such as
Debian packages and Solaris \file{.pkg} files are in various stages of
development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
%\section{New XML Code}

%XXX write this section...

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}. Consult the CVS logs for the exact
patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket. When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds an additional function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket. The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.

The \module{httplib} module has been rewritten by Greg Stein to
support HTTP/1.1. Backward compatibility with the 1.5 version of
\module{httplib} is provided, though using HTTP/1.1 features such as
pipelining will require rewriting code to use a different set of
interfaces.

The \module{Tkinter} module now supports Tcl/Tk versions 8.1, 8.2, or
8.3, and support for the older 7.x versions has been dropped. The
\module{Tkinter} module now supports displaying Unicode strings in Tk
widgets. Also, Fredrik Lundh contributed an optimization which makes
operations like \code{create_line} and \code{create_polygon} much faster,
especially when using lots of coordinates.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and mouse support. This means the module
is no longer compatible with operating systems that only have BSD
curses, but there don't seem to be any currently maintained OSes that
fall into this category.

As mentioned in the earlier discussion of 2.0's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed. SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by Hewlett
Packard, supports matching against both 8-bit strings and Unicode
strings.

% ======================================================================
\section{New modules}

A number of new modules were added. We'll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a
particular module.

\begin{itemize}

\item{\module{atexit}:}
For registering functions to be called before the Python interpreter exits.
Code that currently sets
\code{sys.exitfunc} directly should be changed to
use the \module{atexit} module instead, importing \module{atexit}
and calling \function{atexit.register()} with
the function to be called on exit.
(Contributed by Skip Montanaro.)

\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.

\item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and
\module{dircmp} modules, which have now become deprecated.
(Contributed by Gordon MacMillan and Moshe Zadka.)

\item{\module{linuxaudiodev}:} Support for the \file{/dev/audio}
device on Linux, a twin to the existing \module{sunaudiodev} module.
(Contributed by Peter Bosch.)

\item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix. A file's contents can be mapped directly into
memory, at which point it behaves like a mutable string, so its
contents can be read and modified. Mapped files can even be passed to
functions that expect ordinary strings, such as the \module{re}
module. (Contributed by Sam Rushing, with some extensions by
A.M. Kuchling.)

\item{\module{pyexpat}:} An interface to the Expat XML parser.
(Contributed by Paul Prescod.)

\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
used for writing Web spiders that politely avoid certain areas of a
Web site. The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
the fetchability of a given URL. (Contributed by Skip Montanaro.)

\item{\module{tabnanny}:} A module/script to
check Python source code for ambiguous indentation.
(Contributed by Tim Peters.)

\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.

\item{\module{webbrowser}:} A module that provides a platform-independent
way to launch a web browser on a specific URL. For each platform, various
browsers are tried in a specific order. The user can alter which browser
is launched by setting the \var{BROWSER} environment variable.
(Originally inspired by Eric S. Raymond's patch to \module{urllib}
which added similar functionality, but
the final module comes from code originally
implemented by Fred Drake as \file{Tools/idle/BrowserControl.py},
and adapted for the standard library by Fred.)

\item{\module{_winreg}:} An interface to the
Windows registry. \module{_winreg} is an adaptation of functions that
have been part of PythonWin since 1995, but has now been added to the core
distribution, and enhanced to support Unicode.
\module{_winreg} was written by Bill Tutt and Mark Hammond.

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives. These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\item{\module{imputil}:} A module that provides a simpler way for
writing customised import hooks, in comparison to the existing
\module{ihooks} module. (Implemented by Greg Stein, with much
discussion on python-dev along the way.)

\end{itemize}
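
As an illustration of the \module{mmap} entry above, the following
sketch maps a small temporary file, modifies it through the mapping,
and searches it with the \module{re} module. It is written against a
current Python interpreter and leans on conveniences that are
assumptions here rather than 2.0 features, namely
\function{tempfile.mkstemp()} and byte-string literals; the function
name \function{demo()} is invented.

\begin{verbatim}
import mmap
import os
import re
import tempfile

def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world\n")
        m = mmap.mmap(fd, 0)          # length 0 maps the whole file
        try:
            m[0:5] = b"HELLO"         # modify the file through the map
            # The mapping is searched like an ordinary string.
            found = re.search(b"w[a-z]+", m).group(0)
        finally:
            m.close()
    finally:
        os.close(fd)
    f = open(path, "rb")
    contents = f.read()
    f.close()
    os.remove(path)
    return found, contents

found, contents = demo()
\end{verbatim}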

% ======================================================================
\section{IDLE Improvements}

IDLE is the official Python cross-platform IDE, written using Tkinter.
Python 2.0 includes IDLE 0.6, which adds a number of new features and
improvements. A partial list:

\begin{itemize}
\item UI improvements and optimizations,
especially in the area of syntax highlighting and auto-indentation.

\item The class browser now shows more information, such as the
top-level functions in a module.

\item Tab width is now a user-settable option. When opening an existing Python
file, IDLE automatically detects the indentation conventions, and adapts.

\item There is now support for calling browsers on various platforms,
used to open the Python documentation in a browser.

\item IDLE now has a command line, which is largely similar to
the vanilla Python interpreter.

\item Call tips were added in many places.

\item IDLE can now be installed as a package.

\item In the editor window, there is now a line/column bar at the bottom.

\item Three new keystroke commands: Check module (Alt-F5), Import
module (F5) and Run script (Ctrl-F5).

\end{itemize}

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing. The \module{stdwin}
module is gone; it was for a platform-independent windowing toolkit
that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get them back, but you're encouraged to update any code that uses
these modules.

\section{Acknowledgements}

The authors would like to thank the following people for offering
suggestions on drafts of this article: Mark Hammond, Gregg Hauser,
Fredrik Lundh, Detlef Lannert, Skip Montanaro, Vladimir Marangozov,
Guido van Rossum, and Neil Schemenauer.

\end{document}