\documentclass{howto}

% $Id$

\title{What's New in Python 2.0}
\release{0.05}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

A new release of Python, version 2.0, will be released some time this
autumn. Beta versions are already available from
\url{http://www.pythonlabs.com/products/python2.0/}. This article
covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require
rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{What About Python 1.6?}

Python 1.6 can be thought of as the Contractual Obligations Python
release. After the core development team left CNRI in May 2000, CNRI
requested that a 1.6 release be created, containing all the work on
Python that had been performed at CNRI. Python 1.6 therefore
represents the state of the CVS tree as of May 2000, with the most
significant new feature being Unicode support. Development continued
after May, of course, so the 1.6 tree received a few fixes to ensure
that it's forward-compatible with Python 2.0. 1.6 is therefore part
of Python's evolution, and not a side branch.

So, should you take much interest in Python 1.6? Probably not. The
1.6final and 2.0beta1 releases were made on the same day (September 5,
2000), the plan being to finalize Python 2.0 within a month or so. If
you have applications to maintain, there seems little point in
breaking things by moving to 1.6, fixing them, and then having another
round of breakage within a month by moving to 2.0; you're better off
just going straight to 2.0. Most of the really interesting features
described in this document are only in 2.0, because a lot of work was
done between May and September.

% ======================================================================
\section{New Development Process}

The most important change in Python 2.0 may not be to the code at all,
but to how Python is developed.

In May of 2000, the Python CVS tree was moved to SourceForge.
Previously, there were roughly 7 people who had write access to
the CVS tree, and all patches had to be inspected and checked in by
one of the people on this short list. Obviously, this wasn't very
scalable. By moving the CVS tree to SourceForge, it became possible
to grant write access to more people; as of September 2000 there were
27 people able to check in changes, a fourfold increase. This makes
possible large-scale changes that wouldn't be attempted if they had
to be filtered through the small group of core developers. For
example, one day Peter Schneider-Kamp took it into his head to drop
K\&R C compatibility and convert the C source for Python to ANSI
C. After getting approval on the python-dev mailing list, he launched
into a flurry of checkins that lasted about a week, other developers
joined in to help, and the job was done. If there were only 5 people
with write access, probably that task would have been viewed as
``nice, but not worth the time and effort needed'' and it would
never have been done.

SourceForge also provides tools for tracking bug and patch
submissions, and in combination with the public CVS tree, they've
resulted in a remarkable increase in the speed of development.
Patches now get submitted, commented on, revised by people other than
the original submitter, and bounced back and forth between people
until the patch is deemed worth checking in. This didn't come without
a cost: developers now have more e-mail to deal with, more mailing
lists to follow, and special tools had to be written for the new
environment. For example, SourceForge sends default patch and bug
notification e-mail messages that are completely unhelpful, so Ka-Ping
Yee wrote an HTML screen-scraper that sends more useful messages.

The ease of adding code caused a few initial growing pains, such as
code being checked in before it was ready or without getting clear
agreement from the developer group. The approval process that has
emerged is somewhat similar to that used by the Apache group.
Developers can vote +1, +0, -0, or -1 on a patch; +1 and -1 denote
acceptance or rejection, while +0 and -0 mean the developer is mostly
indifferent to the change, though with a slight positive or negative
slant. The most significant change from the Apache model is that
Guido van Rossum, who has Benevolent Dictator For Life status, can
ignore the votes of the other developers and approve or reject a
change, effectively giving him a +Infinity / -Infinity vote.

Producing an actual patch is the last step in adding a new feature,
and is usually easy compared to the earlier task of coming up with a
good design. Discussions of new features can often explode into
lengthy mailing list threads, making the discussion hard to follow,
and no one can read every posting to python-dev. Therefore, a
relatively formal process has been set up to write Python Enhancement
Proposals (PEPs), modelled on the Internet RFC process. PEPs are
draft documents that describe a proposed new feature, and are
continually revised until the community reaches a consensus, either
accepting or rejecting the proposal. Quoting from the introduction to
PEP 1, ``PEP Purpose and Guidelines'':

\begin{quotation}
  PEP stands for Python Enhancement Proposal. A PEP is a design
  document providing information to the Python community, or
  describing a new feature for Python. The PEP should provide a
  concise technical specification of the feature and a rationale for
  the feature.

  We intend PEPs to be the primary mechanisms for proposing new
  features, for collecting community input on an issue, and for
  documenting the design decisions that have gone into Python. The
  PEP author is responsible for building consensus within the
  community and documenting dissenting opinions.
\end{quotation}

Read the rest of PEP 1 for the details of the PEP editorial process,
style, and format. PEPs are kept in the Python CVS tree on
SourceForge, though they're not part of the Python 2.0 distribution,
and are also available in HTML form from
\url{http://python.sourceforge.net/peps/}. As of September 2000,
there are 25 PEPs, ranging from PEP 201, ``Lockstep Iteration'', to
PEP 225, ``Elementwise/Objectwise Operators''.

To report bugs or submit patches for Python 2.0, use the bug tracking
and patch manager tools available from the SourceForge project page,
at \url{http://sourceforge.net/projects/python/}.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit number used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'},
\code{'iso-8859-1'}, or whatever. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.
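
As a small illustration of \method{encode()} and of mixing the two
string types, the following sketch encodes a two-character Unicode
string as UTF-8. The variable names are purely illustrative, and the
byte values shown in the comment are simply the standard UTF-8
encoding of U+0660 followed by the letter `a'. The code uses only
syntax that later Python versions also accept.

```python
# Illustrative names; any Unicode string would do.
ustr = u'\u0660a'                    # ARABIC-INDIC DIGIT ZERO, then 'a'

# encode() returns an 8-bit string; for UTF-8 this is '\xd9\xa0a'.
utf8_bytes = ustr.encode('utf-8')

# Mixing an 8-bit literal with a Unicode string gives a Unicode string.
combined = 'a' + u'bc'
```

The Unicode string has two characters, while its UTF-8 form occupies
three 8-bit characters, which is why the two types are kept distinct.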

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or
Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string} \optional{, \var{encoding}}
\optional{, \var{errors}} ) } creates a Unicode string from an 8-bit
string. \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\item The \keyword{exec} statement, and various built-ins such as
\code{eval()}, \code{getattr()}, and \code{setattr()} will
accept Unicode strings as well as regular strings. (It's possible
that the process of fixing this missed some built-ins; if you find a
built-in function that accepts strings but doesn't accept Unicode
strings at all, please report it as a bug.)

\end{itemize}
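
The three error-handling names can be tried out with a short sketch.
It rests on an assumption that should be stated plainly: it goes
through the \method{encode()} method rather than \function{unicode()},
and assumes \method{encode()} accepts the same \code{errors} values as
a second argument, as it does in later Python versions. When encoding,
\code{'replace'} substitutes a question mark rather than U+FFFD. All
variable names are illustrative.

```python
# ord() works on 1-character Unicode strings as well as 8-bit ones.
code_point = ord(u'\u0660')

# Assumption: encode() takes the same error-handling names as unicode().
text = u'caf\u00e9'                        # last character is not ASCII
dropped = text.encode('ascii', 'ignore')   # offending character removed
marked = text.encode('ascii', 'replace')   # offending character becomes '?'

# 'strict' raises an exception instead of producing a result.
try:
    text.encode('ascii', 'strict')
    strict_failed = 0
except UnicodeError:
    strict_failed = 1
```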

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning
that U+0660 is an Arabic number.
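
The two lookups just described can be run as follows. This is only a
sketch; the variable names are made up for the example, and the
lowercase case is added for contrast.

```python
import unicodedata

upper_a = unicodedata.category(u'A')          # 'Lu': letter, uppercase
lower_a = unicodedata.category(u'a')          # 'Ll': letter, lowercase
digit = unicodedata.bidirectional(u'\u0660')  # 'AN': Arabic number
```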
220
Andrew M. Kuchlingb853ea02000-06-03 03:06:58 +0000221The \module{codecs} module contains functions to look up existing encodings
222and register new ones. Unless you want to implement a
223new encoding, you'll most often use the
224\function{codecs.lookup(\var{encoding})} function, which returns a
2254-element tuple: \code{(\var{encode_func},
226\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000227
228\begin{itemize}
229\item \var{encode_func} is a function that takes a Unicode string, and
230returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
231is an 8-bit string containing a portion (perhaps all) of the Unicode
Andrew M. Kuchling2d2dc9f2000-08-17 00:27:06 +0000232string converted into the given encoding, and \var{length} tells you
233how much of the Unicode string was converted.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000234
Andrew M. Kuchling118ee962000-09-27 01:01:18 +0000235\item \var{decode_func} is the opposite of \var{encode_func}, taking
236an 8-bit string and returning a 2-tuple \code{(\var{ustring},
237\var{length})}, consisting of the resulting Unicode string
238\var{ustring} and the integer \var{length} telling how much of the
Andrew M. Kuchling3ad4e742000-09-27 01:33:41 +00002398-bit string was consumed.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000240
241\item \var{stream_reader} is a class that supports decoding input from
242a stream. \var{stream_reader(\var{file_obj})} returns an object that
243supports the \method{read()}, \method{readline()}, and
244\method{readlines()} methods. These methods will all translate from
245the given encoding and return Unicode strings.
246
247\item \var{stream_writer}, similarly, is a class that supports
248encoding output to a stream. \var{stream_writer(\var{file_obj})}
Andrew M. Kuchling69db0e42000-06-28 02:16:00 +0000249returns an object that supports the \method{write()} and
250\method{writelines()} methods. These methods expect Unicode strings,
251translating them to the given encoding on output.
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000252\end{itemize}
253
254For example, the following code writes a Unicode string into a file,
255encoding it as UTF-8:
256
257\begin{verbatim}
258import codecs
259
260unistr = u'\u0660\u2000ab ...'
261
262(UTF8_encode, UTF8_decode,
263 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')
264
265output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
266output.write( unistr )
267output.close()
268\end{verbatim}
269
270The following code would then read UTF-8 input from the file:
271
272\begin{verbatim}
Andrew M. Kuchling5e08a012000-09-04 17:59:27 +0000273input = UTF8_streamreader( open( '/tmp/output', 'rb') )
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000274print repr(input.read())
275input.close()
276\end{verbatim}
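
The encoding and decoding functions returned by
\function{codecs.lookup()} can also be called directly, without going
through a stream. The sketch below does a round trip through UTF-8;
the variable names are illustrative, and the two returned lengths are
interpreted as described in the list above.

```python
import codecs

# Unpack the 4-element tuple; the names are illustrative.
(utf8_encode, utf8_decode,
 utf8_reader, utf8_writer) = codecs.lookup('utf-8')

ustr = u'\u0660ab'
encoded, n_chars = utf8_encode(ustr)     # how much Unicode was converted
decoded, n_bytes = utf8_decode(encoded)  # how much 8-bit data was consumed
```

Here three Unicode characters are converted and four 8-bit characters
are consumed on the way back, since U+0660 takes two bytes in UTF-8.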

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{List Comprehensions}

Lists are a workhorse data type in Python, and many programs
manipulate a list at some point. Two common operations on lists are
to loop over them, and either pick out the elements that meet a
certain criterion, or apply some function to each element. For
example, given a list of strings, you might want to pull out all the
strings containing a given substring, or strip off trailing whitespace
from each line.

The existing \function{map()} and \function{filter()} functions can be
used for this purpose, but they require a function as one of their
arguments. This is fine if there's an existing built-in function that
can be passed directly, but if there isn't, you have to create a
little function to do the required work, and Python's scoping rules
make the result ugly if the little function needs additional
information. Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring. You
could write the following to do it:

\begin{verbatim}
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter( lambda s, substring=S:
                     string.find(s, substring) != -1,
                  L)
\end{verbatim}

Because of Python's scoping rules, a default argument is used so that
the anonymous function created by the \keyword{lambda} expression knows
what substring is being searched for. List comprehensions make this
cleaner:

\begin{verbatim}
sublist = [ s for s in L if string.find(s, S) != -1 ]
\end{verbatim}

List comprehensions have the form:

\begin{verbatim}
[ expression for expr1 in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]
\end{verbatim}

The \keyword{for}...\keyword{in} clauses contain the sequences to be
iterated over. The sequences do not have to be the same length,
because they are \emph{not} iterated over in parallel, but
from left to right; this is explained more clearly in the following
paragraphs. The elements of the generated list will be the successive
values of \var{expression}. The final \keyword{if} clause is
optional; if present, \var{expression} is only evaluated and added to
the result if \var{condition} is true.

To make the semantics very clear, a list comprehension is equivalent
to the following Python code:

\begin{verbatim}
for expr1 in sequence1:
    for expr2 in sequence2:
    ...
        for exprN in sequenceN:
             if (condition):
                  # Append the value of
                  # the expression to the
                  # resulting list.
\end{verbatim}

This means that when there are multiple \keyword{for}...\keyword{in}
clauses, the length of the resulting list will be equal to the product
of the lengths of all the sequences. If you have two lists of length 3,
the output list is 9 elements long:

\begin{verbatim}
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [ (x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1),
('c', 2), ('c', 3)]
\end{verbatim}
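
The equivalence between a comprehension and nested loops can be
checked directly. The following sketch adds an \keyword{if} clause to
the example above and builds the same list by hand; the condition and
the name \code{expected} are chosen only for illustration.

```python
seq1 = 'abc'
seq2 = (1, 2, 3)

# The comprehension, with a condition that skips pairs where y is 2.
pairs = [(x, y) for x in seq1 for y in seq2 if y != 2]

# The equivalent nested loops, written out by hand.
expected = []
for x in seq1:
    for y in seq2:
        if y != 2:
            expected.append((x, y))
```

With the condition in place the result has 6 elements rather than 9,
because the \keyword{if} clause filters the 3 pairs whose second item
is 2.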

To avoid introducing an ambiguity into Python's grammar, if
\var{expression} is creating a tuple, it must be surrounded with
parentheses. The first list comprehension below is a syntax error,
while the second one is correct:

\begin{verbatim}
# Syntax error
[ x,y for x in seq1 for y in seq2]
# Correct
[ (x,y) for x in seq1 for y in seq2]
\end{verbatim}

The idea of list comprehensions originally comes from the functional
programming language Haskell (\url{http://www.haskell.org}). Greg
Ewing argued most effectively for adding them to Python and wrote the
initial list comprehension patch, which was then discussed for a
seemingly endless time on the python-dev mailing list and kept
up-to-date by Skip Montanaro.

% ======================================================================
\section{Augmented Assignment}

Augmented assignment operators, another long-requested feature, have
been added to Python 2.0. Augmented assignment operators include
\code{+=}, \code{-=}, \code{*=}, and so forth. For example, the
statement \code{a += 2} increments the value of the variable
\code{a} by 2, equivalent to the slightly lengthier \code{a = a + 2}.

The full list of supported assignment operators is \code{+=},
\code{-=}, \code{*=}, \code{/=}, \code{\%=}, \code{**=}, \code{\&=},
\code{|=}, \verb|^=|, \code{>>=}, and \code{<<=}. Python classes can
override the augmented assignment operators by defining methods named
\method{__iadd__}, \method{__isub__}, etc. For example, the following
\class{Number} class stores a number and supports using += to create a
new instance with an incremented value.

\begin{verbatim}
class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number( self.value + increment)

n = Number(5)
n += 3
print n.value
\end{verbatim}

The \method{__iadd__} special method is called with the value of the
increment, and should return a new instance with an appropriately
modified value; this return value is bound as the new value of the
variable on the left-hand side.
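
Because the return value is simply rebound to the name on the
left-hand side, a mutable class can also update itself and return
\code{self}. The sketch below shows that variant with a hypothetical
\class{Accumulator} class, which is not part of Python; it is meant
only to illustrate how the rebinding behaves when a second name refers
to the same object.

```python
class Accumulator:
    """Hypothetical mutable counterpart to the Number class."""
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        self.value = self.value + increment   # modify the object in place
        return self                           # the name is rebound to self

a = Accumulator(5)
alias = a          # a second name for the same object
a += 3             # calls a.__iadd__(3)
```

After the assignment both names still refer to one object whose value
is 8. With a class that returns a new instance instead, \code{alias}
would keep referring to the old, unmodified object.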

Augmented assignment operators were first introduced in the C
programming language, and most C-derived languages, such as
\program{awk}, C++, Java, Perl, and PHP, also support them. The augmented
assignment patch was implemented by Thomas Wouters.

% ======================================================================
\section{String Methods}

Until now, string-manipulation functionality was in the \module{string}
module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, a noteworthy April Fools' joke
notwithstanding, is that Python strings are immutable. Thus, the
string methods return new strings, and do not modify the string on
which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to
\code{s[:len(t)] == t}, while \code{s.endswith(t)} is equivalent to
\code{s[-len(t):] == t}.

One other method which deserves special mention is \method{join}. The
\method{join} method of a string receives one parameter, a sequence of
strings, and is equivalent to the \function{string.join} function from
the old \module{string} module, with the arguments reversed. In other
words, \code{s.join(seq)} is equivalent to the old
\code{string.join(seq, s)}.
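
The following sketch exercises \method{startswith()},
\method{endswith()}, and \method{join()} together, and compares the
first two against the slicing expressions given above. The host name
is an arbitrary example value.

```python
s = 'hostname.example.org'     # illustrative value

starts = s.startswith('host')
ends = s.endswith('.org')

# The slicing equivalents given in the text.
same_start = (s[:len('host')] == 'host')
same_end = (s[-len('.org'):] == '.org')

# join() is called on the separator, with the sequence as its argument.
parts = s.split('.')
rejoined = '.'.join(parts)
```

Calling \method{join()} on the separator can look backwards at first,
but it means the same method works for any sequence of strings.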

% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{instance}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem. When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's
parameters.
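
On an interpreter with cycle collection available, the
self-referencing instance from the earlier example can be reclaimed by
running a collection by hand. The sketch below assumes that
\function{gc.collect()} returns the number of unreachable objects it
found, as it does in later Python versions; the exact number depends
on the interpreter, so the code only relies on it being non-zero.

```python
import gc

class SomeClass:
    pass

gc.collect()                  # clear out any garbage that already exists

instance = SomeClass()
instance.myself = instance    # the instance now refers to itself
del instance                  # unreachable, but its reference count is not 0

found = gc.collect()          # run a full collection by hand
```

Reference counting alone would never free this object, so a non-zero
value of \code{found} indicates that the cycle detector did the work.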

Why isn't cycle detection enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

% ======================================================================
\section{Other Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

\subsection{Minor Language Changes}

A new syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you'd use the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. \function{apply()}
is the same in 2.0, but thanks to a patch from
Greg Ewing, \code{f(*\var{args}, **\var{kw})} is available as a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
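
A short sketch of both sides of this symmetry follows. The functions
\function{describe()} and \function{add()} are hypothetical and exist
only for this example: one collects its arguments with the definition
syntax, the other is called with the new call syntax.

```python
def describe(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    return (args, kw)

def add(a, b, scale=1):
    return (a + b) * scale

positional = (2, 3)
keywords = {'scale': 10}

result = add(*positional, **keywords)         # same as add(2, 3, scale=10)
captured = describe(*positional, **keywords)  # arguments handed back as-is
```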
564
The \keyword{print} statement can now have its output directed to a
file-like object by following the \keyword{print} with
\verb|>> file|, similar to the redirection operator in Unix shells.
Previously you'd either have to use the \method{write()} method of the
file-like object, which lacks the convenience and simplicity of
\keyword{print}, or you could assign a new value to
\code{sys.stdout} and then restore the old value. For sending output
to standard error, it's much easier to write this:

\begin{verbatim}
print >> sys.stderr, "Warning: action field not supplied"
\end{verbatim}

Modules can now be renamed on importing them, using the syntax
\code{import \var{module} as \var{name}} or \code{from \var{module}
import \var{name} as \var{othername}}. The patch was submitted by
Thomas Wouters.

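Both forms can be tried with any module; the sketch below uses
\module{math} only because it's always available, and the short names
\code{m} and \code{root} are arbitrary choices made for this example.

\begin{verbatim}
import math as m
from math import sqrt as root

# Both names refer to the same underlying function object.
same = m.sqrt is root
value = root(16)
\end{verbatim}
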
A new format style is available when using the \code{\%} operator;
'\%r' will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.

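To make this concrete, here's a small sketch of a class that defines
\method{__contains__}. The \class{EvenNumbers} class is made up for
this example; it stands for a collection too large to search index by
index, which is exactly the case the new method helps with.

\begin{verbatim}
class EvenNumbers:
    # Conceptually holds every even integer, so membership can't be
    # tested by walking through the indexes one at a time.
    def __contains__(self, value):
        return value % 2 == 0

evens = EvenNumbers()
four_is_even = 4 in evens       # true
seven_is_even = 7 in evens      # false
\end{verbatim}
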
Earlier versions of Python used a recursive algorithm for deleting
objects. Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

The comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic. \footnote{See the thread ``trashcan
and PR\#7'' in the April 2000 archives of the python-dev mailing list
for the discussion leading up to this implementation, and some useful
relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32 bit on Itanium.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more
information.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value. For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
exception is raised, while 2.0 raises a new
\exception{UnboundLocalError} exception.
\exception{UnboundLocalError} is a subclass of \exception{NameError},
so any existing code that expects \exception{NameError} to be raised
should still work.

\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}

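The compatibility claim can be checked with a short sketch like the
following, in which a handler written for \exception{NameError} also
catches the new exception. The function body is invented purely for
illustration.

\begin{verbatim}
def f():
    try:
        return i          # i is local to f() but has no value yet
    except NameError:
        # UnboundLocalError is caught here because it's a subclass.
        result = "caught"
    i = 1                 # this assignment makes i a local variable
    return result

outcome = f()
\end{verbatim}
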
Two new exceptions, \exception{TabError} and
\exception{IndentationError}, have been introduced. They're both
subclasses of \exception{SyntaxError}, and are raised when Python code
is found to be improperly indented.

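Because of the subclass relationship, code that already catches
\exception{SyntaxError} keeps working. Here's a rough sketch using the
built-in \function{compile()}; the source string and the
\code{<example>} filename are placeholders chosen for this example.

\begin{verbatim}
source = "if 1:\nx = 1\n"    # the body of the if isn't indented

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError:
    # IndentationError and TabError are caught here as well.
    outcome = "rejected"
\end{verbatim}
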
\subsection{Changes to Built-in Functions}

A new built-in, \function{zip(\var{seq1}, \var{seq2}, ...)}, has been
added. \function{zip()} returns a list of tuples where each tuple
contains the i-th element from each of the argument sequences. The
difference between \function{zip()} and \code{map(None, \var{seq1},
\var{seq2})} is that \function{map()} pads the sequences with
\code{None} if the sequences aren't all of the same length, while
\function{zip()} truncates the returned list to the length of the
shortest argument sequence.

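A common use is looping over two sequences in parallel. The sketch
below uses made-up data and shows the truncation: the third name has
no matching price, so it's silently dropped.

\begin{verbatim}
names = ["spam", "eggs", "bacon"]
prices = [1, 2]

lines = []
for name, price in zip(names, prices):
    lines.append("%s costs %d" % (name, price))
# lines now holds two entries; "bacon" was left out.
\end{verbatim}
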
The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.

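The cases just described can be gathered into one short sketch; the
base-2 conversion is an extra illustration that isn't taken from the
text above.

\begin{verbatim}
decimal = int('123', 10)        # 123
hexadecimal = int('123', 16)    # 291
binary = int('1010', 2)         # 10

try:
    int(123, 16)                # a number, not a string
    outcome = "accepted"
except TypeError:
    outcome = "rejected"
\end{verbatim}
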
A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in a hypothetical 2.0.1beta1,
\code{sys.version_info} would be \code{(2, 0, 1, 'beta', 1)}.
\var{level} is a string such as \code{"alpha"}, \code{"beta"}, or
\code{"final"} for a final release.

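Since it's an ordinary tuple, version checks become simple
comparisons. The following is only a sketch; the variable names are
arbitrary, and it won't run on 1.5.2, where \code{sys.version_info}
doesn't exist.

\begin{verbatim}
import sys

major, minor, micro, level, serial = sys.version_info

# Tuples compare element by element, so this reads naturally.
at_least_2_0 = sys.version_info[:2] >= (2, 0)
\end{verbatim}
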
Dictionaries have an odd new method, \method{setdefault(\var{key},
\var{default})}, which behaves similarly to the existing
\method{get()} method. However, if the key is missing,
\method{setdefault()} both returns the value of \var{default} as
\method{get()} would do, and also inserts it into the dictionary as
the value for \var{key}. Thus, the following lines of code:

\begin{verbatim}
if dict.has_key( key ): return dict[key]
else:
    dict[key] = []
    return dict[key]
\end{verbatim}

can be reduced to a single \code{return dict.setdefault(key, [])} statement.

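One place this is likely to be handy is building a dictionary of
lists. Here's a minimal sketch that groups words by their first
letter; the data is invented for this example.

\begin{verbatim}
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
index = {}
for word in words:
    # Creates the list the first time a letter is seen, then appends.
    index.setdefault(word[0], []).append(word)
\end{verbatim}
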
The interpreter sets a maximum recursion depth in order to catch
runaway recursion before filling the C stack and causing a core dump
or GPF. Previously this limit was fixed when you compiled Python,
but in 2.0 the maximum recursion depth can be read and modified using
\function{sys.getrecursionlimit} and \function{sys.setrecursionlimit}.
The default value is 1000, and a rough maximum value for a given
platform can be found by running a new script,
\file{Misc/find_recursionlimit.py}.

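Reading and changing the limit looks roughly like this. The increase
of 500 is an arbitrary figure picked for this example; a safe value
depends on the platform's C stack.

\begin{verbatim}
import sys

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(old_limit + 500)
raised_limit = sys.getrecursionlimit()
sys.setrecursionlimit(old_limit)      # restore the previous setting
\end{verbatim}
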
% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, usually because they fix initial design
decisions that turned out to be actively mistaken, that breaking
backward compatibility can't always be avoided. This section lists
the changes in Python 2.0 that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
causes a \exception{TypeError} exception to be raised, with the
message: 'append requires exactly 1 argument; 2 given'. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.

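Both behaviours can be seen side by side in a short sketch:

\begin{verbatim}
L = []
L.append((1, 2))          # one argument: the tuple (1, 2)

try:
    L.append(1, 2)        # two arguments: an error in 2.0
    outcome = "accepted"
except TypeError:
    outcome = "rejected"
\end{verbatim}
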
The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
an IP address, but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 2.0alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple
argument form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.

The \code{\e x} escape in string literals now takes exactly 2 hex
digits. Previously it would consume all the hex digits following the
'x' and take the lowest 8 bits of the result, so \code{\e x123456} was
equivalent to \code{\e x56}.

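Under the new rule the same literal is a five-character string, which
can be checked with a sketch like this:

\begin{verbatim}
s = '\x123456'
first = ord(s[0])       # 0x12, from the two hex digits after \x
rest = s[1:]            # '3456', now four ordinary characters
\end{verbatim}
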
The \exception{AttributeError} exception has a more friendly error
message, whose text will be something like \code{'Spam' instance has
no attribute 'eggs'}. Previously the error message was just the
missing attribute name \code{eggs}, and code written to take advantage
of this fact will break in 2.0.

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2Gb; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
various contexts where previously only integers were accepted, such
as in the \method{seek()} method of file objects, and in the formats
supported by the \verb|%| operator (\verb|%d|, \verb|%i|, \verb|%x|,
etc.). For example, \code{"\%d" \% 2L**64} will produce the string
\samp{18446744073709551616}.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which does
\code{str(longval)[:-1]}, assuming the 'L' is there, will now lose
the final digit.

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses
the \code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while
\code{str(8.1)} is \code{'8.1'}.

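The two precisions can be compared directly by applying the same
format strings with the \code{\%} operator. This is only a sketch of
the effect, not a description of how the interpreter is implemented
internally.

\begin{verbatim}
value = 8.1
long_form = '%.17g' % value     # the precision repr() uses in 2.0
short_form = '%.12g' % value    # the precision str() uses
\end{verbatim}
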
The \code{-X} command-line option, which turned all standard
exceptions into strings instead of classes, has been removed; the
standard exceptions will now always be classes. The
\module{exceptions} module containing the standard exceptions was
translated from Python to a built-in C module, written by Barry Warsaw
and Fredrik Lundh.

% Commented out for now -- I don't think anyone will care.
%The pattern and match objects provided by SRE are C types, not Python
%class instances as in 1.5. This means you can no longer inherit from
%\class{RegexObject} or \class{MatchObject}, but that shouldn't be much
%of a problem since no one should have been doing that in the first
%place.

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules or embedding a Python interpreter
in a larger application. If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
2.0. On Windows, attempting to import a third party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this. (Here's Mark Hammond's explanation of the
reasons for the crash. The 1.5 module is linked against
\file{Python15.dll}. When \file{Python.exe}, linked against
\file{Python16.dll}, starts up, it initializes the Python data
structures in \file{Python16.dll}. When Python then imports the
module \file{foo.pyd} linked against \file{Python15.dll}, it
immediately tries to call the functions in that DLL. As Python has
not been initialized in that DLL, the program immediately crashes.)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganised by Greg Stein. \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files. Another cleanup: there were also a
number of \file{my*.h} files in the \file{Include/} directory that held
various portability hacks; they've been merged into a single file,
\file{Include/pyport.h}.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}. For documentation, read
the comments in \file{Include/pymem.h} and
\file{Include/objimpl.h}. For the lengthy discussions during which
the interface was hammered out, see the Web archives of the 'patches'
and 'python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads. Therefore, Python's POSIX threading support now works
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too. Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster. A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
changes, the difference is only 10\%. These improvements were
contributed by Yakov Markovitch.

Python 2.0's source now uses only ANSI C prototypes, so compiling Python now
requires an ANSI C compiler, and can no longer be done using a compiler that
only supports K\&R C.

Previously the Python virtual machine used 16-bit numbers in its
bytecode, limiting the size of source files. In particular, this
affected the maximum size of literal lists and dictionaries in Python
source; occasionally people who are generating Python code would run
into this limit. A patch by Charles G. Waldman raises the limit from
$2^{16}$ to $2^{32}$.

Three new convenience functions intended for adding constants to a
module's dictionary at module initialization time were added:
\function{PyModule_AddObject()}, \function{PyModule_AddIntConstant()},
and \function{PyModule_AddStringConstant()}. Each of these functions
takes a module object, a null-terminated C string containing the name
to be added, and a third argument for the value to be assigned to the
name. This third argument is, respectively, a Python object, a C
long, or a C string.

A wrapper API was added for Unix-style signal handlers.
\function{PyOS_getsig()} gets a signal handler and
\function{PyOS_setsig()} will set a new handler.

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python is installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really work on Unix and leave Windows
and MacOS unsupported. Python users faced wildly differing
installation instructions which varied between different extension
packages, which made administering a Python installation something of a
chore.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will always require the same steps: first you
simply unpack the tarball or zip archive, and then run ``\code{python
setup.py install}''. The platform will be automatically detected, the
compiler will be recognized, C extension modules will be compiled, and
the distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the distutils package offers many places to override defaults
-- separating the build from the install, building or installing in
non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only .py
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
     define_macros = [('XML_NS', None)],
     include_dirs = [ 'extensions/expat/xmltok',
                      'extensions/expat/xmlparse' ],
     sources = [ 'extensions/pyexpat.c',
                 'extensions/expat/xmltok/xmltok.c',
                 'extensions/expat/xmltok/xmlrole.c',
               ]
       )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult; ``bdist_rpm'' and
``bdist_wininst'' commands have already been contributed to create an
RPM distribution and a Windows installer for the software,
respectively. Commands to create other distribution formats such as
Debian packages and Solaris \file{.pkg} files are in various stages of
development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
%\section{New XML Code}

%XXX write this section...

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}. Consult the CVS logs for the
exact patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket. When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds an additional function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket. The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.

The \module{httplib} module has been rewritten by Greg Stein to
support HTTP/1.1. Backward compatibility with the 1.5 version of
\module{httplib} is provided, though using HTTP/1.1 features such as
pipelining will require rewriting code to use a different set of
interfaces.

The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
8.3, and support for the older 7.x versions has been dropped. The
Tkinter module now supports displaying Unicode strings in Tk widgets.
Also, Fredrik Lundh contributed an optimization which makes operations
like \code{create_line} and \code{create_polygon} much faster,
especially when using lots of coordinates.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and mouse support. This means the module
is no longer compatible with operating systems that only have BSD
curses, but there don't seem to be any currently maintained OSes that
fall into this category.

As mentioned in the earlier discussion of 2.0's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed. SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by Hewlett
Packard, supports matching against both 8-bit strings and Unicode
strings.

% ======================================================================
\section{New modules}

A number of new modules were added. We'll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a
particular module.

1030\begin{itemize}
1031
Andrew M. Kuchling62cdd962000-06-30 12:46:41 +00001032\item{\module{atexit}}:
1033For registering functions to be called before the Python interpreter exits.
1034Code that currently sets
1035\code{sys.exitfunc} directly should be changed to
1036use the \module{atexit} module instead, importing \module{atexit}
1037and calling \function{atexit.register()} with
1038the function to be called on exit.
1039(Contributed by Skip Montanaro.)
1040
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001041\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.
1042
Andrew M. Kuchlingfed4f1e2000-07-01 12:33:43 +00001043\item{\module{filecmp}:} Supersedes the old \module{cmp}, \module{cmpcache} and
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001044\module{dircmp} modules, which have now become deprecated.
Andrew M. Kuchlingc0328f02000-06-10 15:11:20 +00001045(Contributed by Gordon MacMillan and Moshe Zadka.)
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001046
Andrew M. Kuchling35e8afb2000-07-08 12:06:31 +00001047\item{\module{linuxaudiodev}:} Support for the \file{/dev/audio}
1048device on Linux, a twin to the existing \module{sunaudiodev} module.
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001049(Contributed by Peter Bosch.)
1050
1051\item{\module{mmap}:} An interface to memory-mapped files on both
1052Windows and Unix. A file's contents can be mapped directly into
1053memory, at which point it behaves like a mutable string, so its
1054contents can be read and modified. They can even be passed to
1055functions that expect ordinary strings, such as the \module{re}
1056module. (Contributed by Sam Rushing, with some extensions by
1057A.M. Kuchling.)
1058
Andrew M. Kuchling35e8afb2000-07-08 12:06:31 +00001059\item{\module{pyexpat}:} An interface to the Expat XML parser.
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001060(Contributed by Paul Prescod.)
1061
1062\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
1063used for writing Web spiders that politely avoid certain areas of a
Andrew M. Kuchling5e08a012000-09-04 17:59:27 +00001064Web site. The parser accepts the contents of a \file{robots.txt} file,
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001065builds a set of rules from it, and can then answer questions about
1066the fetchability of a given URL. (Contributed by Skip Montanaro.)
1067
1068\item{\module{tabnanny}:} A module/script to
Andrew M. Kuchling5e08a012000-09-04 17:59:27 +00001069check Python source code for ambiguous indentation.
Andrew M. Kuchling6c3cd8d2000-06-10 02:24:31 +00001070(Contributed by Tim Peters.)
1071
Andrew M. Kuchlinga5bbb002000-06-10 02:41:46 +00001072\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.
1073
\item{\module{webbrowser}:} A module that provides a platform-independent
way to launch a web browser on a specific URL. For each platform, various
browsers are tried in a specific order. The user can alter which browser
is launched by setting the \var{BROWSER} environment variable.
(Originally inspired by Eric S. Raymond's patch to \module{urllib}
which added similar functionality, but
the final module comes from code originally
implemented by Fred Drake as \file{Tools/idle/BrowserControl.py},
and adapted for the standard library by Fred.)

\item{\module{_winreg}:} An interface to the
Windows registry. \module{_winreg} is an adaptation of functions that
have been part of PythonWin since 1995, but has now been added to the core
distribution, and enhanced to support Unicode.
\module{_winreg} was written by Bill Tutt and Mark Hammond.

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives. These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\item{\module{imputil}:} A module that provides a simpler way to
write customised import hooks than the existing
\module{ihooks} module. (Implemented by Greg Stein, with much
discussion on python-dev along the way.)

\end{itemize}
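
As a rough illustration of the \module{mmap} entry above, the following
sketch maps a small scratch file, reads and modifies it through slicing,
and searches it with \module{re}. It is written against a current
Python 3 interpreter, where mapped contents are byte strings, and is not
taken from the 2.0 sources; the temporary file and its contents are
assumptions made purely for the example.

```python
import mmap
import os
import re
import tempfile

# Create a small scratch file to map (name and contents are illustrative).
fd, path = tempfile.mkstemp()
os.write(fd, b"spam and eggs")
os.close(fd)

f = open(path, "r+b")
m = mmap.mmap(f.fileno(), 0)      # a length of 0 maps the whole file

first_word = m[0:4]               # read through slicing, like a string
m[0:4] = b"SPAM"                  # modify in place (same length required)

# The mapping can be handed to re where a byte string is expected.
match = re.search(b"eggs", m)
found_at = match.start()

m.flush()                         # write the change back to the file
m.close()
f.close()

with open(path, "rb") as check:
    on_disk = check.read()
os.remove(path)
```

Slice assignment must keep the length unchanged, since the mapping
cannot grow or shrink the underlying file this way.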
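The \module{zipfile} entry in the list above can be sketched in the same
spirit: write one member into a new archive, then reopen the archive and
read it back. This assumes a current Python 3 interpreter, where
\method{writestr()} accepts a member name and a string; the archive name
and member name are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

# Work in a throwaway directory; the file names are illustrative only.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "example.zip")

# Write a single member into a new ZIP archive.
archive = zipfile.ZipFile(archive_path, "w")
archive.writestr("hello.txt", "Hello from zipfile")
archive.close()

# Reopen the archive, list its members, and read one back.
archive = zipfile.ZipFile(archive_path, "r")
names = archive.namelist()
data = archive.read("hello.txt")
archive.close()

shutil.rmtree(workdir)
```
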
% ======================================================================
\section{IDLE Improvements}

IDLE is the official Python cross-platform IDE, written using Tkinter.
Python 2.0 includes IDLE 0.6, which adds a number of new features and
improvements. A partial list:

\begin{itemize}
\item UI improvements and optimizations,
especially in the area of syntax highlighting and auto-indentation.

\item The class browser now shows more information, such as the
top-level functions in a module.

\item Tab width is now a user-settable option. When opening an existing Python
file, IDLE automatically detects the indentation conventions, and adapts.

\item There is now support for calling browsers on various platforms,
used to open the Python documentation in a browser.

\item IDLE now has a command line, which is largely similar to
the vanilla Python interpreter.

\item Call tips were added in many places.

\item IDLE can now be installed as a package.

\item In the editor window, there is now a line/column bar at the bottom.

\item Three new keystroke commands: Check module (Alt-F5), Import
module (F5) and Run script (Ctrl-F5).

\end{itemize}

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing. The \module{stdwin}
module is gone; it was for a platform-independent windowing toolkit
that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
these modules.
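
A minimal sketch of adding \file{lib-old} to \code{sys.path} might look
like the following. The helper \function{add_lib_old()} and the
installation directory passed to it are hypothetical; the real location
of \file{lib-old} depends on how and where Python was installed.

```python
import os
import sys

def add_lib_old(lib_dir):
    """Append the lib-old subdirectory of lib_dir to sys.path, once."""
    lib_old = os.path.join(lib_dir, "lib-old")
    if lib_old not in sys.path:
        sys.path.append(lib_old)
    return lib_old

# Hypothetical library directory; adjust it to the actual installation.
lib_old = add_lib_old(os.path.join(sys.prefix, "lib", "python2.0"))
```

Appending, rather than inserting at the front, keeps the old modules
from shadowing any current module of the same name.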

\section{Acknowledgements}

The authors would like to thank the following people for offering
suggestions on drafts of this article: Mark Hammond, Gregg Hauser,
Fredrik Lundh, Detlef Lannert, Skip Montanaro, Vladimir Marangozov,
Guido van Rossum, and Neil Schemenauer.

\end{document}