\documentclass{howto}

\title{What's New in Python 1.6}
\release{0.02}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. \\
XXX marks locations where fact-checking or rewriting is still needed.
}

A new version of Python, 1.6, will be released some time this
summer. Alpha versions are already available from
\url{http://www.python.org/1.6/}. This article talks about the
exciting new features in 1.6, highlights some other useful changes,
and points out a few incompatible changes that may require rewriting
code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 1.6; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{Unicode}

The largest new feature in Python 1.6 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit numbers used by ordinary strings, meaning that
65,536 distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless, often stormy discussions on the python-dev mailing list, and
was mostly implemented by Marc-Andr\'e Lemburg. A detailed explanation of
the interface is in the file
\file{Misc/unicode.txt} in the Python source distribution; it's also
available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type, so they can be indexed and sliced. They also have an
\method{encode( \optional{\var{encoding}} )} method that returns an 8-bit
string in the desired encoding. Encodings are named by strings, such
as \code{'ascii'}, \code{'utf-8'}, \code{'iso-8859-1'}, and so on.
A codec API is defined for implementing and registering new encodings
that are then available throughout a Python program. If an encoding
isn't specified, the default encoding is always 7-bit ASCII. (XXX is
that the current default encoding?)
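
The sketch below illustrates the \code{\e u} escape, sequence
operations, and \method{encode()} as described above. It is written for
a modern Python 3 interpreter rather than 1.6: there every \code{str} is
a Unicode string, so the \code{u} prefix is unnecessary, and
\method{encode()} returns a \code{bytes} object instead of an 8-bit
string. Python 3 also defaults to UTF-8 rather than ASCII, so the
encodings are named explicitly here.

\begin{verbatim}
# Python 3 sketch (not 1.6): str is Unicode, encode() returns bytes.
text = 'ab\u0660'             # \u escape: U+0660 ARABIC-INDIC DIGIT ZERO

# Unicode strings are immutable sequences, so indexing and slicing work.
first_two = text[:2]          # 'ab'
last_char = text[2]           # '\u0660'

# encode() converts to an 8-bit representation in the named encoding.
utf8 = text.encode('utf-8')                 # b'ab\xd9\xa0'
latin1 = 'caf\u00e9'.encode('iso-8859-1')   # b'caf\xe9'

# Characters the target encoding cannot represent raise an error.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('cannot encode as ASCII:', exc.reason)
\end{verbatim}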

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character whose code is the integer \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string}, \optional{\var{encoding},}
\optional{\var{errors}} ) } creates a Unicode string from an 8-bit
string. \var{encoding} is a string naming the encoding to use.
The \var{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\end{itemize}
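
The three \var{errors} modes can be compared on a byte string that
contains one invalid ASCII byte. This is a Python 3 sketch, not 1.6
code: \function{unicode()} no longer exists there, but
\code{str(\var{bytes}, \var{encoding}, \var{errors})} accepts the same
three arguments and behaves analogously.

\begin{verbatim}
# Python 3 sketch: str(bytes, encoding, errors) plays the role of
# 1.6's unicode(string, encoding, errors).
raw = b'abc\xff'        # 0xFF is not a valid ASCII byte

ignored = str(raw, 'ascii', 'ignore')     # invalid byte is dropped
replaced = str(raw, 'ascii', 'replace')   # invalid byte becomes U+FFFD

try:
    str(raw, 'ascii', 'strict')           # the default mode
except UnicodeDecodeError as exc:
    print('strict mode raised:', exc.reason)
\end{verbatim}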

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that U+0660 is
an Arabic number.
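
The lookups above can be reproduced with the \module{unicodedata}
module of a modern Python 3 interpreter; the sketch assumes Python 3,
where string literals are already Unicode and need no \code{u} prefix.

\begin{verbatim}
import unicodedata

# Python 3 sketch of the character-property lookups described above.
category_upper = unicodedata.category('A')        # 'Lu': letter, uppercase
category_lower = unicodedata.category('a')        # 'Ll': letter, lowercase
bidi_class = unicodedata.bidirectional('\u0660')  # 'AN': Arabic number
\end{verbatim}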

The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you how much of the Unicode string was converted.

\item \var{decode_func} is the mirror of \var{encode_func},
taking an 8-bit string and
returning a 2-tuple \code{(\var{ustring}, \var{length})} containing the resulting Unicode string
and \var{length} telling you how much of the 8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \code{\var{stream_reader}(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods. These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \code{\var{stream_writer}(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings, translating them to the given encoding on output.
\end{itemize}

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}
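
The encoding and decoding functions in the tuple can also be called
directly. The following sketch is written for Python 3, where
\function{codecs.lookup()} returns a \class{CodecInfo} object that still
unpacks like the 4-tuple described above and where encoded data is a
\code{bytes} object; an in-memory \class{io.BytesIO} buffer stands in
for the file used in the previous examples.

\begin{verbatim}
import codecs
import io

# CodecInfo unpacks into the four items described above (Python 3).
encode, decode, reader_cls, writer_cls = codecs.lookup('utf-8')

# encode_func returns (encoded_bytes, number_of_characters_consumed).
data, consumed = encode('\u0660ab')
# data == b'\xd9\xa0ab', consumed == 3

# decode_func returns (unicode_string, number_of_bytes_consumed).
text, used = decode(data)
# text == '\u0660ab', used == 4

# Stream writer/reader round trip through an in-memory byte buffer.
buffer = io.BytesIO()
writer = writer_cls(buffer)
writer.write('\u0660\u2000ab')
buffer.seek(0)
reader = reader_cls(buffer)
round_trip = reader.read()
\end{verbatim}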

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

(XXX M.A. Lemburg added a \code{-U} command-line option, which causes the
Python compiler to interpret all \code{"..."} strings as \code{u"..."} (same with
\code{r"..."} and \code{ur"..."}). Is this just for experimenting/testing, or is it
actually a new feature?)

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 1.6, installing modules was a tedious affair -- there
was no way to figure out automatically where Python is installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really worked on Unix and left Windows
and MacOS unsupported. Software users faced wildly differing
installation instructions.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will always require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process, and the Distutils package offers many places to override defaults
-- separating the build from the install, building or installing in
non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only \file{.py}
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
        define_macros = [('XML_NS', None)],
        include_dirs = [ 'extensions/expat/xmltok',
                         'extensions/expat/xmlparse' ],
        sources = [ 'extensions/pyexpat.c',
                    'extensions/expat/xmltok/xmltok.c',
                    'extensions/expat/xmltok/xmlrole.c',
                    ]
        )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The \code{sdist} command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult, and a \code{bdist_rpm} command has
already been contributed to create an RPM distribution for the
software. Commands to create Windows installer programs, Debian
packages, and Solaris \file{.pkg} files have been discussed and are in
various stages of development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
\section{String Methods}

Until now string-manipulation functionality was in the \module{string}
Python module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 1.6 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, April Fools' jokes notwithstanding, is
that Python strings are immutable. Thus, the string methods return new
strings and do not modify the string on which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-1.6 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to \code{s[:len(t)]
== t}, while, for a non-empty \code{t}, \code{s.endswith(t)} is equivalent to \code{s[-len(t):] == t}.
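
The following Python 3 sketch checks the two slice equivalences on a
few sample strings. It also shows why the \method{endswith()}
equivalence needs a non-empty suffix: \code{s[-0:]} is the whole of
\code{s}, not an empty string. The helper name \function{check()} is
hypothetical.

\begin{verbatim}
def check(s, t):
    # Hypothetical helper: assert that each method matches its slice form.
    assert s.startswith(t) == (s[:len(t)] == t)
    if t:  # s[-0:] is all of s, so the endswith form needs t != ''
        assert s.endswith(t) == (s[-len(t):] == t)

for s in ['hostname', 'moshe', '']:
    for t in ['host', 'name', 'sh', 'hostnames', '']:
        check(s, t)

# The empty-suffix corner case that the slice form gets wrong:
empty_suffix = ('abc'.endswith(''), 'abc'[-len(''):] == '')
# empty_suffix == (True, False)
\end{verbatim}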

(XXX what'll happen to join? is this even worth mentioning?) One
other method which deserves special mention is \method{join()}. The
\method{join()} method of a string receives one parameter, a sequence of
strings, and is equivalent to the \function{string.join()} function from
the old \module{string} module, with the arguments reversed. In other
words, \code{s.join(seq)} is equivalent to the old
\code{string.join(seq, s)}.
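
To make the reversed argument order concrete, the sketch below defines
a hypothetical \function{old_string_join()} that mimics the old
\code{string.join(\var{words}, \var{sep})} signature, whose separator
defaulted to a single space, on top of the \method{join()} method. It
is written for Python 3, whose \module{string} module no longer has a
\function{join()} function.

\begin{verbatim}
def old_string_join(words, sep=' '):
    # Hypothetical stand-in for the old string.join(words, sep):
    # with the method form, the separator is the receiver.
    return sep.join(words)

seq = ['spam', 'eggs', 'ham']
via_method = ', '.join(seq)                 # s.join(seq)
via_function = old_string_join(seq, ', ')   # string.join(seq, s)
default_sep = old_string_join(seq)          # old default: a single space
\end{verbatim}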

% ======================================================================
\section{Porting to 1.6}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, often fixing initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided. This section lists the changes in Python 1.6
that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 1.6 this
causes a \exception{TypeError} exception to be raised, with the
message ``append requires exactly 1 argument; 2 given''. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.
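
The following sketch checks this behaviour under Python 3, where list
methods are just as strict; only the wording of the error message
differs from the 1.6 message quoted above. The list name \code{L}
follows the text.

\begin{verbatim}
# Python 3 sketch: list.append() accepts exactly one argument.
L = []
try:
    L.append(1, 2)          # old, forgiving form: now a TypeError
except TypeError as exc:
    print('rejected:', exc)

L.append((1, 2))            # the fix: pass the tuple explicitly
\end{verbatim}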

The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
1.6 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
1.6 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
the address (host and port), but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 1.6alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple-argument
form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple-argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.
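
In Python 3 the tightening described above has long since happened:
\method{bind()}, \method{connect()} and \method{connect_ex()} on socket
objects take a single address argument. The following sketch, which
assumes the loopback interface is available, shows the rejected
two-argument form and the correct tuple form; port 0 asks the operating
system to pick any free port.

\begin{verbatim}
import socket

# Python 3 sketch: socket methods take one address argument, a tuple.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    try:
        sock.bind('127.0.0.1', 0)     # deprecated multi-argument form
    except TypeError as exc:
        print('rejected:', exc)
    sock.bind(('127.0.0.1', 0))       # correct form: one (host, port) tuple
    host, port = sock.getsockname()
finally:
    sock.close()
\end{verbatim}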

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2GB; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 1.6, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces \code{'abcabcabc'}, and \code{
(0,1,2,3)[2L:4L]} produces \code{(2,3)}. Long integers can also be used in
various new places where previously only integers were accepted, such
as in the \method{seek()} method of file objects.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 1.6, but code which assumes the 'L' is
there and does \code{str(longval)[:-1]} will now lose the final
digit.
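
Code that must run under both 1.5.2 and 1.6 can avoid the problem by
removing the suffix only when it is actually present. The helper below
is a hypothetical name used for illustration; it works on the string
produced by either \function{str()} or \function{repr()}, and is
written so that it also runs under Python 3, whose integers never carry
the suffix.

\begin{verbatim}
def strip_long_suffix(text):
    """Drop a trailing 'L' from an integer's string form, if present.

    Hypothetical helper: str(10L) was '10L' before 1.6 but '10' in 1.6,
    so blindly slicing off the last character loses a digit in 1.6.
    """
    if text.endswith('L'):
        return text[:-1]
    return text

# Gives the same result whether or not the 'L' was appended.
old_style = strip_long_suffix('12345678901234567890L')
new_style = strip_long_suffix('12345678901234567890')
\end{verbatim}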

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses
the \code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}. For example, the number 8.1 can't be represented
exactly in binary, so \code{repr(8.1)} is \code{'8.0999999999999996'},
while \code{str(8.1)} is \code{'8.1'}.
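
The two format strings can be tried directly with Python's own
\code{\%} operator, which follows C's \function{sprintf()} conventions.
This sketch targets modern Python 3, where \function{repr()} of a float
has since changed again to produce the shortest string that round-trips,
so the format strings are applied explicitly rather than relying on
\function{repr()} itself.

\begin{verbatim}
# Python 3 sketch: apply the two precisions described above explicitly.
value = 8.1

seventeen_digits = '%.17g' % value   # '8.0999999999999996' (1.6 repr())
twelve_digits = '%.12g' % value      # '8.1' (str())

# 17 significant digits are enough to recover exactly the same float,
# while 12 digits hide the binary rounding error.
assert float(seventeen_digits) == value
\end{verbatim}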

%The \code{-X} command-line option, which turns all standard exceptions
%into strings instead of classes, has been removed.

% ======================================================================
\section{Core Changes}

Various minor changes have been made to Python's syntax and built-in
functions. None of the changes are very far-reaching, but they're
handy conveniences.

A change to syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you did this with the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}. Thanks to a patch from
Greg Ewing, 1.6 adds \code{f(*\var{args}, **\var{kw})} as a shorter
and clearer way to achieve the same effect. This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
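
A small sketch of the new calling syntax follows; the function
\function{describe()} and its parameters are hypothetical. It is
written for Python 3, which no longer has \function{apply()}, so a
comment shows the equivalent 1.5.2-style call.

\begin{verbatim}
def describe(name, greeting='Hello', punctuation='!'):
    # Hypothetical function used only to illustrate the calling syntax.
    return greeting + ', ' + name + punctuation

args = ('world',)
kw = {'greeting': 'Goodbye'}

# In 1.5.2 this would be written apply(describe, args, kw).
result = describe(*args, **kw)      # 'Goodbye, world!'
\end{verbatim}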

A new format style is available when using the \code{\%} operator.
'\%r' will insert the \function{repr()} of its argument. This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument. For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.
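
The example from the text can be checked directly, together with a
string containing a newline, where the difference between the two
styles is easier to see. The sketch assumes Python 3, whose \code{\%}
operator supports both format styles.

\begin{verbatim}
# Python 3 sketch: %r inserts repr(), %s inserts str().
same_value = '%r %s' % ('abc', 'abc')   # "'abc' abc"

# With a newline, %s inserts the raw character while %r shows an escape.
raw = '%s' % 'a\nb'
escaped = '%r' % 'a\nb'
\end{verbatim}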

The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291. \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.
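
A Python 3 sketch of the \var{base} parameter follows; Python 3 has no
separate \function{long()}, so only \function{int()} is shown, and the
exact wording of the error message may differ from the one quoted above.

\begin{verbatim}
# Python 3 sketch: int() with an explicit base parses a string.
decimal_value = int('123', 10)      # 123
hex_value = int('123', 16)          # 1*256 + 2*16 + 3 = 291
binary_value = int('101', 2)        # 5

try:
    int(123, 16)                    # a base only makes sense for strings
except TypeError as exc:
    print('rejected:', exc)
\end{verbatim}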

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version. \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered. Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}. Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.
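
As an illustration, the hypothetical \class{EvenNumbers} class below
answers membership tests arithmetically instead of probing indices,
something the index-scanning approach could not do for an unbounded
collection. The sketch is written for Python 3; the
\method{__contains__} hook itself is the one described above.

\begin{verbatim}
class EvenNumbers:
    """Hypothetical container that holds every even integer."""

    def __contains__(self, item):
        # Called for 'item in obj'; no iteration over indices is needed.
        return isinstance(item, int) and item % 2 == 0

evens = EvenNumbers()
checks = (4 in evens, 7 in evens, 10**20 in evens, 'x' in evens)
# checks == (True, False, True, False)
\end{verbatim}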

Earlier versions of Python used a recursive algorithm for deleting
objects. Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem. On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead. For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

The comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic.
\footnote{See the thread ``trashcan and PR\#7'' in the April 2000 archives of the python-dev mailing list for the discussion leading up to this implementation, and some useful relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState. (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64, apparently
because, for ease of porting, MS Visual C++ treats code as 32-bit.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://www.python.net/crew/mhammond/ce/} for more information.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value. For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 1.6; in 1.5.2 a \exception{NameError}
exception is raised, while 1.6 raises \exception{UnboundLocalError}.

\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}
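
Because \exception{UnboundLocalError} is a subclass of
\exception{NameError}, existing \code{except NameError:} handlers
continue to catch it. The sketch below checks this under Python 3,
where \keyword{print} is a function; the function name \function{f()}
follows the example above.

\begin{verbatim}
def f():
    print("i=", i)   # i is local (assigned below) but not yet bound
    i = i + 1

try:
    f()
except NameError as exc:         # still catches the more specific error
    caught = type(exc)           # UnboundLocalError

assert issubclass(UnboundLocalError, NameError)
\end{verbatim}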

A new variable holding more detailed version information has been
added to the \module{sys} module. \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}. For example, in 1.6a2 \code{sys.version_info} is
\code{(1, 6, 0, 'alpha', 2)}. \var{level} is a string such as
\code{"alpha"}, \code{"beta"}, or \code{"final"} for a final release.
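
Because \code{sys.version_info} is a tuple, version checks can use
ordinary tuple comparison instead of parsing \code{sys.version}. This
sketch runs on modern Python, where the attribute is a named tuple that
still compares like a plain tuple; the minimum version shown is an
arbitrary example.

\begin{verbatim}
import sys

# Tuples compare element by element: (major, minor, micro, ...).
major, minor = sys.version_info[:2]
has_version_info = sys.version_info >= (1, 6)  # true wherever it exists

level = sys.version_info[3]   # release level string, e.g. 'alpha' or 'final'
\end{verbatim}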

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules, or embedding a Python interpreter
in a larger application. If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
1.6. On Windows, attempting to import a third-party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this. (XXX can anyone tell me why it crashes?)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganized by Greg Stein. \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}. For documentation, read
the comments in \file{Include/mymalloc.h} and
\file{Include/objimpl.h}. For the lengthy discussions during which
the interface was hammered out, see the Web archives of the 'patches'
and 'python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads. Therefore, Python's POSIX threading support now works
on the Macintosh. Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too. Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster. A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 1.6
changes, the difference is only 10\%. These improvements were
contributed by Yakov Markovitch.

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random}, \module{shelve},
and \module{nntplib}. Consult the CVS logs for the exact
patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module. OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket. When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds an additional function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket. The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.
Andrew M. Kuchling25bfd0e2000-05-27 11:28:26 +0000529
Andrew M. Kuchlingfa33a4e2000-06-03 02:52:40 +0000530The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
5318.3, and support for the older 7.x versions has been dropped. The
532Tkinter module also supports displaying Unicode strings in Tk
533widgets.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and other new features. This means the
module is no longer compatible with operating systems that only have
BSD curses, but there don't seem to be any currently maintained OSes
that fall into this category.

As mentioned in the earlier discussion of 1.6's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed. SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by
Hewlett-Packard, supports matching against both 8-bit strings and
Unicode strings.
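
Here is a small sketch of SRE handling both kinds of string. It is
written for a current Python 3 interpreter, which is an assumption on
our part: there, 8-bit data is a \code{bytes} object that must be
searched with a bytes pattern (note the \code{rb} prefix, which isn't
valid 1.6 syntax), while ordinary strings are Unicode. The sample
strings are made up.

\begin{verbatim}
import re

# Assumes Python 3: 8-bit data is bytes and needs a bytes pattern.
digits = re.search(rb'[0-9]+', b'order 66 executed')

# Unicode data: \w matches the accented letter as part of the word.
word = re.search(r'\w+', 'caf\u00e9 au lait')

print(digits.group())               # b'66'
print(word.group() == 'caf\u00e9')  # True: the accented letter is included
\end{verbatim}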

% ======================================================================
\section{New Modules}

A number of new modules were added. We'll simply list them with brief
descriptions; consult the 1.6 documentation for the details of a
particular module.

\begin{itemize}

\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.

\item{\module{filecmp}:} Supersedes the old \module{cmp} and
\module{dircmp} modules, which are now deprecated.
(Contributed by Moshe Zadka.)

\item{\module{linuxaudiodev}:} Support for the \file{/dev/audio} device on Linux,
a twin to the existing \module{sunaudiodev} module.
(Contributed by Peter Bosch.)

\item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix. A file's contents can be mapped directly into
memory, at which point the mapping behaves like a mutable string: its
contents can be read and modified. Memory-mapped files can even be
passed to functions that expect ordinary strings, such as those in the
\module{re} module. (Contributed by Sam Rushing, with some extensions by
A.M. Kuchling.)

\item{\module{pyexpat}:} An interface to the Expat XML parser.
(Contributed by Paul Prescod.)

\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
used for writing Web spiders that politely avoid certain areas of a
Web site. The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
the fetchability of a given URL. (Contributed by Skip Montanaro.)

\item{\module{tabnanny}:} A module/script that
checks Python source code for ambiguous indentation.
(Contributed by Tim Peters.)

\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.

\item{\module{winreg}:} An interface to the Windows registry.
\module{winreg} has been part of PythonWin since 1995, but has now
been added to the core distribution and enhanced to support Unicode.
(Contributed by Bill Tutt and Mark Hammond.)

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives. These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\end{itemize}
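
To make the \module{mmap} entry above more concrete, the following
sketch maps a scratch file, slices it, changes it in place, and runs a
regular expression over the mapping. It assumes a current Python 3
interpreter, where a mapping holds \code{bytes} rather than strings,
hence the \code{b} prefixes and the bytes pattern; the scratch file and
its contents are invented for the example.

\begin{verbatim}
import mmap
import re
import tempfile

# Assumes Python 3, where the mapped contents are bytes.
with tempfile.TemporaryFile() as f:
    # Give the hypothetical scratch file some contents to map.
    f.write(b'Hello, mmap world\n')
    f.flush()

    m = mmap.mmap(f.fileno(), 0)      # length 0 maps the whole file
    head = m[:5]                      # slicing works like a string
    m[0:5] = b'HELLO'                 # in-place change; length must not change
    found = re.search(rb'mmap \w+', m).group()  # re accepts the mapping
    m.flush()                         # push the change back to the file

    f.seek(0)
    on_disk = f.read()                # the file itself now starts with HELLO
    m.close()

print(head, found, on_disk)
\end{verbatim}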
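
The \module{robotparser} workflow described above (feed the parser the
lines of a \file{robots.txt} file, then ask whether particular URLs may
be fetched) might look like the sketch below. It uses
\class{RobotFileParser} with its \method{parse()} and
\method{can_fetch()} methods as they exist in later releases, where the
module lives in \module{urllib.robotparser}; the fallback import covers
both locations. The rules, the spider name, and the URLs are all
hypothetical.

\begin{verbatim}
try:
    import robotparser                   # module name in Python 1.6 / 2.x
except ImportError:
    from urllib import robotparser       # location in Python 3

# Hypothetical robots.txt contents: keep every robot out of /private/.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)                          # build the rule set from the lines

public = rp.can_fetch("ExampleSpider", "http://www.example.com/index.html")
private = rp.can_fetch("ExampleSpider",
                       "http://www.example.com/private/notes.html")
print(public, private)                   # True False
\end{verbatim}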

% ======================================================================
\section{IDLE Improvements}

XXX IDLE -- complete overhaul. I don't use IDLE; can anyone tell me
what the changes are?

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing. The \module{stdwin}
module is gone; it was an interface to a platform-independent
windowing toolkit that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
these modules.
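
As an example, a program that still imports one of these modules could
extend \code{sys.path} at startup along the lines of the sketch below.
It assumes that \file{lib-old} sits inside the standard library
directory, next to \file{os.py}; the location on your system may
differ. \module{dircmp} stands in for whichever moved module your code
needs, and the import is guarded in case the directory isn't there.

\begin{verbatim}
import os
import sys

# Assumption: lib-old is a subdirectory of the standard library
# directory, i.e. the directory that contains os.py.
libold = os.path.join(os.path.dirname(os.__file__), 'lib-old')
if libold not in sys.path:
    sys.path.append(libold)   # append, so current modules still win

try:
    import dircmp             # one of the modules moved to lib-old
except ImportError:
    dircmp = None             # directory missing or module not installed
\end{verbatim}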

XXX any others deleted?

XXX Other candidates for deletion in 1.6: sgimodule.c, glmodule.c (and hence
cgenmodule.c), imgfile.c, svmodule.c, flmodule.c, fmmodule.c, almodule.c, clmodule.c,
knee.py.

\end{document}