\documentclass{howto}

\title{What's New in Python 2.0}
\release{0.04}
\author{A.M. Kuchling and Moshe Zadka}
\authoraddress{\email{amk1@bigfoot.com}, \email{moshez@math.huji.ac.il} }
\begin{document}
\maketitle\tableofcontents

\section{Introduction}

{\large This is a draft document; please report inaccuracies and
omissions to the authors. This document should not be treated as
definitive; features described here might be removed or changed during
the beta cycle before the final release of Python 2.0.
}

A new release of Python, version 2.0, is scheduled for some time this
summer. Alpha versions are already available from
\url{http://www.python.org/2.0/}. This article covers the exciting
new features in 2.0, highlights some other useful changes, and points
out a few incompatible changes that may require rewriting code.

Python's development never completely stops between releases, and a
steady flow of bug fixes and improvements is always being submitted.
A host of minor fixes, a few optimizations, additional docstrings, and
better error messages went into 2.0; to list them all would be
impossible, but they're certainly significant. Consult the
publicly-available CVS logs if you want to see the full list.

% ======================================================================
\section{Unicode}

The largest new feature in Python 2.0 is a new fundamental data type:
Unicode strings. Unicode uses 16-bit numbers to represent characters
instead of the 8-bit numbers used by ASCII, meaning that 65,536
distinct characters can be supported.

The final interface for Unicode support was arrived at through
countless often-stormy discussions on the python-dev mailing list, and
was mostly implemented by Marc-Andr\'e Lemburg, based on a Unicode string
type implementation by Fredrik Lundh. A detailed explanation of the
interface is in the file \file{Misc/unicode.txt} in the Python source
distribution; it's also available on the Web at
\url{http://starship.python.net/crew/lemburg/unicode-proposal.txt}.
This article will simply cover the most significant points from the
full interface.

In Python source code, Unicode strings are written as
\code{u"string"}. Arbitrary Unicode characters can be written using a
new escape sequence, \code{\e u\var{HHHH}}, where \var{HHHH} is a
4-digit hexadecimal number from 0000 to FFFF. The existing
\code{\e x\var{HHHH}} escape sequence can also be used, and octal
escapes can be used for characters up to U+01FF, which is represented
by \code{\e 777}.

Unicode strings, just like regular strings, are an immutable sequence
type. They can be indexed and sliced, but not modified in place.
Unicode strings have an \method{encode( \optional{encoding} )} method
that returns an 8-bit string in the desired encoding. Encodings are
named by strings, such as \code{'ascii'}, \code{'utf-8'}, or
\code{'iso-8859-1'}. A codec API is defined for
implementing and registering new encodings that are then available
throughout a Python program. If an encoding isn't specified, the
default encoding is usually 7-bit ASCII, though it can be changed for
your Python installation by calling the
\function{sys.setdefaultencoding(\var{encoding})} function in a
customised version of \file{site.py}.
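
As a brief illustration of \method{encode()}, the following sketch
converts one Unicode string into two different encodings. The sample
string and the variable names are made up for this example:

\begin{verbatim}
# Illustrative sample: 'caf' followed by U+00E9 (e with acute accent).
ustr = u'caf\u00e9'

# The same 4 characters occupy a different number of bytes
# depending on the encoding that is requested.
latin1 = ustr.encode('iso-8859-1')    # 4 bytes: U+00E9 fits in one byte
utf8 = ustr.encode('utf-8')           # 5 bytes: U+00E9 takes two bytes
\end{verbatim}

Encoding the same string as \code{'ascii'} would raise an exception,
because U+00E9 lies outside the 7-bit range.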

Combining 8-bit and Unicode strings always coerces to Unicode, using
the default ASCII encoding; the result of \code{'a' + u'bc'} is
\code{u'abc'}.

New built-in functions have been added, and existing built-ins
modified to support Unicode:

\begin{itemize}
\item \code{unichr(\var{ch})} returns a Unicode string 1 character
long, containing the character \var{ch}.

\item \code{ord(\var{u})}, where \var{u} is a 1-character regular or
Unicode string, returns the number of the character as an integer.

\item \code{unicode(\var{string}, \optional{\var{encoding},}
\optional{\var{errors}} ) } creates a Unicode string from an 8-bit
string. \code{encoding} is a string naming the encoding to use.
The \code{errors} parameter specifies the treatment of characters that
are invalid for the current encoding; passing \code{'strict'} as the
value causes an exception to be raised on any encoding error, while
\code{'ignore'} causes errors to be silently ignored and
\code{'replace'} uses U+FFFD, the official replacement character, in
case of any problems.

\end{itemize}

A new module, \module{unicodedata}, provides an interface to Unicode
character properties. For example, \code{unicodedata.category(u'A')}
returns the 2-character string 'Lu', the 'L' denoting it's a letter,
and 'u' meaning that it's uppercase.
\code{unicodedata.bidirectional(u'\e u0660')} returns 'AN', meaning that
U+0660 is an Arabic number.
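
A short sketch of these lookups follows; the particular characters
queried are chosen only for illustration:

\begin{verbatim}
import unicodedata

# General category: a 2-character code for the kind of character.
cat_upper = unicodedata.category(u'A')    # 'Lu': letter, uppercase
cat_lower = unicodedata.category(u'a')    # 'Ll': letter, lowercase
cat_digit = unicodedata.category(u'7')    # 'Nd': number, decimal digit

# Bidirectional class of U+0660, ARABIC-INDIC DIGIT ZERO.
bidi = unicodedata.bidirectional(u'\u0660')   # 'AN': Arabic number
\end{verbatim}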

The \module{codecs} module contains functions to look up existing encodings
and register new ones. Unless you want to implement a
new encoding, you'll most often use the
\function{codecs.lookup(\var{encoding})} function, which returns a
4-element tuple: \code{(\var{encode_func},
\var{decode_func}, \var{stream_reader}, \var{stream_writer})}.

\begin{itemize}
\item \var{encode_func} is a function that takes a Unicode string, and
returns a 2-tuple \code{(\var{string}, \var{length})}. \var{string}
is an 8-bit string containing a portion (perhaps all) of the Unicode
string converted into the given encoding, and \var{length} tells you
how much of the Unicode string was converted.

\item \var{decode_func} is the mirror of \var{encode_func},
taking an 8-bit string and
returning a 2-tuple \code{(\var{ustring}, \var{length})}, consisting of
the resulting Unicode string \var{ustring} and the integer \var{length}
telling you how much of the 8-bit string was consumed.

\item \var{stream_reader} is a class that supports decoding input from
a stream. \var{stream_reader(\var{file_obj})} returns an object that
supports the \method{read()}, \method{readline()}, and
\method{readlines()} methods. These methods will all translate from
the given encoding and return Unicode strings.

\item \var{stream_writer}, similarly, is a class that supports
encoding output to a stream. \var{stream_writer(\var{file_obj})}
returns an object that supports the \method{write()} and
\method{writelines()} methods. These methods expect Unicode strings,
translating them to the given encoding on output.
\end{itemize}
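
The first two elements of the tuple can be tried out directly. The
following is a minimal sketch, with a made-up sample string, showing
what the two functions return for the UTF-8 codec:

\begin{verbatim}
import codecs

# Unpack the 4-element result of the lookup.
(encode_func, decode_func,
 stream_reader, stream_writer) = codecs.lookup('utf-8')

# Illustrative sample: three ASCII letters followed by U+0660.
ustr = u'abc\u0660'

# Encoding returns the 8-bit data and the number of Unicode
# characters that were converted.
data, chars_used = encode_func(ustr)      # chars_used is 4

# Decoding returns the Unicode string and the number of bytes
# of 8-bit data that were consumed.
result, bytes_used = decode_func(data)    # bytes_used is 5
\end{verbatim}

The two lengths differ because \var{chars_used} counts Unicode
characters while \var{bytes_used} counts bytes, and U+0660 needs two
bytes in UTF-8.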

For example, the following code writes a Unicode string into a file,
encoding it as UTF-8:

\begin{verbatim}
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter( open( '/tmp/output', 'wb') )
output.write( unistr )
output.close()
\end{verbatim}

The following code would then read UTF-8 input from the file:

\begin{verbatim}
input = UTF8_streamreader( open( '/tmp/output', 'rb') )
print repr(input.read())
input.close()
\end{verbatim}

Unicode-aware regular expressions are available through the
\module{re} module, which has a new underlying implementation called
SRE written by Fredrik Lundh of Secret Labs AB.

A \code{-U} command line option was added which causes the Python
compiler to interpret all string literals as Unicode string literals.
This is intended to be used in testing and future-proofing your Python
code, since some future version of Python may drop support for 8-bit
strings and provide only Unicode strings.

% ======================================================================
\section{Distutils: Making Modules Easy to Install}

Before Python 2.0, installing modules was a tedious affair -- there
was no way to figure out automatically where Python was installed, or
what compiler options to use for extension modules. Software authors
had to go through an arduous ritual of editing Makefiles and
configuration files, which only really worked on Unix and left Windows
and MacOS unsupported. Software users faced wildly differing
installation instructions.

The SIG for distribution utilities, shepherded by Greg Ward, has
created the Distutils, a system to make package installation much
easier. They form the \module{distutils} package, a new part of
Python's standard library. In the best case, installing a Python
module from source will require the same steps: first you simply
unpack the tarball or zip archive, and then run ``\code{python setup.py
install}''. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the
distribution installed into the proper directory. Optional
command-line arguments provide more control over the installation
process; the \module{distutils} package offers many places to override
defaults -- separating the build from the install, building or
installing in non-default directories, and more.

In order to use the Distutils, you need to write a \file{setup.py}
script. For the simple case, when the software contains only .py
files, a minimal \file{setup.py} can be just a few lines long:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       py_modules = ["module1", "module2"])
\end{verbatim}

The \file{setup.py} file isn't much more complicated if the software
consists of a few packages:

\begin{verbatim}
from distutils.core import setup
setup (name = "foo", version = "1.0",
       packages = ["package", "package.subpackage"])
\end{verbatim}

A C extension can be the most complicated case; here's an example taken from
the PyXML package:

\begin{verbatim}
from distutils.core import setup, Extension

expat_extension = Extension('xml.parsers.pyexpat',
    define_macros = [('XML_NS', None)],
    include_dirs = [ 'extensions/expat/xmltok',
                     'extensions/expat/xmlparse' ],
    sources = [ 'extensions/pyexpat.c',
                'extensions/expat/xmltok/xmltok.c',
                'extensions/expat/xmltok/xmlrole.c',
              ]
    )
setup (name = "PyXML", version = "0.5.4",
       ext_modules =[ expat_extension ] )
\end{verbatim}

The Distutils can also take care of creating source and binary
distributions. The ``sdist'' command, run by ``\code{python setup.py
sdist}'', builds a source distribution such as \file{foo-1.0.tar.gz}.
Adding new commands isn't difficult, and a ``bdist_rpm'' command has
already been contributed to create an RPM distribution for the
software. Commands to create Windows installer programs, Debian
packages, and Solaris .pkg files have been discussed and are in
various stages of development.

All this is documented in a new manual, \textit{Distributing Python
Modules}, that joins the basic set of Python documentation.

% ======================================================================
\section{String Methods}

Until now string-manipulation functionality was in the \module{string}
Python module, which was usually a front-end for the \module{strop}
module written in C. The addition of Unicode posed a difficulty for
the \module{strop} module, because the functions would all need to be
rewritten in order to accept either 8-bit or Unicode strings. For
functions such as \function{string.replace()}, which takes 3 string
arguments, that means eight possible permutations, and correspondingly
complicated code.

Instead, Python 2.0 pushes the problem onto the string type, making
string manipulation functionality available through methods on both
8-bit strings and Unicode strings.

\begin{verbatim}
>>> 'andrew'.capitalize()
'Andrew'
>>> 'hostname'.replace('os', 'linux')
'hlinuxtname'
>>> 'moshe'.find('sh')
2
\end{verbatim}

One thing that hasn't changed, April Fools' jokes notwithstanding, is
that Python strings are immutable. Thus, the string methods return new
strings, and do not modify the string on which they operate.

The old \module{string} module is still around for backwards
compatibility, but it mostly acts as a front-end to the new string
methods.

Two methods which have no parallel in pre-2.0 versions, although they
did exist in JPython for quite some time, are \method{startswith()}
and \method{endswith()}. \code{s.startswith(t)} is equivalent to
\code{s[:len(t)] == t}, while \code{s.endswith(t)} is equivalent to
\code{s[-len(t):] == t}.
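
The equivalence can be checked with a short sketch; the file name
used here is only an illustrative value:

\begin{verbatim}
# Illustrative file name; any string would do.
name = 'python-2.0.tar.gz'
prefix = 'python'
suffix = '.tar.gz'

# Each method call gives the same answer as the slice comparison.
starts = name.startswith(prefix)                # true
starts_slice = name[:len(prefix)] == prefix     # true
ends = name.endswith(suffix)                    # true
ends_slice = name[-len(suffix):] == suffix      # true
\end{verbatim}

The method form is easier to read, and it avoids having to compute the
length of the prefix or suffix by hand.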

%One other method which deserves special mention is \method{join}. The
%\method{join} method of a string receives one parameter, a sequence of
%strings, and is equivalent to the \function{string.join} function from
%the old \module{string} module, with the arguments reversed. In other
%words, \code{s.join(seq)} is equivalent to the old
%\code{string.join(seq, s)}.

% ======================================================================
\section{Porting to 2.0}

New Python releases try hard to be compatible with previous releases,
and the record has been pretty good. However, some changes are
considered useful enough, often fixing initial design decisions that
turned out to be actively mistaken, that breaking backward compatibility
can't always be avoided. This section lists the changes in Python 2.0
that may cause old Python code to break.

The change which will probably break the most code is tightening up
the arguments accepted by some methods. Some methods would take
multiple arguments and treat them as a tuple, particularly various
list methods such as \method{.append()} and \method{.insert()}.
In earlier versions of Python, if \code{L} is a list, \code{L.append(
1,2 )} appends the tuple \code{(1,2)} to the list. In Python 2.0 this
causes a \exception{TypeError} exception to be raised, with the
message: 'append requires exactly 1 argument; 2 given'. The fix is to
simply add an extra set of parentheses to pass both values as a tuple:
\code{L.append( (1,2) )}.
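
The stricter behaviour and its fix can be sketched as follows; the
\code{rejected} flag is a name invented for this example:

\begin{verbatim}
L = []

# Passing two separate arguments is now an error.
try:
    L.append(1, 2)
except TypeError:
    rejected = 1
else:
    rejected = 0

# Passing a single tuple works in old and new versions alike.
L.append((1, 2))
\end{verbatim}

After this code runs under the stricter checking, \code{rejected} is 1
and \code{L} contains the single tuple \code{(1, 2)}.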

The earlier versions of these methods were more forgiving because they
used an old function in Python's C interface to parse their arguments;
2.0 modernizes them to use \function{PyArg_ParseTuple}, the current
argument parsing function, which provides more helpful error messages
and treats multi-argument calls as errors. If you absolutely must use
2.0 but can't fix your code, you can edit \file{Objects/listobject.c}
and define the preprocessor symbol \code{NO_STRICT_LIST_APPEND} to
preserve the old behaviour; this isn't recommended.

Some of the functions in the \module{socket} module are still
forgiving in this way. For example, \function{socket.connect(
('hostname', 25) )} is the correct form, passing a tuple representing
an IP address, but \function{socket.connect( 'hostname', 25 )} also
works. \function{socket.connect_ex()} and \function{socket.bind()} are
similarly easy-going. 2.0alpha1 tightened these functions up, but
because the documentation actually used the erroneous multiple
argument form, many people wrote code which would break with the
stricter checking. GvR backed out the changes in the face of public
reaction, so for the \module{socket} module, the documentation was
fixed and the multiple argument form is simply marked as deprecated;
it \emph{will} be tightened up again in a future Python version.

Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris,
to allow reading files larger than 2GB; this made the \method{tell()}
method of file objects return a long integer instead of a regular
integer. Some code would subtract two file offsets and attempt to use
the result to multiply a sequence or slice a string, but this raised a
\exception{TypeError}. In 2.0, long integers can be used to multiply
or slice a sequence, and it'll behave as you'd intuitively expect it
to; \code{3L * 'abc'} produces 'abcabcabc', and \code{
(0,1,2,3)[2L:4L]} produces (2,3). Long integers can also be used in
various new places where previously only integers were accepted, such
as in the \method{seek()} method of file objects.

The subtlest long integer change of all is that the \function{str()}
of a long integer no longer has a trailing 'L' character, though
\function{repr()} still includes it. The 'L' annoyed many people who
wanted to print long integers that looked just like regular integers,
since they had to go out of their way to chop off the character. This
is no longer a problem in 2.0, but code which assumes the 'L' is
there, and does \code{str(longval)[:-1]}, will now lose the final
digit.

Taking the \function{repr()} of a float now uses a different
formatting precision than \function{str()}. \function{repr()} uses the
\code{\%.17g} format string for C's \function{sprintf()}, while
\function{str()} uses \code{\%.12g} as before. The effect is that
\function{repr()} may occasionally show more decimal places than
\function{str()}, for certain numbers.
For example, the number 8.1 can't be represented exactly in binary, so
\code{repr(8.1)} is \code{'8.0999999999999996'}, while \code{str(8.1)} is
\code{'8.1'}.
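
The difference between the two precisions can be seen by applying the
same format strings with Python's \code{\%} operator. This is only a
sketch of the formatting involved, not the interpreter's actual code:

\begin{verbatim}
value = 8.1

# The format used by repr(): 17 significant digits.
precise = '%.17g' % value     # '8.0999999999999996'

# The format used by str(): 12 significant digits.
rounded = '%.12g' % value     # '8.1'
\end{verbatim}

Seventeen significant digits are enough to expose the small error in
the binary approximation of 8.1, while twelve digits round it away.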

The \code{-X} command-line option, which turned all standard
exceptions into strings instead of classes, has been removed; the
standard exceptions will now always be classes. The
\module{exceptions} module containing the standard exceptions was
translated from Python to a built-in C module, written by Barry Warsaw
and Fredrik Lundh.

% ======================================================================
\section{Optional Collection of Cycles}

The C implementation of Python uses reference counting to implement
garbage collection. Every Python object maintains a count of the
number of references pointing to itself, and adjusts the count as
references are created or destroyed. Once the reference count reaches
zero, the object is no longer accessible, since you need to have a
reference to an object to access it, and if the count is zero, no
references exist any longer.

Reference counting has some pleasant properties: it's easy to
understand and implement, and the resulting implementation is
portable, fairly fast, and interacts well with other libraries that
implement their own memory handling schemes. The major problem with
reference counting is that it sometimes doesn't realise that objects
are no longer accessible, resulting in a memory leak. This happens
when there are cycles of references.

Consider the simplest possible cycle,
a class instance which has a reference to itself:

\begin{verbatim}
instance = SomeClass()
instance.myself = instance
\end{verbatim}

After the above two lines of code have been executed, the reference
count of \code{instance} is 2; one reference is from the variable
named \samp{instance}, and the other is from the \samp{myself}
attribute of the instance.

If the next line of code is \code{del instance}, what happens? The
reference count of \code{instance} is decreased by 1, so it has a
reference count of 1; the reference in the \samp{myself} attribute
still exists. Yet the instance is no longer accessible through Python
code, and it could be deleted. Several objects can participate in a
cycle if they have references to each other, causing all of the
objects to be leaked.

An experimental step has been made toward fixing this problem. When
compiling Python, the \verb|--with-cycle-gc| option can be specified.
This causes a cycle detection algorithm to be periodically executed,
which looks for inaccessible cycles and deletes the objects involved.
A new \module{gc} module provides functions to perform a garbage
collection, obtain debugging statistics, and tune the collector's parameters.
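
The following sketch recreates the self-referencing instance from
above and then asks the collector to find it. It assumes an
interpreter built with the cycle collector, and it assumes that
\function{gc.collect()} returns the number of unreachable objects it
found; \class{SomeClass} is just a placeholder class:

\begin{verbatim}
import gc

class SomeClass:
    pass

# Clear out any garbage that already exists, so the count
# below reflects only the cycle created here.
gc.collect()

instance = SomeClass()
instance.myself = instance     # create the reference cycle
del instance                   # the cycle is now unreachable

# Run a full collection; the result counts the unreachable
# objects that were found, which includes the instance.
found = gc.collect()
\end{verbatim}

Without the collector, the instance would simply stay in memory until
the interpreter exits, since its reference count never drops to zero.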

Why isn't cycle detection enabled by default? Running the cycle detection
algorithm takes some time, and some tuning will be required to
minimize the overhead cost. It's not yet obvious how much performance
is lost, because benchmarking this is tricky and depends crucially
on how often the program creates and destroys objects.

Several people tackled this problem and contributed to a solution. An
early implementation of the cycle detection approach was written by
Toby Kelsey. The current algorithm was suggested by Eric Tiedemann
during a visit to CNRI, and Guido van Rossum and Neil Schemenauer
wrote two different implementations, which were later integrated by
Neil. Lots of other people offered suggestions along the way; the
March 2000 archives of the python-dev mailing list contain most of the
relevant discussion, especially in the threads titled ``Reference
cycle collection for Python'' and ``Finalization again''.

| 435 | % ====================================================================== |
Andrew M. Kuchling | 25bfd0e | 2000-05-27 11:28:26 +0000 | [diff] [blame] | 436 | \section{Core Changes} |
| 437 | |
Andrew M. Kuchling | 5b8311e | 2000-05-31 03:28:42 +0000 | [diff] [blame] | 438 | Various minor changes have been made to Python's syntax and built-in |
| 439 | functions. None of the changes are very far-reaching, but they're |
| 440 | handy conveniences. |
Andrew M. Kuchling | 25bfd0e | 2000-05-27 11:28:26 +0000 | [diff] [blame] | 441 | |
A change to syntax makes it more convenient to call a given function
with a tuple of arguments and/or a dictionary of keyword arguments.
In Python 1.5 and earlier, you do this with the \function{apply()}
built-in function: \code{apply(f, \var{args}, \var{kw})} calls the
function \function{f()} with the argument tuple \var{args} and the
keyword arguments in the dictionary \var{kw}.  Thanks to a patch from
Greg Ewing, 2.0 adds \code{f(*\var{args}, **\var{kw})} as a shorter
and clearer way to achieve the same effect.  This syntax is
symmetrical with the syntax for defining functions:

\begin{verbatim}
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
\end{verbatim}
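
As an illustration of the calling side, the following sketch passes a
tuple and a dictionary to a function using the new syntax.  The function
\function{describe()} and its parameters are hypothetical, made up for
this example.

\begin{verbatim}
def describe(host, port, timeout=30, retries=3):
    # Hypothetical function used only to show how the arguments arrive.
    return '%s:%d timeout=%d retries=%d' % (host, port, timeout, retries)

args = ('localhost', 8080)     # positional arguments
kw = {'timeout': 5}            # keyword arguments

# New in 2.0; equivalent to apply(describe, args, kw) in 1.5.2
result = describe(*args, **kw)
\end{verbatim}

Here \var{result} is the string
\code{'localhost:8080 timeout=5 retries=3'}; \var{retries} keeps its
default value because \var{kw} doesn't mention it.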

A new format style is available when using the \code{\%} operator.
'\%r' will insert the \function{repr()} of its argument.  This was
also added from symmetry considerations, this time for symmetry with
the existing '\%s' format style, which inserts the \function{str()} of
its argument.  For example, \code{'\%r \%s' \% ('abc', 'abc')} returns a
string containing \verb|'abc' abc|.
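
A short sketch of the difference between the two format styles follows;
the variable names are arbitrary and chosen only for this illustration.

\begin{verbatim}
name = 'abc'
formatted = '%r %s' % (name, name)     # "'abc' abc"

# %r is handy in debugging output, where the quotes make
# leading or trailing whitespace visible.
padded = 'abc '
debug_line = 'value=%r' % (padded,)    # "value='abc '"
\end{verbatim}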

The \function{int()} and \function{long()} functions now accept an
optional ``base'' parameter when the first argument is a string.
\code{int('123', 10)} returns 123, while \code{int('123', 16)} returns
291.  \code{int(123, 16)} raises a \exception{TypeError} exception
with the message ``can't convert non-string with explicit base''.
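
The behaviour described above can be sketched as follows, using only
\function{int()}; the variable names are arbitrary.

\begin{verbatim}
decimal = int('123', 10)        # 123
hexadecimal = int('123', 16)    # 291
binary = int('1010', 2)         # 10

# A base only makes sense for strings, so a number is rejected.
try:
    int(123, 16)
except TypeError:
    rejected = 1
else:
    rejected = 0
\end{verbatim}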

Previously there was no way to implement a class that overrode
Python's built-in \keyword{in} operator and implemented a custom
version.  \code{\var{obj} in \var{seq}} returns true if \var{obj} is
present in the sequence \var{seq}; Python computes this by simply
trying every index of the sequence until either \var{obj} is found or
an \exception{IndexError} is encountered.  Moshe Zadka contributed a
patch which adds a \method{__contains__} magic method for providing a
custom implementation for \keyword{in}.  Additionally, new built-in
objects written in C can define what \keyword{in} means for them via a
new slot in the sequence protocol.
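
A minimal sketch of a class that supplies its own membership test
follows.  The \class{EvenNumbers} class is hypothetical; it stands for
a set that is too large to search index by index.

\begin{verbatim}
class EvenNumbers:
    # Hypothetical class representing the set of all even integers.
    def __contains__(self, item):
        # Called for "item in self"; no indexing is attempted.
        return item % 2 == 0

evens = EvenNumbers()
has_four = 4 in evens      # true
has_five = 5 in evens      # false
\end{verbatim}

Without \method{__contains__}, the class would have to define
\method{__getitem__} and the test could never terminate for a value
that isn't present.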

Earlier versions of Python used a recursive algorithm for deleting
objects.  Deeply nested data structures could cause the interpreter to
fill up the C stack and crash; Christian Tismer rewrote the deletion
logic to fix this problem.  On a related note, comparing recursive
objects recursed infinitely and crashed; Jeremy Hylton rewrote the
code to no longer crash, producing a useful result instead.  For
example, after this code:

\begin{verbatim}
a = []
b = []
a.append(a)
b.append(b)
\end{verbatim}

the comparison \code{a==b} returns true, because the two recursive
data structures are isomorphic.
\footnote{See the thread ``trashcan and PR\#7'' in the April 2000
archives of the python-dev mailing list for the discussion leading up
to this implementation, and some useful relevant links.
%http://www.python.org/pipermail/python-dev/2000-April/004834.html
}

Work has been done on porting Python to 64-bit Windows on the Itanium
processor, mostly by Trent Mick of ActiveState.  (Confusingly,
\code{sys.platform} is still \code{'win32'} on Win64 because it seems
that for ease of porting, MS Visual C++ treats code as 32-bit.)
PythonWin also supports Windows CE; see the Python CE page at
\url{http://starship.python.net/crew/mhammond/ce/} for more information.

An attempt has been made to alleviate one of Python's warts, the
often-confusing \exception{NameError} exception when code refers to a
local variable before the variable has been assigned a value.  For
example, the following code raises an exception on the \keyword{print}
statement in both 1.5.2 and 2.0; in 1.5.2 a \exception{NameError}
exception is raised, while 2.0 raises a new
\exception{UnboundLocalError} exception.
\exception{UnboundLocalError} is a subclass of \exception{NameError},
so any existing code that expects \exception{NameError} to be raised
should still work.

\begin{verbatim}
def f():
    print "i=",i
    i = i + 1
f()
\end{verbatim}
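
The compatibility claim can be checked with a sketch like the one
below, in which a handler written for \exception{NameError} still
catches the new exception.  The function name
\function{increment_unassigned()} is hypothetical.

\begin{verbatim}
def increment_unassigned():
    # i is local to this function because it is assigned here,
    # but it is read before any value has been bound to it.
    i = i + 1

try:
    increment_unassigned()
except NameError:
    # Written with 1.5.2 in mind; also catches UnboundLocalError.
    caught = 1
else:
    caught = 0

is_subclass = issubclass(UnboundLocalError, NameError)
\end{verbatim}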

A new variable holding more detailed version information has been
added to the \module{sys} module.  \code{sys.version_info} is a tuple
\code{(\var{major}, \var{minor}, \var{micro}, \var{level},
\var{serial})}.  For example, in 2.0a2 \code{sys.version_info} is
\code{(2, 0, 0, 'alpha', 2)}.  \var{level} is a string such as
\code{"alpha"}, \code{"beta"}, or \code{"final"} for a final release.
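
Because \code{sys.version_info} is a tuple, a version check can be
written as an ordinary comparison.  The following is a minimal sketch;
the name \var{have_call_syntax} is invented for this example.

\begin{verbatim}
import sys

# Tuples compare element by element, so this tests for "2.0 or later".
if sys.version_info[:2] >= (2, 0):
    have_call_syntax = 1
else:
    have_call_syntax = 0

major = sys.version_info[0]
level = sys.version_info[3]    # a string, for example 'alpha' or 'beta'
\end{verbatim}

This is less fragile than parsing the free-form string in
\code{sys.version}.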

% ======================================================================
\section{Extending/Embedding Changes}

Some of the changes are under the covers, and will only be apparent to
people writing C extension modules, or embedding a Python interpreter
in a larger application.  If you aren't dealing with Python's C API,
you can safely skip this section.

The version number of the Python C API was incremented, so C
extensions compiled for 1.5.2 must be recompiled in order to work with
2.0.  On Windows, attempting to import a third-party extension built
for Python 1.5.x usually results in an immediate crash; there's not
much we can do about this.  (Here's Mark Hammond's explanation of the
reasons for the crash.  The 1.5 module is linked against
\file{Python15.dll}.  When \file{Python.exe}, linked against
\file{Python16.dll}, starts up, it initializes the Python data
structures in \file{Python16.dll}.  When Python then imports the
module \file{foo.pyd} linked against \file{Python15.dll}, it
immediately tries to call the functions in that DLL.  As Python has
not been initialized in that DLL, the program immediately crashes.)

Users of Jim Fulton's ExtensionClass module will be pleased to find
out that hooks have been added so that ExtensionClasses are now
supported by \function{isinstance()} and \function{issubclass()}.
This means you no longer have to remember to write code such as
\code{if type(obj) == myExtensionClass}, but can use the more natural
\code{if isinstance(obj, myExtensionClass)}.

The \file{Python/importdl.c} file, which was a mass of \#ifdefs to
support dynamic loading on many different platforms, was cleaned up
and reorganised by Greg Stein.  \file{importdl.c} is now quite small,
and platform-specific code has been moved into a bunch of
\file{Python/dynload_*.c} files.

Vladimir Marangozov's long-awaited malloc restructuring was completed,
to make it easy to have the Python interpreter use a custom allocator
instead of C's standard \function{malloc()}.  For documentation, read
the comments in \file{Include/mymalloc.h} and
\file{Include/objimpl.h}.  For the lengthy discussions during which
the interface was hammered out, see the Web archives of the 'patches'
and 'python-dev' lists at python.org.

Recent versions of the GUSI development environment for MacOS support
POSIX threads.  Therefore, Python's POSIX threading support now works
on the Macintosh.  Threading support using the user-space GNU \texttt{pth}
library was also contributed.

Threading support on Windows was enhanced, too.  Windows supports
thread locks that use kernel objects only in case of contention; in
the common case when there's no contention, they use simpler functions
which are an order of magnitude faster.  A threaded version of Python
1.5.2 on NT is twice as slow as an unthreaded version; with the 2.0
changes, the difference is only 10\%.  These improvements were
contributed by Yakov Markovitch.

% ======================================================================
\section{Module changes}

Lots of improvements and bugfixes were made to Python's extensive
standard library; some of the affected modules include
\module{readline}, \module{ConfigParser}, \module{cgi},
\module{calendar}, \module{posix}, \module{xmllib},
\module{aifc}, \module{chunk}, \module{wave}, \module{random},
\module{shelve}, and \module{nntplib}.  Consult the CVS logs for the
exact patch-by-patch details.

Brian Gallew contributed OpenSSL support for the \module{socket}
module.  OpenSSL is an implementation of the Secure Sockets Layer,
which encrypts the data being sent over a socket.  When compiling
Python, you can edit \file{Modules/Setup} to include SSL support,
which adds an additional function to the \module{socket} module:
\function{socket.ssl(\var{socket}, \var{keyfile}, \var{certfile})},
which takes a socket object and returns an SSL socket.  The
\module{httplib} and \module{urllib} modules were also changed to
support ``https://'' URLs, though no one has implemented FTP or SMTP
over SSL.

The \module{httplib} module has been rewritten by Greg Stein to
support HTTP/1.1.  Backward compatibility with the 1.5 version of
\module{httplib} is provided, though using HTTP/1.1 features such as
pipelining will require rewriting code to use a different set of
interfaces.

The \module{Tkinter} module now supports Tcl/Tk version 8.1, 8.2, or
8.3, and support for the older 7.x versions has been dropped.  The
\module{Tkinter} module also supports displaying Unicode strings in Tk
widgets.

The \module{curses} module has been greatly extended, starting from
Oliver Andrich's enhanced version, to provide many additional
functions from ncurses and SYSV curses, such as colour, alternative
character set support, pads, and mouse support.  This means the module
is no longer compatible with operating systems that only have BSD
curses, but there don't seem to be any currently maintained OSes that
fall into this category.

As mentioned in the earlier discussion of 2.0's Unicode support, the
underlying implementation of the regular expressions provided by the
\module{re} module has been changed.  SRE, a new regular expression
engine written by Fredrik Lundh and partially funded by Hewlett
Packard, supports matching against both 8-bit strings and Unicode
strings.
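
As a small sketch of what this means in practice, the same
\module{re} functions can be applied to either kind of string.  The
sample text is invented for this example.

\begin{verbatim}
import re

# A Unicode pattern matched against a Unicode string.
text = u'caf\u00e9 au lait'
match = re.search(u'caf\u00e9', text)
found = match.group()          # u'caf\u00e9'

# The same module still handles ordinary 8-bit strings.
plain = re.search('la.t', 'cafe au lait').group()    # 'lait'
\end{verbatim}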

% ======================================================================
\section{New modules}

A number of new modules were added.  We'll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a
particular module.

\begin{itemize}

\item{\module{atexit}:}
For registering functions to be called before the Python interpreter exits.
Code that currently sets
\code{sys.exitfunc} directly should be changed to
use the \module{atexit} module instead, importing \module{atexit}
and calling \function{atexit.register()} with
the function to be called on exit.
(Contributed by Skip Montanaro.)

\item{\module{codecs}, \module{encodings}, \module{unicodedata}:} Added as part of the new Unicode support.

\item{\module{filecmp}:} Supersedes the old \module{cmp} and
\module{dircmp} modules, which have now become deprecated.
(Contributed by Gordon MacMillan and Moshe Zadka.)

\item{\module{linuxaudio}:} Support for the \file{/dev/audio} device on Linux,
a twin to the existing \module{sunaudiodev} module.
(Contributed by Peter Bosch.)

\item{\module{mmap}:} An interface to memory-mapped files on both
Windows and Unix.  A file's contents can be mapped directly into
memory, at which point it behaves like a mutable string, so its
contents can be read and modified.  They can even be passed to
functions that expect ordinary strings, such as the \module{re}
module.  (Contributed by Sam Rushing, with some extensions by
A.M. Kuchling.)

\item{\module{PyExpat}:} An interface to the Expat XML parser.
(Contributed by Paul Prescod.)

\item{\module{robotparser}:} Parse a \file{robots.txt} file, which is
used for writing Web spiders that politely avoid certain areas of a
Web site.  The parser accepts the contents of a \file{robots.txt} file,
builds a set of rules from it, and can then answer questions about
the fetchability of a given URL.  (Contributed by Skip Montanaro.)

\item{\module{tabnanny}:} A module/script that
checks Python source code for ambiguous indentation.
(Contributed by Tim Peters.)

\item{\module{UserString}:} A base class useful for deriving objects that behave like strings.

\item{\module{winreg} and \module{_winreg}:} An interface to the
Windows registry.  \module{winreg} has been part of PythonWin since
1995, but now has been added to the core distribution, and enhanced to
support Unicode.  \module{_winreg} is a low-level wrapper of the
Windows registry functions, contributed by Bill Tutt and Mark Hammond,
while \module{winreg} is a higher-level, more object-oriented API on top of
\module{_winreg}, designed by Thomas Heller and implemented by Paul Prescod.

\item{\module{zipfile}:} A module for reading and writing ZIP-format
archives.  These are archives produced by \program{PKZIP} on
DOS/Windows or \program{zip} on Unix, not to be confused with
\program{gzip}-format files (which are supported by the \module{gzip}
module).
(Contributed by James C. Ahlstrom.)

\item{\module{imputil}:} A module that provides a simpler way for
writing customised import hooks, in comparison to the existing
\module{ihooks} module.  (Implemented by Greg Stein, with much
discussion on python-dev along the way.)

\end{itemize}
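
To give a flavour of two of the modules listed above, here is a rough
sketch that registers an exit function with \module{atexit} and then
writes and re-reads a small archive with \module{zipfile}.  The function
\function{note_exit()} and the file names are hypothetical, and
\function{tempfile.mkdtemp()} is a convenience from later Python
versions, used here only to keep the example short.

\begin{verbatim}
import atexit
import os
import tempfile
import zipfile

exit_log = []

def note_exit():
    # Hypothetical cleanup function; it runs when the interpreter exits.
    exit_log.append('interpreter exiting')

# Preferred over assigning to sys.exitfunc directly.
atexit.register(note_exit)

# Create a small file to put in the archive.
workdir = tempfile.mkdtemp()
member_path = os.path.join(workdir, 'hello.txt')
f = open(member_path, 'w')
f.write('hello, zip\n')
f.close()

# Write the archive, storing the file under the name 'hello.txt'.
archive_path = os.path.join(workdir, 'demo.zip')
zf = zipfile.ZipFile(archive_path, 'w')
zf.write(member_path, 'hello.txt')
zf.close()

# Open the archive again and list what it contains.
zf = zipfile.ZipFile(archive_path, 'r')
names = zf.namelist()
zf.close()
\end{verbatim}

Several functions can be registered with \module{atexit}, which is the
main advantage over the single \code{sys.exitfunc} slot.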

% ======================================================================
\section{IDLE Improvements}

IDLE is the official Python cross-platform IDE, written using Tkinter.
Python 2.0 includes IDLE 0.6, which adds a number of new features and
improvements.  A partial list:

\begin{itemize}
\item UI improvements and optimizations,
especially in the area of syntax highlighting and auto-indentation.

\item The class browser now shows more information, such as the
top-level functions in a module.

\item Tab width is now a user-settable option.  When opening an existing Python
file, IDLE automatically detects the indentation conventions, and adapts.

\item There is now support for calling browsers on various platforms,
used to open the Python documentation in a browser.

\item IDLE now has a command line, which is largely similar to
the vanilla Python interpreter.

\item Call tips were added in many places.

\item IDLE can now be installed as a package.

\item In the editor window, there is now a line/column bar at the bottom.

\item Three new keystroke commands: Check module (Alt-F5), Import
module (F5) and Run script (Ctrl-F5).

\end{itemize}

% ======================================================================
\section{Deleted and Deprecated Modules}

A few modules have been dropped because they're obsolete, or because
there are now better ways to do the same thing.  The \module{stdwin}
module is gone; it was for a platform-independent windowing toolkit
that's no longer developed.

A number of modules have been moved to the
\file{lib-old} subdirectory:
\module{cmp}, \module{cmpcache}, \module{dircmp}, \module{dump},
\module{find}, \module{grep}, \module{packmail},
\module{poly}, \module{util}, \module{whatsound}, \module{zmod}.
If you have code which relies on a module that's been moved to
\file{lib-old}, you can simply add that directory to \code{sys.path}
to get it back, but you're encouraged to update any code that uses
these modules.
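
Adding the directory to \code{sys.path} might look like the sketch
below.  The location of \file{lib-old} depends on how Python was
installed, so the path built here is an assumption and should be
adjusted to match your system.

\begin{verbatim}
import os
import sys

# Assumed location; adjust to wherever your installation keeps lib-old.
lib_old = os.path.join(sys.prefix, 'lib', 'python2.0', 'lib-old')

# Append rather than insert, so current modules are found first.
if lib_old not in sys.path:
    sys.path.append(lib_old)
\end{verbatim}

After this, importing one of the moved modules would search
\file{lib-old} as a last resort.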

\section{Acknowledgements}

The authors would like to thank the following people for offering
suggestions on drafts of this article: Fredrik Lundh, Skip
Montanaro, Vladimir Marangozov, Guido van Rossum, and Neil Schemenauer.

\end{document}