\documentclass{howto}

\title{Idioms and Anti-Idioms in Python}

\release{0.00}

\author{Moshe Zadka}
\authoraddress{howto@zadka.site.co.il}

\begin{document}
\maketitle

This document is placed in the public domain.

\begin{abstract}
\noindent
This document can be considered a companion to the tutorial. It
shows how to use Python, and even more importantly, how {\em not}
to use Python.
\end{abstract}

\tableofcontents

\section{Language Constructs You Should Not Use}

While Python has relatively few gotchas compared to other languages, it
still has some constructs which are only useful in corner cases, or are
plain dangerous.

\subsection{from module import *}

\subsubsection{Inside Function Definitions}

\code{from module import *} is {\em invalid} inside function definitions.
While many versions of Python do not check for the invalidity, that does not
make it more valid, any more than having a smart lawyer makes a man innocent.
Do not use it like that, ever. Even in versions where it was accepted, it made
the function execution slower, because the compiler could not be certain
which names were local and which were global. In Python 2.1 this construct
causes warnings, and sometimes even errors.

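A minimal sketch of the alternative, assuming a hypothetical function
\function{circle_area()} that is not part of the original text: import the
module at module level and qualify the name inside the function, so it is
always clear which names are global.

\begin{verbatim}
import math

def circle_area(radius):
    # 'math' is an ordinary module-level (global) name, so nothing
    # inside the function body is ambiguous to the compiler or the reader.
    return math.pi * radius ** 2

area = circle_area(2.0)
\end{verbatim}
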
\subsubsection{At Module Level}

While it is valid to use \code{from module import *} at module level, it
is usually a bad idea. For one, this loses an important property Python
otherwise has: you can know where each toplevel name is defined by
a simple ``search'' function in your favourite editor. You also open yourself
to trouble in the future, if some module grows additional functions or
classes.

One of the most awful questions asked on the newsgroup is why this code:

\begin{verbatim}
f = open("www")
f.read()
\end{verbatim}

does not work. Of course, it works just fine (assuming you have a file
called ``www''). But it does not work if somewhere in the module, the
statement \code{from os import *} is present. The \module{os} module
has a function called \function{open()} which returns an integer. While
it is very useful, shadowing builtins is one of its least useful properties.

Remember, you can never know for sure what names a module exports, so either
take what you need, as in \code{from module import name1, name2}, or keep
them in the module and access them on a per-need basis, as in
\code{import module; print module.name}.

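The \function{open()} clash described above can be seen without any
\code{import *} at all. The following sketch, which assumes a scratch file
created with \function{tempfile.mkstemp()} (an illustration, not part of the
original example), calls both functions under their qualified names:
\function{os.open()} hands back an integer file descriptor, while the builtin
\function{open()} returns a file object with a \method{read()} method.

\begin{verbatim}
import os
import tempfile

# Create a scratch file so the example does not depend on "www" existing.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

# os.open() returns a low-level integer descriptor, not a file object.
raw = os.open(path, os.O_RDONLY)
raw_is_integer = isinstance(raw, int)
os.close(raw)

# The builtin open() returns a file object, which has read().
f = open(path)
contents = f.read()
f.close()

os.remove(path)
\end{verbatim}
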
\subsubsection{When It Is Just Fine}

There are situations in which \code{from module import *} is just fine:

\begin{itemize}

\item The interactive prompt. For example, \code{from math import *} makes
Python an amazing scientific calculator.

\item When extending a module in C with a module in Python.

\item When the module advertises itself as \code{from import *} safe.

\end{itemize}

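One way a module can make itself safe for \code{from import *} is to list
its public names in \code{__all__}; whether this is what the last item above
has in mind is an assumption. The sketch below builds a hypothetical module
called \module{shapes_demo} in memory, so that the example is self-contained,
and imports it into an explicit dictionary to show that only the listed name
is exported.

\begin{verbatim}
import sys
import types

# Hypothetical module, built in memory instead of living in its own file.
shapes = types.ModuleType("shapes_demo")
source = (
    "__all__ = ['area']\n"
    "def area(width, height):\n"
    "    return width * height\n"
    "def scratch():\n"
    "    return 'public name, but not listed in __all__'\n"
)
exec(source, shapes.__dict__)
sys.modules["shapes_demo"] = shapes

# Run the star-import inside an explicit namespace, not the current one.
namespace = {}
exec("from shapes_demo import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
\end{verbatim}

Here \code{exported} contains only \code{'area'}; \function{scratch()} stays
reachable as \code{shapes_demo.scratch} but is not copied by the star-import.
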
\subsection{Unadorned \keyword{exec}, \function{execfile} and friends}

The word ``unadorned'' refers to the use without an explicit dictionary,
in which case those constructs evaluate code in the {\em current} environment.
This is dangerous for the same reasons \code{from import *} is dangerous:
it might step over variables you are counting on and mess up things for
the rest of your code. Simply do not do that.

Bad examples:

\begin{verbatim}
>>> import sys
>>> for name in sys.argv[1:]:
...     exec "%s=1" % name
...
>>> def func(s, **kw):
...     for var, val in kw.items():
...         exec "s.%s=val" % var  # invalid!
...
>>> execfile("handler.py")
>>> handle()
\end{verbatim}

Good examples:

\begin{verbatim}
>>> import sys
>>> d = {}
>>> for name in sys.argv[1:]:
...     d[name] = 1
...
>>> def func(s, **kw):
...     for var, val in kw.items():
...         setattr(s, var, val)
...
>>> d = {}
>>> execfile("handler.py", d, d)
>>> handle = d['handle']
>>> handle()
\end{verbatim}

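As a sketch that runs outside the interactive prompt, the block below
restates the good examples with a hypothetical \class{Settings} class, a
fixed list standing in for \code{sys.argv[1:]}, and a source string standing
in for \file{handler.py}; all three are assumptions made for illustration.
It uses the function-call spelling of \keyword{exec} with an explicit
dictionary, which Python 2 also accepts.

\begin{verbatim}
class Settings(object):
    """Hypothetical object whose attributes are set by keyword."""

def configure(target, **kw):
    for var, val in kw.items():
        setattr(target, var, val)   # instead of exec "target.%s=val"

# Keep dynamically named values in a dictionary, not in variables.
flags = {}
for name in ["verbose", "debug"]:   # stand-in for sys.argv[1:]
    flags[name] = 1

settings = Settings()
configure(settings, colour="blue", depth=3)

# Run foreign code in its own namespace and pull out what is needed.
handler_source = "def handle():\n    return 'handled'\n"
namespace = {}
exec(handler_source, namespace)
handle = namespace["handle"]
result = handle()
\end{verbatim}
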
\subsection{from module import name1, name2}

This is a ``don't'' which is much weaker than the previous ``don't''s
but is still something you should not do if you don't have good reasons
to do that. The reason it is usually a bad idea is that you suddenly
have an object which lives in two separate namespaces. When the binding
in one namespace changes, the binding in the other will not, so there
will be a discrepancy between them. This happens when, for example,
one module is reloaded, or changes the definition of a function at runtime.

Bad example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
from foo import a
if something():
    a = 2 # danger: foo.a != a
\end{verbatim}

Good example:

\begin{verbatim}
# foo.py
a = 1

# bar.py
import foo
if something():
    foo.a = 2
\end{verbatim}

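The discrepancy can be reproduced in a single file. This sketch registers a
hypothetical in-memory module named \module{foo_demo} (standing in for
\file{foo.py}; the name and the use of \module{types} are assumptions for
illustration) and then rebinds the attribute on the module after it has been
imported by name.

\begin{verbatim}
import sys
import types

# Stand-in for foo.py: a module object holding a single name.
foo_demo = types.ModuleType("foo_demo")
foo_demo.a = 1
sys.modules["foo_demo"] = foo_demo

from foo_demo import a      # binds a second, independent name 'a'

foo_demo.a = 2              # rebinding in the module's namespace...
stale = a                   # ...leaves the imported name at its old value
fresh = foo_demo.a
\end{verbatim}
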
\subsection{except:}

Python has the \code{except:} clause, which catches all exceptions.
Since {\em every} error in Python raises an exception, this makes many
programming errors look like runtime problems, and hinders
the debugging process.

The following code shows a great example of why this is bad:

\begin{verbatim}
try:
    foo = opne("file") # misspelled "open"
except:
    sys.exit("could not open file!")
\end{verbatim}

The second line triggers a \exception{NameError} which is caught by the
except clause. The program will exit, and you will have no idea that
this has nothing to do with the readability of \code{"file"}.

The example above is better written as:

\begin{verbatim}
try:
    foo = opne("file") # will be changed to "open" as soon as we run it
except IOError:
    sys.exit("could not open file")
\end{verbatim}

Because only \exception{IOError} is caught, the \exception{NameError} from
the misspelling propagates and is noticed the first time the code runs.

There are some situations in which the \code{except:} clause is useful:
for example, in a framework when running callbacks, it is good not to
let any callback disturb the framework.

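A minimal sketch of the callback case, assuming a hypothetical
\function{run_callbacks()} helper that is not part of any real framework.
It catches \exception{Exception} rather than using a completely bare
\code{except:}, which is a deliberate substitution: in recent Python versions
that still lets \exception{KeyboardInterrupt} and \exception{SystemExit}
through. Each traceback is recorded, so a failing callback is reported
instead of silently hidden.

\begin{verbatim}
import traceback

def run_callbacks(callbacks):
    """Call every callback; collect tracebacks instead of stopping."""
    failures = []
    for callback in callbacks:
        try:
            callback()
        except Exception:
            # Keep the full traceback so the bug can still be diagnosed.
            failures.append(traceback.format_exc())
    return failures

calls = []

def good():
    calls.append("good")

def broken():
    opne("file")        # misspelled "open": raises NameError

failures = run_callbacks([good, broken, good])
\end{verbatim}
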
\section{Exceptions}

Exceptions are a useful feature of Python. You should learn to raise
them whenever something unexpected occurs, and catch them only where
you can do something about them.

The following is a very popular anti-idiom:

\begin{verbatim}
import os, sys

def get_status(file):
    if not os.path.exists(file):
        print "file not found"
        sys.exit(1)
    return open(file).readline()
\end{verbatim}

Consider the case where the file gets deleted between the time the call to
\function{os.path.exists} is made and the time \function{open} is called.
That means the last line will raise an \exception{IOError}. The same would
happen if \var{file} exists but has no read permission. Since testing this
on a normal machine with existing and non-existing files makes it seem
bug-free, the test results will look fine, and the code will get
shipped. Then an unhandled \exception{IOError} escapes to the user, who
has to watch the ugly traceback.

Here is a better way to do it:

\begin{verbatim}
import sys

def get_status(file):
    try:
        return open(file).readline()
    except (IOError, OSError):
        print "file not found"
        sys.exit(1)
\end{verbatim}

In this version, {\em either} the file gets opened and the line is read
(so it works even on flaky NFS or SMB connections), or the message
is printed and the application aborted.

Still, \function{get_status} makes too many assumptions: that it
will only be used in a short running script, and not, say, in a long
running server. Sure, the caller could do something like

\begin{verbatim}
try:
    status = get_status(log)
except SystemExit:
    status = None
\end{verbatim}

but that is clumsy, and the message has already been printed by then.
So, try to use as few \code{except} clauses in your code as possible; those
will usually be a catch-all in the \function{main}, or inside calls which
should always succeed.

So, the best version is probably

\begin{verbatim}
def get_status(file):
    return open(file).readline()
\end{verbatim}

The caller can deal with the exception if it wants (for example, if it
tries several files in a loop), or just let the exception filter upwards
to {\em its} caller.

The last version is not very good either: due to implementation details,
the file would not be closed when an exception is raised until the handler
finishes, and perhaps not at all in non-C implementations (e.g., Jython).
A version that closes the file explicitly looks like this:

\begin{verbatim}
def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()
\end{verbatim}

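As a sketch of the caller's side, the hypothetical \function{first_status()}
below tries several files in a loop and handles the exception itself; the
function name, the argument name \var{path} and the scratch file are
assumptions made for illustration. \function{get_status()} is the version
shown above, which closes the file in a \keyword{finally} clause.

\begin{verbatim}
import os
import tempfile

def get_status(path):
    fp = open(path)
    try:
        return fp.readline()
    finally:
        fp.close()

def first_status(paths):
    """Return the first line of the first readable file, or None."""
    for path in paths:
        try:
            return get_status(path)
        except (IOError, OSError):
            continue        # this caller knows what to do: try the next one
    return None

# Usage: one path that does not exist, one that does.
fd, real_path = tempfile.mkstemp()
os.write(fd, b"ok\n")
os.close(fd)
missing_path = real_path + ".missing"

status = first_status([missing_path, real_path])
nothing = first_status([missing_path])
os.remove(real_path)
\end{verbatim}
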
\section{Using the Batteries}

Every so often, people seem to rewrite things that are already in the
Python library, usually poorly. While the occasional module has a poor
interface, it is usually much better to use the rich standard library and
data types that come with Python than to invent your own.

A useful module very few people know about is \module{os.path}. It
always has the correct path arithmetic for your operating system, and
will usually be much better than whatever you come up with yourself.

Compare:

\begin{verbatim}
# ugh!
return dir+"/"+file
# better
return os.path.join(dir, file)
\end{verbatim}

More useful functions in \module{os.path}: \function{basename},
\function{dirname} and \function{splitext}.

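A short sketch of these functions working together, using a made-up relative
path (the path components are assumptions for illustration). The path is
built with \function{os.path.join}, so the separator matches whatever
platform the code runs on.

\begin{verbatim}
import os.path

# Build the path from its parts instead of hard-coding a separator.
path = os.path.join("var", "log", "app.log")

directory = os.path.dirname(path)       # everything before the last component
filename = os.path.basename(path)       # the last component: "app.log"
stem, extension = os.path.splitext(filename)
\end{verbatim}
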
There are also many useful builtin functions people seem not to be
aware of for some reason: \function{min()} and \function{max()} can
find the minimum/maximum of any sequence with comparable semantics,
for example, yet many people write their own max/min. Another highly
useful function is \function{reduce()}. A classical use of \function{reduce()}
is something like

\begin{verbatim}
import sys, operator
nums = map(float, sys.argv[1:])
print reduce(operator.add, nums)/len(nums)
\end{verbatim}

This cute little script prints the average of all numbers given on the
command line. The \function{reduce()} adds up all the numbers, and
the rest is just some pre- and postprocessing.

On the same note, \function{float()}, \function{int()} and
\function{long()} all accept arguments of type string, and so are
suited to parsing, assuming you are ready to deal with the
\exception{ValueError} they raise.

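A minimal sketch of parsing with \function{float()} while dealing with
\exception{ValueError}, followed by \function{min()} and \function{max()} on
the result. The helper \function{parse_numbers()} and the sample input are
hypothetical and only serve the illustration.

\begin{verbatim}
def parse_numbers(words):
    """Split words into parsed floats and the words that failed to parse."""
    numbers = []
    rejected = []
    for word in words:
        try:
            numbers.append(float(word))
        except ValueError:
            rejected.append(word)
    return numbers, rejected

numbers, rejected = parse_numbers(["3", "4.5", "oops", "1e2"])
smallest = min(numbers)
largest = max(numbers)
\end{verbatim}
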
\section{Using Backslash to Continue Statements}

Since Python treats a newline as a statement terminator,
and since statements are often longer than is comfortable to put
on one line, many people do:

\begin{verbatim}
if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
   calculate_number(10, 20) != forbulate(500, 360):
    pass
\end{verbatim}

You should realize that this is dangerous: a stray space after the
\code{\e} would make this line wrong, and stray spaces are notoriously
hard to see in editors. In this case, at least it would be a syntax
error, but if the code was:

\begin{verbatim}
value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
        + calculate_number(10, 20)*forbulate(500, 360)
\end{verbatim}

then it would just be subtly wrong.

It is usually much better to use the implicit continuation inside
parentheses. This version is bulletproof:

\begin{verbatim}
value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
        + calculate_number(10, 20)*forbulate(500, 360))
\end{verbatim}

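A runnable sketch of the same idea, with made-up numbers in place of the
placeholder functions above (all names and values here are assumptions for
illustration). Both the arithmetic expression and the \keyword{if} condition
are wrapped in parentheses, so neither needs a backslash.

\begin{verbatim}
price, quantity = 10, 3
shipping, parcels = 5, 2

# The open parenthesis keeps the statement going across the newline.
total = (price * quantity
         + shipping * parcels)

# The same works for a long condition.
if (total > price * quantity
        and parcels != 0):
    per_parcel = float(total) / parcels
else:
    per_parcel = None
\end{verbatim}
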
\end{document}