| \documentclass{howto} |
| |
| \title{Idioms and Anti-Idioms in Python} |
| |
| \release{0.00} |
| |
| \author{Moshe Zadka} |
| \authoraddress{howto@zadka.site.co.il} |
| |
| \begin{document} |
| \maketitle |
| |
| This document is placed in the public domain. |
| |
| \begin{abstract} |
| \noindent |
| This document can be considered a companion to the tutorial. It |
| shows how to use Python, and even more importantly, how {\em not} |
| to use Python. |
| \end{abstract} |
| |
| \tableofcontents |
| |
| \section{Language Constructs You Should Not Use} |
| |
| While Python has relatively few gotchas compared to other languages, it |
| still has some constructs which are only useful in corner cases, or are |
| plain dangerous. |
| |
| \subsection{from module import *} |
| |
| \subsubsection{Inside Function Definitions} |
| |
| \code{from module import *} is {\em invalid} inside function definitions. |
| While many versions of Python do not check for the invalidity, that does not |
| make it more valid, no more than having a smart lawyer makes a man innocent. |
| Never use it that way. Even in versions where it was accepted, it made |
| the function execution slower, because the compiler could not be certain |
| which names are local and which are global. In Python 2.1 this construct |
| causes warnings, and sometimes even errors. |
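
The compiler's verdict can be checked without running the offending
function. The following is a minimal sketch, assuming a Python 3
interpreter, where the construct is rejected outright when the source is
compiled; the helper name is made up for this illustration.

\begin{verbatim}
# Source of a function that uses "import *" in its body.
SOURCE = (
    "def f():\n"
    "    from math import *\n"
    "    return pi\n"
)

def star_import_allowed(source):
    """Hypothetical helper: True if the source compiles, else False."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# Assumption: Python 3, which refuses to compile the function at all.
allowed = star_import_allowed(SOURCE)
\end{verbatim}

Under that assumption \code{allowed} is false; older interpreters may
only warn, as described above.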
| |
| \subsubsection{At Module Level} |
| |
| While it is valid to use \code{from module import *} at module level, it |
| is usually a bad idea. For one, this loses an important property Python |
| otherwise has --- you can find where each top-level name is defined with |
| a simple ``search'' in your favourite editor. You also open yourself |
| to trouble in the future, if some module grows additional functions or |
| classes. |
| |
| One of the most awful questions asked on the newsgroup is why this code: |
| |
| \begin{verbatim} |
| f = open("www") |
| f.read() |
| \end{verbatim} |
| |
| does not work. Of course, it works just fine (assuming you have a file |
| called "www"). But it does not work if somewhere in the module, the |
| statement \code{from os import *} is present. The \module{os} module |
| has a function called \function{open()} which returns an integer. While |
| the module is very useful, shadowing builtins is one of its least useful properties. |
| |
| Remember, you can never know for sure what names a module exports, so either |
| take what you need --- \code{from module import name1, name2}, or keep them in |
| the module and access on a per-need basis --- |
| \code{import module;print module.name}. |
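
The shadowing can be observed directly. This is a small sketch that
simulates the two kinds of module by running each import statement in its
own namespace dictionary; the dictionary names are invented for the
example.

\begin{verbatim}
import os

# Stand-in for a module that contains "from os import *".
star_ns = {}
exec("from os import *", star_ns)

# Stand-in for a module that contains a plain "import os".
plain_ns = {}
exec("import os", plain_ns)

# In the first namespace the name "open" now refers to os.open,
# which returns an integer file descriptor.
shadowed = star_ns["open"] is os.open

# In the second, no module-level "open" exists, so a lookup still
# falls through to the builtin open().
untouched = "open" not in plain_ns
\end{verbatim}

Both \code{shadowed} and \code{untouched} come out true, which is the
whole difference between the two import styles.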
| |
| \subsubsection{When It Is Just Fine} |
| |
| There are situations in which \code{from module import *} is just fine: |
| |
| \begin{itemize} |
| |
| \item The interactive prompt. For example, \code{from math import *} makes |
| Python an amazing scientific calculator. |
| |
| \item When extending a module in C with a module in Python. |
| |
| \item When the module advertises itself as \code{from import *} safe. |
| |
| \end{itemize} |
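
One common way for a module to advertise this is to define
\code{__all__}, which limits what \code{import *} copies. That reading of
the last item is an assumption, not something stated above. The sketch
builds a hypothetical module, \module{shapes_demo}, in memory to show the
effect.

\begin{verbatim}
import sys
import types

# Source of a hypothetical module that declares its public names.
MODULE_SOURCE = '''
__all__ = ["area"]

def area(width, height):
    return width * height

def debug_dump():
    return "internal"

scratch = 0
'''

# Build the module in memory and register it so it can be imported.
shapes = types.ModuleType("shapes_demo")
exec(MODULE_SOURCE, shapes.__dict__)
sys.modules["shapes_demo"] = shapes

# Run the star import in an explicit namespace, then unregister.
client_ns = {}
exec("from shapes_demo import *", client_ns)
del sys.modules["shapes_demo"]

# Only the advertised name was copied (exec adds __builtins__ itself).
imported = sorted(n for n in client_ns if n != "__builtins__")
\end{verbatim}

Here \code{imported} contains only \code{"area"}; neither
\code{debug_dump} nor \code{scratch} leaks into the importing namespace.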
| |
| \subsection{Unadorned \function{exec}, \function{execfile} and friends} |
| |
| The word ``unadorned'' refers to the use without an explicit dictionary, |
| in which case those constructs evaluate code in the {\em current} environment. |
| This is dangerous for the same reasons \code{from import *} is dangerous --- |
| it might step over variables you are counting on and mess up things for |
| the rest of your code. Simply do not do that. |
| |
| Bad examples: |
| |
| \begin{verbatim} |
| >>> for name in sys.argv[1:]: |
| >>>     exec("%s=1" % name) |
| >>> def func(s, **kw): |
| >>>     for var, val in kw.items(): |
| >>>         exec("s.%s=val" % var) # invalid! |
| >>> execfile("handler.py") |
| >>> handle() |
| \end{verbatim} |
| |
| Good examples: |
| |
| \begin{verbatim} |
| >>> d = {} |
| >>> for name in sys.argv[1:]: |
| >>>     d[name] = 1 |
| >>> def func(s, **kw): |
| >>>     for var, val in kw.items(): |
| >>>         setattr(s, var, val) |
| >>> d = {} |
| >>> execfile("handler.py", d, d) |
| >>> handle = d['handle'] |
| >>> handle() |
| \end{verbatim} |
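
The good examples can be tried without a separate file. The sketch below
assumes the handler source is available as a string (it stands in for the
contents of the handler file) and uses \function{exec()} with an explicit
dictionary; the class and function names are invented for the example.

\begin{verbatim}
# Stand-in for the contents of the handler file.
HANDLER_SOURCE = '''
def handle():
    return "handled"

scratch = 42
'''

class Settings(object):
    """Hypothetical object that receives attributes."""

def configure(obj, **kw):
    # setattr() replaces building "obj.name = value" strings for exec.
    for var, val in kw.items():
        setattr(obj, var, val)
    return obj

# Everything the handler code defines lands in this dictionary only.
namespace = {}
exec(HANDLER_SOURCE, namespace)
handle = namespace["handle"]
result = handle()

settings = configure(Settings(), colour="red", size=3)
\end{verbatim}

The stray name \code{scratch} stays inside \code{namespace}, where it
cannot step on anything the surrounding code relies on.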
| |
| \subsection{from module import name1, name2} |
| |
| This is a ``don't'' which is much weaker than the previous ``don't''s |
| but is still something you should not do unless you have a good reason. |
| It is usually a bad idea because you suddenly |
| have an object which lives in two separate namespaces. When the binding |
| in one namespace changes, the binding in the other will not, so there |
| will be a discrepancy between them. This happens when, for example, |
| one module is reloaded, or changes the definition of a function at runtime. |
| |
| Bad example: |
| |
| \begin{verbatim} |
| # foo.py |
| a = 1 |
| |
| # bar.py |
| from foo import a |
| if something(): |
|     a = 2 # danger: foo.a != a |
| \end{verbatim} |
| |
| Good example: |
| |
| \begin{verbatim} |
| # foo.py |
| a = 1 |
| |
| # bar.py |
| import foo |
| if something(): |
|     foo.a = 2 |
| \end{verbatim} |
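
The discrepancy also appears in the other direction, when the module
itself rebinds the name after it has been imported. A minimal sketch,
which builds a stand-in for \module{foo} in memory and runs the import in
a dictionary that plays the role of \module{bar}:

\begin{verbatim}
import sys
import types

# In-memory stand-in for foo.py.
foo = types.ModuleType("foo")
foo.a = 1
sys.modules["foo"] = foo

# Stand-in for bar.py: the import copies the current binding.
bar_ns = {}
exec("from foo import a", bar_ns)

# Later, foo changes its own binding (as a reload would).
foo.a = 2

# bar still holds the old object; only foo.a shows the new one.
stale = bar_ns["a"]
fresh = foo.a

del sys.modules["foo"]
\end{verbatim}

After the change \code{stale} is still 1 while \code{fresh} is 2, which
is why accessing \code{foo.a} through the module is the safer habit.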
| |
| \subsection{except:} |
| |
| Python has the \code{except:} clause, which catches all exceptions. |
| Since {\em every} error in Python raises an exception, this makes many |
| programming errors look like runtime problems, and hinders |
| the debugging process. |
| |
| The following code shows a great example: |
| |
| \begin{verbatim} |
| try: |
|     foo = opne("file") # misspelled "open" |
| except: |
|     sys.exit("could not open file!") |
| \end{verbatim} |
| |
| The second line triggers a \exception{NameError} which is caught by the |
| except clause. The program will exit, and you will have no idea that |
| this has nothing to do with the readability of \code{"file"}. |
| |
| The example above is better written as: |
| |
| \begin{verbatim} |
| try: |
|     foo = opne("file") # will be changed to "open" as soon as we run it |
| except IOError: |
|     sys.exit("could not open file") |
| \end{verbatim} |
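
That the narrower clause really lets the typo surface can be confirmed
with a short sketch. The function name is invented for the illustration,
and the misspelling is deliberate.

\begin{verbatim}
def read_handle(path):
    try:
        return opne(path)    # deliberately misspelled "open"
    except IOError:
        # Only genuine I/O failures are handled here.
        return None

try:
    read_handle("file")
except NameError as exc:
    # The programming error is not swallowed by "except IOError".
    surfaced = type(exc).__name__
\end{verbatim}

The \exception{NameError} reaches the caller, so the misspelling is
reported as what it is instead of as a missing file.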
| |
| There are some situations in which the \code{except:} clause is useful: |
| for example, in a framework when running callbacks, it is good not to |
| let any callback disturb the framework. |
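
A framework loop of that kind might look like the sketch below; the
function and callback names are hypothetical. As a design choice of the
sketch, it catches \exception{Exception} rather than using a bare
\code{except:}, so that on Python 2.5 and later
\exception{KeyboardInterrupt} and \exception{SystemExit} still stop the
program, and it prints the traceback so the error is not lost.

\begin{verbatim}
import traceback

def run_callbacks(callbacks):
    """Call every callback; return the ones that raised."""
    failed = []
    for callback in callbacks:
        try:
            callback()
        except Exception:
            # Report the problem, then carry on with the rest.
            traceback.print_exc()
            failed.append(callback)
    return failed

def ok():
    pass

def broken():
    raise ValueError("boom")

failed = run_callbacks([ok, broken, ok])
\end{verbatim}

Only \function{broken} ends up in \code{failed}; the callbacks after it
still run.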
| |
| \section{Exceptions} |
| |
| Exceptions are a useful feature of Python. You should learn to raise |
| them whenever something unexpected occurs, and catch them only where |
| you can do something about them. |
| |
| The following is a very popular anti-idiom: |
| |
| \begin{verbatim} |
| def get_status(file): |
|     if not os.path.exists(file): |
|         print "file not found" |
|         sys.exit(1) |
|     return open(file).readline() |
| \end{verbatim} |
| |
| Consider the case where the file gets deleted between the time the call to |
| \function{os.path.exists} is made and the time \function{open} is called. |
| Then the last line will raise an \exception{IOError}. The same would |
| happen if \var{file} exists but has no read permission. Since testing this |
| on a normal machine with existing and non-existing files makes it seem bugless, |
| the test results will look fine, and the code will get |
| shipped. Then an unhandled \exception{IOError} escapes to the user, who |
| has to watch the ugly traceback. |
| |
| Here is a better way to do it. |
| |
| \begin{verbatim} |
| def get_status(file): |
|     try: |
|         return open(file).readline() |
|     except (IOError, OSError): |
|         print "file not found" |
|         sys.exit(1) |
| \end{verbatim} |
| |
| In this version, {\em either} the file gets opened and the line is read |
| (so it works even on flaky NFS or SMB connections), or the message |
| is printed and the application aborted. |
| |
| Still, \function{get_status} makes too many assumptions --- that it |
| will only be used in a short-running script, and not, say, in a long-running |
| server. Sure, the caller could do something like |
| |
| \begin{verbatim} |
| try: |
|     status = get_status(log) |
| except SystemExit: |
|     status = None |
| \end{verbatim} |
| |
| So, try to write as few \code{except} clauses in your code as possible --- those will |
| usually be a catch-all in the \function{main}, or inside calls which |
| should always succeed. |
| |
| The best version, then, is probably |
| |
| \begin{verbatim} |
| def get_status(file): |
|     return open(file).readline() |
| \end{verbatim} |
| |
| The caller can deal with the exception if it wants (for example, if it |
| tries several files in a loop), or just let the exception filter upwards |
| to {\em its} caller. |
| |
| The last version is not very good either --- due to implementation details, |
| the file would not be closed when an exception is raised until the handler |
| finishes, and perhaps not at all in non-C implementations (e.g., Jython). |
| Closing the file explicitly in a \code{finally} clause avoids the problem: |
| |
| \begin{verbatim} |
| def get_status(file): |
|     fp = open(file) |
|     try: |
|         return fp.readline() |
|     finally: |
|         fp.close() |
| \end{verbatim} |
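
The caller-side handling mentioned earlier, trying several files in a
loop, could be sketched as follows. The function \function{first_status}
and the file names are made up for the illustration, and a temporary
directory stands in for real log files.

\begin{verbatim}
import os
import tempfile

def get_status(file):
    fp = open(file)
    try:
        return fp.readline()
    finally:
        fp.close()

def first_status(candidates):
    """Hypothetical caller: first readable file wins, else None."""
    for path in candidates:
        try:
            return get_status(path)
        except (IOError, OSError):
            continue    # this caller knows what to do: try the next
    return None

# Set up one file that exists and one path that does not.
workdir = tempfile.mkdtemp()
present = os.path.join(workdir, "status.log")
missing = os.path.join(workdir, "missing.log")
fp = open(present, "w")
try:
    fp.write("ok\n")
finally:
    fp.close()

status = first_status([missing, present])
nothing = first_status([missing])

# Clean up the temporary files.
os.remove(present)
os.rmdir(workdir)
\end{verbatim}

Because \function{get_status} neither prints nor exits, the same function
serves a caller like this one and a caller that simply lets the exception
propagate.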
| |
| \section{Using the Batteries} |
| |
| Every so often, people seem to rewrite things that are already in the Python |
| library, usually poorly. While the occasional module has a poor interface, |
| it is usually much better to use the rich standard library and data |
| types that come with Python than to invent your own. |
| |
| A useful module very few people know about is \module{os.path}. It |
| always has the correct path arithmetic for your operating system, and |
| will usually be much better than whatever you come up with yourself. |
| |
| Compare: |
| |
| \begin{verbatim} |
| # ugh! |
| return dir+"/"+file |
| # better |
| return os.path.join(dir, file) |
| \end{verbatim} |
| |
| More useful functions in \module{os.path}: \function{basename}, |
| \function{dirname} and \function{splitext}. |
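
A brief sketch of these functions working together; the path components
are arbitrary example values.

\begin{verbatim}
import os.path

# Build a path with the separator of the current platform.
path = os.path.join("logs", "2001", "status.txt")

# Take it apart again without any string arithmetic.
directory = os.path.dirname(path)
filename = os.path.basename(path)
stem, extension = os.path.splitext(filename)
\end{verbatim}

Here \code{filename} is \code{"status.txt"}, and \function{splitext}
divides it into \code{"status"} and \code{".txt"}; \code{directory}
depends on the platform's separator, which is exactly the detail the
module takes care of.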
| |
| There are also many useful builtin functions people seem not to be |
| aware of for some reason: \function{min()} and \function{max()} can |
| find the minimum/maximum of any sequence with comparable semantics, |
| for example, yet many people write their own |
| \function{max()}/\function{min()}. |
| |
| Similarly, note that \function{float()}, \function{int()} and |
| \function{long()} all accept arguments of type string, and so are |
| suited to parsing --- assuming you are ready to deal with the |
| \exception{ValueError} they raise. |
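
Both points fit in a few lines. In this sketch the helper name and the
input strings are invented; it tries \function{int()} first and falls
back to \function{float()}.

\begin{verbatim}
def parse_number(text):
    """Hypothetical helper: int if possible, otherwise float."""
    try:
        return int(text)
    except ValueError:
        # float() raises ValueError itself if text is not numeric.
        return float(text)

readings = [parse_number(t) for t in ["3", "10", "2.5"]]

# No hand-written loops needed for the extremes.
smallest = min(readings)
largest = max(readings)

try:
    parse_number("not a number")
except ValueError:
    rejected = True
\end{verbatim}

The builtins find 2.5 and 10 as the extremes, and the unparsable string
is reported through \exception{ValueError} rather than silently accepted.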
| |
| \section{Using Backslash to Continue Statements} |
| |
| Since Python treats a newline as a statement terminator, |
| and since statements are often longer than is comfortable to put |
| on one line, many people do: |
| |
| \begin{verbatim} |
| if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \ |
|    calculate_number(10, 20) != forbulate(500, 360): |
|     pass |
| \end{verbatim} |
| |
| You should realize that this is dangerous: a stray space after the |
| \code{\\} would make this line wrong, and stray spaces are notoriously |
| hard to see in editors. In this case, at least it would be a syntax |
| error, but if the code were: |
| |
| \begin{verbatim} |
| value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \ |
|         + calculate_number(10, 20)*forbulate(500, 360) |
| \end{verbatim} |
| |
| then it would just be subtly wrong. |
| |
| It is usually much better to use the implicit continuation inside parentheses. |
| This version is bulletproof: |
| |
| \begin{verbatim} |
| value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9] |
|          + calculate_number(10, 20)*forbulate(500, 360)) |
| \end{verbatim} |
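
How the two continuation styles react to trailing whitespace can be
checked by compiling small source strings. This is a sketch with made-up
statements; it shows only what a current interpreter accepts or rejects.

\begin{verbatim}
# A clean backslash continuation.
BACKSLASH_OK = "value = 1 \\\n        + 2\n"
# The same statement with a stray space after the backslash.
BACKSLASH_SPACE = "value = 1 \\ \n        + 2\n"
# Parenthesized form with a stray space at the end of the line.
PARENS_SPACE = "value = (1 \n         + 2)\n"

def compiles(source):
    """Hypothetical helper: True if the source is valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

results = [compiles(BACKSLASH_OK),
           compiles(BACKSLASH_SPACE),
           compiles(PARENS_SPACE)]
\end{verbatim}

The backslash form stops compiling as soon as the stray space appears,
while the parenthesized form does not care about it.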
| |
| \end{document} |