 \documentclass{howto}

 \title{Idioms and Anti-Idioms in Python}

 \release{0.00}

 \author{Moshe Zadka}
 \authoraddress{howto@zadka.site.co.il}

 \begin{document}
 \maketitle

 This document is placed in the public domain.

 \begin{abstract}
 \noindent
 This document can be considered a companion to the tutorial. It
 shows how to use Python, and even more importantly, how {\em not}
 to use Python.
 \end{abstract}

 \tableofcontents

 \section{Language Constructs You Should Not Use}

 While Python has relatively few gotchas compared to other languages, it
 still has some constructs which are only useful in corner cases, or are
 plain dangerous.

 \subsection{from module import *}

 \subsubsection{Inside Function Definitions}

 \code{from module import *} is {\em invalid} inside function definitions.
 While many versions of Python do not check for the invalidity, that does not
 make it more valid, any more than having a smart lawyer makes a man innocent.
 Never use it like that. Even in versions where it was accepted, it made
 the function execution slower, because the compiler could not be certain
 which names are local and which are global. In Python 2.1 this construct
 causes warnings, and sometimes even errors.
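
As a rough illustration, the compiler itself can be asked whether it accepts
the construct. This sketch assumes a current Python 3 interpreter, where the
star import inside a function is rejected at compile time; the helper name
\code{compiles} is made up for this example.

\begin{verbatim}
def compiles(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# Star import inside a function body: rejected by the compiler.
inside_function = "def f():\n    from math import *\n"
# The same statement at module level: accepted.
at_module_level = "from math import *\n"

print(compiles(inside_function))
print(compiles(at_module_level))
\end{verbatim}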

 \subsubsection{At Module Level}

 While it is valid to use \code{from module import *} at module level, it
 is usually a bad idea. For one, this loses an important property Python
 otherwise has --- you can find where each top-level name is defined with
 a simple ``search'' function in your favourite editor. You also open yourself
 to trouble in the future, if some module grows additional functions or
 classes.

 One of the most awful questions asked on the newsgroup is why this code:

 \begin{verbatim}
 f = open("www")
 f.read()
 \end{verbatim}

 does not work. Of course, it works just fine (assuming you have a file
 called ``www''). But it does not work if somewhere in the module, the
 statement \code{from os import *} is present. The \module{os} module
 has a function called \function{open()} which returns an integer file
 descriptor rather than a file object. While the module
 is very useful, shadowing builtins is one of its least useful properties.

 Remember, you can never know for sure what names a module exports, so either
 take what you need --- \code{from module import name1, name2}, or keep them in
 the module and access on a per-need basis ---
 \code{import module;print module.name}.
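
The difference between the two import styles can be sketched without touching
\module{os} at all. Everything here is an assumption made for illustration:
the module \code{noisy} is built in memory and is hypothetical, and the two
imports are run in explicit dictionaries so that the effect on each namespace
can be inspected. The sketch targets a current Python 3 interpreter.

\begin{verbatim}
import sys
import types

# Hypothetical module that happens to export its own open().
noisy = types.ModuleType("noisy")
exec("def open(path):\n    return 3\n\ndef helper():\n    return 'ok'\n",
     noisy.__dict__)
sys.modules["noisy"] = noisy

# Star import: every public name arrives, wanted or not.
star = {}
exec("from noisy import *", star)

# Explicit import: only the requested name arrives.
explicit = {}
exec("from noisy import helper", explicit)

print(star["open"] is noisy.open)   # True: the builtin is shadowed here
print("open" in explicit)           # False: the builtin stays reachable
\end{verbatim}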

 \subsubsection{When It Is Just Fine}

 There are situations in which \code{from module import *} is just fine:

 \begin{itemize}

 \item The interactive prompt. For example, \code{from math import *} makes
       Python an amazing scientific calculator.

 \item When extending a module in C with a module in Python.

 \item When the module advertises itself as \code{from import *} safe.

 \end{itemize}

 \subsection{Unadorned \function{exec}, \function{execfile} and friends}

 The word ``unadorned'' refers to the use without an explicit dictionary,
 in which case those constructs evaluate code in the {\em current} environment.
 This is dangerous for the same reasons \code{from import *} is dangerous ---
 it might step over variables you are counting on and mess up things for
 the rest of your code. Simply do not do that.

 Bad examples:

 \begin{verbatim}
 >>> import sys
 >>> for name in sys.argv[1:]:
 ...     exec("%s=1" % name)
 ...
 >>> def func(s, **kw):
 ...     for var, val in kw.items():
 ...         exec("s.%s=val" % var)  # invalid!
 ...
 >>> execfile("handler.py")
 >>> handle()
 \end{verbatim}

 Good examples:

 \begin{verbatim}
 >>> import sys
 >>> d = {}
 >>> for name in sys.argv[1:]:
 ...     d[name] = 1
 ...
 >>> def func(s, **kw):
 ...     for var, val in kw.items():
 ...         setattr(s, var, val)
 ...
 >>> d = {}
 >>> execfile("handler.py", d, d)
 >>> handle = d['handle']
 >>> handle()
 \end{verbatim}
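
The explicit-dictionary pattern also carries over to interpreters that no
longer provide \function{execfile}. The following is only a sketch, assuming a
current Python 3 interpreter where \function{exec} is a function; the handler
source is inlined as a string (a stand-in for the contents of a file) to keep
the example self-contained.

\begin{verbatim}
result = "mine"  # a name the surrounding code is counting on

# Stand-in for the contents of handler.py (hypothetical).
handler_source = (
    "result = 'clobbered'\n"
    "def handle():\n"
    "    return 'handled'\n"
)

# Run the code in its own dictionary instead of the current namespace.
namespace = {}
exec(compile(handler_source, "handler.py", "exec"), namespace)

handle = namespace["handle"]
print(handle())
print(result)   # still "mine": the executed code could not step on it
\end{verbatim}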

 \subsection{from module import name1, name2}

 This is a ``don't'' which is much weaker than the previous ``don't''s
 but is still something you should not do without a good reason.
 It is usually a bad idea because you suddenly
 have an object which lives in two separate namespaces. When the binding
 in one namespace changes, the binding in the other will not, so there
 will be a discrepancy between them. This happens when, for example,
 one module is reloaded, or changes the definition of a function at runtime.

 Bad example:

 \begin{verbatim}
 # foo.py
 a = 1

 # bar.py
 from foo import a
 if something():
     a = 2 # danger: foo.a != a
 \end{verbatim}

 Good example:

 \begin{verbatim}
 # foo.py
 a = 1

 # bar.py
 import foo
 if something():
     foo.a = 2
 \end{verbatim}
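
The discrepancy can be reproduced in a single file. In this sketch the module
\code{foo} is built in memory as a stand-in for \file{foo.py}, and the plain
assignment \code{a = foo.a} plays the role of \code{from foo import a}; both
are assumptions made to keep the example self-contained.

\begin{verbatim}
import types

# Stand-in for foo.py (hypothetical module built in memory).
foo = types.ModuleType("foo")
foo.a = 1

# Equivalent of "from foo import a": copy the current binding.
a = foo.a

# foo later rebinds its own name...
foo.a = 2

print(a)       # 1: the copied binding still refers to the old object
print(foo.a)   # 2: attribute access always sees the current binding
\end{verbatim}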

 \subsection{except:}

 Python has the \code{except:} clause, which catches all exceptions.
 Since {\em every} error in Python raises an exception, this makes many
 programming errors look like runtime problems, and hinders
 the debugging process.

 The following code shows a great example:

 \begin{verbatim}
 try:
     foo = opne("file") # misspelled "open"
 except:
     sys.exit("could not open file!")
 \end{verbatim}

 The second line triggers a \exception{NameError} which is caught by the
 except clause. The program will exit, and you will have no idea that
 this has nothing to do with the readability of \code{"file"}.

 The example above is better written as:

 \begin{verbatim}
 try:
     foo = opne("file") # will be changed to "open" as soon as we run it
 except IOError:
     sys.exit("could not open file")
 \end{verbatim}

 There are some situations in which the \code{except:} clause is useful:
 for example, in a framework when running callbacks, it is good not to
 let any callback disturb the framework.
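
A callback runner along those lines might look like the sketch below. The
names \function{run\_callbacks}, \function{good} and \function{bad} are
hypothetical. Note one deliberate substitution: the sketch catches
\exception{Exception} rather than using a bare \code{except:}, which on
current Python versions lets \exception{KeyboardInterrupt} and
\exception{SystemExit} pass through while still containing ordinary errors.

\begin{verbatim}
import traceback

def run_callbacks(callbacks):
    """Call each callback; collect failures instead of propagating them."""
    failures = []
    for callback in callbacks:
        try:
            callback()
        except Exception:
            # Keep the traceback so the bug stays visible to a developer.
            failures.append(traceback.format_exc())
    return failures

calls = []

def good():
    calls.append("good")

def bad():
    raise ValueError("broken callback")

failures = run_callbacks([good, bad, good])
print(len(failures))   # one callback failed
print(calls)           # the callbacks around it still ran
\end{verbatim}

Recording the traceback matters: swallowing the error silently would bring
back the debugging problem described above.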

 \section{Exceptions}

 Exceptions are a useful feature of Python. You should learn to raise
 them whenever something unexpected occurs, and catch them only where
 you can do something about them.

 The following is a very popular anti-idiom:

 \begin{verbatim}
 def get_status(file):
     if not os.path.exists(file):
         print "file not found"
         sys.exit(1)
     return open(file).readline()
 \end{verbatim}

 Consider the case where the file gets deleted between the time the call to
 \function{os.path.exists} is made and the time \function{open} is called.
 Then the last line will raise an \exception{IOError}. The same would
 happen if \var{file} exists but has no read permission. Since testing this
 on a normal machine with existing and non-existing files makes it seem
 bugless, the test results will look fine, and the code will get
 shipped. Then an unhandled \exception{IOError} escapes to the user, who
 has to watch the ugly traceback.

 Here is a better way to do it.

 \begin{verbatim}
 def get_status(file):
     try:
         return open(file).readline()
     except (IOError, OSError):
         print "file not found"
         sys.exit(1)
 \end{verbatim}

 In this version, {\em either} the file gets opened and the line is read
 (so it works even on flaky NFS or SMB connections), or the message
 is printed and the application is aborted.

 Still, \function{get_status} makes too many assumptions --- that it
 will only be used in a short-running script, and not, say, in a long-running
 server. Sure, the caller could do something like

 \begin{verbatim}
 try:
     status = get_status(log)
 except SystemExit:
     status = None
 \end{verbatim}

 but that only works around a decision \function{get_status} should never
 have made. So, try to use as few \code{except} clauses in your code as
 possible --- those will usually be a catch-all in \function{main}, or
 inside calls which should always succeed.

 So, the best version is probably

 \begin{verbatim}
 def get_status(file):
     return open(file).readline()
 \end{verbatim}

 The caller can deal with the exception if it wants (for example, if it
 tries several files in a loop), or just let the exception filter upwards
 to {\em its} caller.

 The last version is not very good either --- due to implementation details,
 the file would not be closed when an exception is raised until the handler
 finishes, and perhaps not at all in non-C implementations (e.g., Jython).
 Closing the file in a \code{finally} clause avoids the problem:

 \begin{verbatim}
 def get_status(file):
     fp = open(file)
     try:
         return fp.readline()
     finally:
         fp.close()
 \end{verbatim}
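
On interpreters that have the \code{with} statement (an assumption: it is a
later addition to the language than the Python 2.1 discussed here), the same
guarantee can be written more compactly. The sketch below also shows a caller
that tries several files in a loop and handles the exception itself; the name
\function{first\_status} and the file names are hypothetical.

\begin{verbatim}
import os
import tempfile

def get_status(file):
    # The with statement closes the file on normal and exceptional exit.
    with open(file) as fp:
        return fp.readline()

def first_status(candidates):
    """Return the status line of the first readable file, or None."""
    for name in candidates:
        try:
            return get_status(name)
        except (IOError, OSError):
            continue  # this caller knows what to do: try the next file
    return None

# Usage: one missing file and one real file in a scratch directory.
with tempfile.TemporaryDirectory() as directory:
    missing = os.path.join(directory, "missing.log")
    present = os.path.join(directory, "present.log")
    with open(present, "w") as fp:
        fp.write("running\nsecond line\n")

    found = first_status([missing, present])
    nothing = first_status([missing])

print(repr(found))
print(nothing)
\end{verbatim}

Here \function{get\_status} still raises on failure; only the caller, which
knows that a missing file is acceptable, decides to catch the exception.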

 \section{Using the Batteries}

 Every so often, people seem to rewrite things that are already in the
 Python library, usually poorly. While the occasional module has a poor
 interface, it is usually much better to use the rich standard library and
 data types that come with Python than to invent your own.

 A useful module very few people know about is \module{os.path}. It
 always has the correct path arithmetic for your operating system, and
 will usually be much better than whatever you come up with yourself.

 Compare:

 \begin{verbatim}
 # ugh!
 return dir+"/"+file
 # better
 return os.path.join(dir, file)
 \end{verbatim}

 More useful functions in \module{os.path}: \function{basename},
 \function{dirname} and \function{splitext}.
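
A short sketch of these three functions follows. The path components are made
up for illustration, and the checks are phrased so that they hold whichever
separator the operating system uses.

\begin{verbatim}
import os.path

path = os.path.join("logs", "archive", "report.txt")

# Final component of the path.
print(os.path.basename(path))
# Everything before the final component.
print(os.path.dirname(path) == os.path.join("logs", "archive"))
# Split the extension from the rest of the name.
print(os.path.splitext("report.txt"))
\end{verbatim}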

 There are also many useful builtin functions people seem not to be
 aware of for some reason: \function{min()} and \function{max()} can
 find the minimum/maximum of any sequence with comparable semantics,
 for example, yet many people write their own
 \function{max()}/\function{min()}.
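
For instance, the builtins work on numbers and strings alike. The data below
is invented for the example, and the \code{key} argument in the last line is
an assumption about the interpreter: it exists in later Python versions than
the one this document was written for.

\begin{verbatim}
temperatures = [21.5, 19.0, 23.25, 18.75]
words = ["pear", "apple", "fig"]

print(min(temperatures))
print(max(temperatures))
print(min(words))            # strings compare lexicographically
print(max(words, key=len))   # longest word rather than "largest" word
\end{verbatim}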

 Similarly, \function{float()}, \function{int()} and
 \function{long()} all accept arguments of type string, and so are
 suited to parsing --- assuming you are ready to deal with the
 \exception{ValueError} they raise.
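
A minimal sketch of such parsing is shown below. The function name
\function{parse\_port} and its default value are hypothetical; the point is
only that the conversion and the \exception{ValueError} handling sit next to
each other.

\begin{verbatim}
def parse_port(text, default=8080):
    """Convert text to an int, falling back to default on bad input."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_port("443"))
print(parse_port(" 80 "))   # surrounding whitespace is accepted
print(parse_port("http"))   # not a number: the default is returned
print(float("2.5e3"))
\end{verbatim}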

 \section{Using Backslash to Continue Statements}

 Since Python treats a newline as a statement terminator,
 and since statements are often longer than is comfortable to put
 on one line, many people do:

 \begin{verbatim}
 if foo.bar()['first'][0] == baz.quux(1, 2)[5:9] and \
    calculate_number(10, 20) != forbulate(500, 360):
       pass
 \end{verbatim}

 You should realize that this is dangerous: a stray space after the
 \code{\\} would make this line wrong, and stray spaces are notoriously
 hard to see in editors. In this case, at least it would be a syntax
 error, but if the code were:

 \begin{verbatim}
 value = foo.bar()['first'][0]*baz.quux(1, 2)[5:9] \
         + calculate_number(10, 20)*forbulate(500, 360)
 \end{verbatim}

 then it would just be subtly wrong.

 It is usually much better to use the implicit continuation inside
 parentheses. This version is bulletproof:

 \begin{verbatim}
 value = (foo.bar()['first'][0]*baz.quux(1, 2)[5:9]
         + calculate_number(10, 20)*forbulate(500, 360))
 \end{verbatim}
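
The effect of a stray space can be checked by handing small source strings to
the compiler. This sketch only covers the syntax-error case described above,
on a current Python 3 interpreter; the helper \code{compiles} and the sample
statements are invented for the example.

\begin{verbatim}
def compiles(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# Backslash directly before the newline: accepted.
clean_backslash = "value = 1 + \\\n    2\n"
# A single space after the backslash: rejected.
stray_space = "value = 1 + \\ \n    2\n"
# Inside parentheses, trailing spaces are harmless.
parenthesised = "value = (1 +   \n    2)\n"

print(compiles(clean_backslash))
print(compiles(stray_space))
print(compiles(parenthesised))
\end{verbatim}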

 \end{document}