blob: e5aa8b43b10ef25da75ec7f8b639a330bb0be160 [file] [log] [blame]
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00001% Complete documentation on the extended LaTeX markup used for Python
2% documentation is available in ``Documenting Python'', which is part
3% of the standard documentation for Python. It may be found online
4% at:
5%
6% http://www.python.org/doc/current/doc/doc.html
7
Fred Drake834a85a2001-08-15 17:01:34 +00008\documentclass{howto}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00009
10\title{Python compiler package}
11
12\author{Jeremy Hylton}
13
14% Please at least include a long-lived email address;
15% the rest is at your discretion.
16\authoraddress{
17 PythonLabs \\
Fred Drake834a85a2001-08-15 17:01:34 +000018 Zope Corporation \\
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000019 Email: \email{jeremy@zope.com}
20}
21
22\date{August 15, 2001} % update before release!
23 % Use an explicit date so that reformatting
24 % doesn't cause a new date to be used. Setting
25 % the date to \today can be used during draft
26 % stages to make it easier to handle versions.
27
28\release{2.2} % release version; this is used to define the
29 % \version macro
30
31\makeindex % tell \index to actually write the .idx file
32\makemodindex % If this contains a lot of module sections.
33
34
35\begin{document}
36
37\maketitle
38
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000039\begin{abstract}
40
41\noindent
42The Python compiler package is a tool for analyzing Python source code
43and generating Python bytecode. The compiler contains libraries to
44generate an abstract syntax tree from Python source code and to
45generate Python bytecode from the tree.
46
47\end{abstract}
48
49\tableofcontents
50
Fred Drake834a85a2001-08-15 17:01:34 +000051
52\section{Introduction\label{Introduction}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000053
Jeremy Hylton241d69c2001-08-18 00:24:46 +000054The \module{compiler} package is a Python source to bytecode
55translator written in Python. It uses the builtin parser and standard
56\ulink{\module{parser}}
57{http://www.python.org/doc/current/lib/module-parser.html} to
58generated a concrete syntax tree. This tree is used to generate an
59abstract syntax tree (AST) and then Python bytecode.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000060
Jeremy Hylton241d69c2001-08-18 00:24:46 +000061The full functionality of the package duplicates the builtin compiler
62provided with the Python interpreter. It is intended to match its
63behavior almost exactly. Why implement another compiler that does the
64same thing? The package is useful for a variety of purposes. It can
65be modified more easily than the builtin compiler. The AST it
66generates is useful for analyzing Python source code.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000067
Jeremy Hylton241d69c2001-08-18 00:24:46 +000068This manual explains how the various components of the
69\module{compiler} package work. It blends reference material with a
70tutorial. (At least it will when the document is done.)
Fred Drake834a85a2001-08-15 17:01:34 +000071
Jeremy Hylton241d69c2001-08-18 00:24:46 +000072\subsection{The basic interface}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000073
Fred Drake834a85a2001-08-15 17:01:34 +000074\declaremodule{}{compiler}
75
Jeremy Hylton241d69c2001-08-18 00:24:46 +000076The top-level of the package defines four functions. If you import
77\module{compiler}, you will get these functions and a collection of
78modules contained in the package.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000079
80\begin{funcdesc}{parse}{buf}
81Returns an abstract syntax tree for the Python source code in \var{buf}.
82The function raises SyntaxError if there is an error in the source
83code. The return value is a \class{compiler.ast.Module} instance that
84contains the tree.
85\end{funcdesc}
86
87\begin{funcdesc}{parseFile}{path}
88Return an abstract syntax tree for the Python source code in the file
Fred Drake42caf3f2001-08-15 14:35:13 +000089specified by \var{path}. It is equivalent to
90\code{parse(open(\var{path}).read())}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000091\end{funcdesc}
92
Fred Drake834a85a2001-08-15 17:01:34 +000093\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000094Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
95appropriate method on the \var{visitor} instance for each node
Fred Drake834a85a2001-08-15 17:01:34 +000096encountered.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000097\end{funcdesc}
98
Fred Drake834a85a2001-08-15 17:01:34 +000099\begin{funcdesc}{compile}{path}
100Compile the file \var{path} and generate the corresponding \file{.pyc}
101file.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000102\end{funcdesc}
103
104The \module{compiler} package contains the following modules:
Fred Drake834a85a2001-08-15 17:01:34 +0000105\refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
106\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
107\module{transformer}, and \refmodule[compiler.visitor]{visitor}.
108
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000109\subsection{Limitations}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000110
111There are some problems with the error checking of the compiler
112package. The interpreter detects syntax errors in two distinct
113phases. One set of errors is detected by the interpreter's parser,
114the other set by the compiler. The compiler package relies on the
115interpreter's parser, so it get the first phases of error checking for
116free. It implements the second phase itself, and that implement is
117incomplete. For example, the compiler package does not raise an error
118if a name appears more than once in an argument list:
119\code{def f(x, x): ...}
120
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000121A future version of the compiler should fix these problems.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000122
Fred Drake834a85a2001-08-15 17:01:34 +0000123\section{Python Abstract Syntax}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000124
125The \module{compiler.ast} module defines an abstract syntax for
126Python. In the abstract syntax tree, each node represents a syntactic
127construct. The root of the tree is \class{Module} object.
128
129The abstract syntax offers a higher level interface to parsed Python
Fred Drake834a85a2001-08-15 17:01:34 +0000130source code. The \ulink{\module{parser}}
131{http://www.python.org/doc/current/lib/module-parser.html}
132module and the compiler written in C for the Python interpreter use a
133concrete syntax tree. The concrete syntax is tied closely to the
134grammar description used for the Python parser. Instead of a single
135node for a construct, there are often several levels of nested nodes
136that are introduced by Python's precedence rules.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000137
138The abstract syntax tree is created by the
139\module{compiler.transformer} module. The transformer relies on the
140builtin Python parser to generate a concrete syntax tree. It
141generates an abstract syntax tree from the concrete tree.
142
Fred Drake834a85a2001-08-15 17:01:34 +0000143The \module{transformer} module was created by Greg
144Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
145experimental Python-to-C compiler. The current version contains a
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000146number of modifications and improvements, but the basic form of the
147abstract syntax and of the transformer are due to Stein and Tutt.
148
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000149\subsection{AST Nodes}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000150
Fred Drake834a85a2001-08-15 17:01:34 +0000151\declaremodule{}{compiler.ast}
152
153The \module{compiler.ast} module is generated from a text file that
154describes each node type and its elements. Each node type is
155represented as a class that inherits from the abstract base class
156\class{compiler.ast.Node} and defines a set of named attributes for
157child nodes.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000158
159\begin{classdesc}{Node}{}
160
161 The \class{Node} instances are created automatically by the parser
162 generator. The recommended interface for specific \class{Node}
163 instances is to use the public attributes to access child nodes. A
164 public attribute may be bound to a single node or to a sequence of
165 nodes, depending on the \class{Node} type. For example, the
166 \member{bases} attribute of the \class{Class} node, is bound to a
167 list of base class nodes, and the \member{doc} attribute is bound to
168 a single node.
169
170 Each \class{Node} instance has a \member{lineno} attribute which may
171 be \code{None}. XXX Not sure what the rules are for which nodes
172 will have a useful lineno.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000173\end{classdesc}
174
Fred Drake834a85a2001-08-15 17:01:34 +0000175All \class{Node} objects offer the following methods:
176
177\begin{methoddesc}{getChildren}{}
178 Returns a flattened list of the child nodes and objects in the
179 order they occur. Specifically, the order of the nodes is the
180 order in which they appear in the Python grammar. Not all of the
181 children are \class{Node} instances. The names of functions and
182 classes, for example, are plain strings.
183\end{methoddesc}
184
185\begin{methoddesc}{getChildNodes}{}
186 Returns a flattened list of the child nodes in the order they
187 occur. This method is like \method{getChildren()}, except that it
188 only returns those children that are \class{Node} instances.
189\end{methoddesc}
190
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000191Two examples illustrate the general structure of \class{Node}
Fred Drake834a85a2001-08-15 17:01:34 +0000192classes. The \keyword{while} statement is defined by the following
193grammar production:
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000194
195\begin{verbatim}
196while_stmt: "while" expression ":" suite
197 ["else" ":" suite]
198\end{verbatim}
199
200The \class{While} node has three attributes: \member{test},
201\member{body}, and \member{else_}. (If the natural name for an
202attribute is also a Python reserved word, it can't be used as an
Fred Drake834a85a2001-08-15 17:01:34 +0000203attribute name. An underscore is appended to the word to make it a
204legal identifier, hence \member{else_} instead of \keyword{else}.)
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000205
Fred Drake834a85a2001-08-15 17:01:34 +0000206The \keyword{if} statement is more complicated because it can include
207several tests.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000208
209\begin{verbatim}
210if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
211\end{verbatim}
212
213The \class{If} node only defines two attributes: \member{tests} and
214\member{else_}. The \member{tests} attribute is a sequence of test
Fred Drake834a85a2001-08-15 17:01:34 +0000215expression, consequent body pairs. There is one pair for each
216\keyword{if}/\keyword{elif} clause. The first element of the pair is
217the test expression. The second elements is a \class{Stmt} node that
218contains the code to execute if the test is true.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000219
220The \method{getChildren()} method of \class{If} returns a flat list of
Fred Drake834a85a2001-08-15 17:01:34 +0000221child nodes. If there are three \keyword{if}/\keyword{elif} clauses
222and no \keyword{else} clause, then \method{getChildren()} will return
223a list of six elements: the first test expression, the first
224\class{Stmt}, the second text expression, etc.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000225
226The following table lists each of the \class{Node} subclasses defined
227in \module{compiler.ast} and each of the public attributes available
228on their instances. The values of most of the attributes are
229themselves \class{Node} instances or sequences of instances. When the
230value is something other than an instance, the type is noted in the
231comment. The attributes are listed in the order in which they are
Fred Drake42caf3f2001-08-15 14:35:13 +0000232returned by \method{getChildren()} and \method{getChildNodes()}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000233
234\input{asttable}
235
Fred Drake834a85a2001-08-15 17:01:34 +0000236
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000237\subsection{Assignment nodes}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000238
239There is a collection of nodes used to represent assignments. Each
240assignment statement in the source code becomes a single
241\class{Assign} node in the AST. The \member{nodes} attribute is a
242list that contains a node for each assignment target. This is
243necessary because assignment can be chained, e.g. \code{a = b = 2}.
244Each \class{Node} in the list will be one of the following classes:
245\class{AssAttr}, \class{AssList}, \class{AssName}, or
246\class{AssTuple}.
247
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000248Each target assignment node will describe the kind of object being
249assigned to: \class{AssName} for a simple name, e.g. \code{a = 1}.
250\class{AssAttr} for an attribute assigned, e.g. \code{a.x = 1}.
251\class{AssList} and \class{AssTuple} for list and tuple expansion
252respectively, e.g. \code{a, b, c = a_tuple}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000253
Jeremy Hylton241d69c2001-08-18 00:24:46 +0000254The target assignment nodes also have a \member{flags} attribute that
255indicates whether the node is being used for assignment or in a delete
256statement. The \class{AssName} is also used to represent a delete
257statement, e.g. \class{del x}.
258
259When an expression contains several attribute references, an
260assignment or delete statement will contain only one \class{AssAttr}
261node -- for the final attribute reference. The other attribute
262references will be represented as \class{Getattr} nodes in the
263\member{expr} attribute of the \class{AssAttr} instance.
264
265\subsection{Examples}
266
267This section shows several simple examples of ASTs for Python source
268code. The examples demonstrate how to use the \function{parse()}
269function, what the repr of an AST looks like, and how to access
270attributes of an AST node.
271
272The first module defines a single function. Assume it is stored in
273\file{/tmp/doublelib.py}.
274
275\begin{verbatim}
276"""This is an example module.
277
278This is the docstring.
279"""
280
281def double(x):
282 "Return twice the argument"
283 return x * 2
284\end{verbatim}
285
286In the interactive interpreter session below, I have reformatted the
287long AST reprs for readability. The AST reprs use unqualified class
288names. If you want to create an instance from a repr, you must import
289the class names from the \module{compiler.ast} module.
290
291\begin{verbatim}
292>>> import compiler
293>>> mod = compiler.parseFile("/tmp/doublelib.py")
294>>> mod
295Module('This is an example module.\n\nThis is the docstring.\n',
296 Stmt([Function('double', ['x'], [], 0, 'Return twice the argument',
297 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
298>>> from compiler.ast import *
299>>> Module('This is an example module.\n\nThis is the docstring.\n',
300... Stmt([Function('double', ['x'], [], 0, 'Return twice the argument',
301... Stmt([Return(Mul((Name('x'), Const(2))))]))]))
302Module('This is an example module.\n\nThis is the docstring.\n',
303 Stmt([Function('double', ['x'], [], 0, 'Return twice the argument',
304 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
305>>> mod.doc
306'This is an example module.\n\nThis is the docstring.\n'
307>>> for node in mod.node.nodes:
308... print node
309...
310Function('double', ['x'], [], 0, 'Return twice the argument',
311 Stmt([Return(Mul((Name('x'), Const(2))))]))
312>>> func = mod.node.nodes[0]
313>>> func.code
314Stmt([Return(Mul((Name('x'), Const(2))))])
315\end{verbatim}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000316
Fred Drake834a85a2001-08-15 17:01:34 +0000317\section{Using Visitors to Walk ASTs}
318
319\declaremodule{}{compiler.visitor}
320
321The visitor pattern is ... The \refmodule{compiler} package uses a
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000322variant on the visitor pattern that takes advantage of Python's
323introspection features to elminiate the need for much of the visitor's
324infrastructure.
325
326The classes being visited do not need to be programmed to accept
327visitors. The visitor need only define visit methods for classes it
328is specifically interested in; a default visit method can handle the
329rest.
330
331XXX The magic \method{visit()} method for visitors.
332
Fred Drake834a85a2001-08-15 17:01:34 +0000333\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
334\end{funcdesc}
335
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000336\begin{classdesc}{ASTVisitor}{}
337
338The \class{ASTVisitor} is responsible for walking over the tree in the
339correct order. A walk begins with a call to \method{preorder()}. For
Fred Drake42caf3f2001-08-15 14:35:13 +0000340each node, it checks the \var{visitor} argument to \method{preorder()}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000341for a method named `visitNodeType,' where NodeType is the name of the
Fred Drake42caf3f2001-08-15 14:35:13 +0000342node's class, e.g. for a \class{While} node a \method{visitWhile()}
Fred Drake4e6a3fe2001-08-15 18:48:10 +0000343would be called. If the method exists, it is called with the node as
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000344its first argument.
345
346The visitor method for a particular node type can control how child
347nodes are visited during the walk. The \class{ASTVisitor} modifies
348the visitor argument by adding a visit method to the visitor; this
349method can be used to visit a particular child node. If no visitor is
Fred Drake42caf3f2001-08-15 14:35:13 +0000350found for a particular node type, the \method{default()} method is
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000351called.
Fred Drake834a85a2001-08-15 17:01:34 +0000352\end{classdesc}
353
354\class{ASTVisitor} objects have the following methods:
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000355
356XXX describe extra arguments
357
Fred Drake834a85a2001-08-15 17:01:34 +0000358\begin{methoddesc}{default}{node\optional{, \moreargs}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000359\end{methoddesc}
360
Fred Drake834a85a2001-08-15 17:01:34 +0000361\begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000362\end{methoddesc}
363
364\begin{methoddesc}{preorder}{tree, visitor}
365\end{methoddesc}
366
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000367
Fred Drake834a85a2001-08-15 17:01:34 +0000368\section{Bytecode Generation}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000369
Fred Drake834a85a2001-08-15 17:01:34 +0000370The code generator is a visitor that emits bytecodes. Each visit method
Fred Drake42caf3f2001-08-15 14:35:13 +0000371can call the \method{emit()} method to emit a new bytecode. The basic
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000372code generator is specialized for modules, classes, and functions. An
373assembler converts that emitted instructions to the low-level bytecode
374format. It handles things like generator of constant lists of code
375objects and calculation of jump offsets.
376
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000377
Fred Drake834a85a2001-08-15 17:01:34 +0000378\input{compiler.ind} % Index
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000379
380\end{document}