blob: e619a9af36b5f381e299b6bf047af331596d73d9 [file] [log] [blame]
Fred Drakee2f99172001-09-27 20:06:07 +00001\chapter{Python compiler package \label{compiler}}
2
3\sectionauthor{Jeremy Hylton}{jeremy@zope.com}
4
5
6The Python compiler package is a tool for analyzing Python source code
7and generating Python bytecode. The compiler contains libraries to
8generate an abstract syntax tree from Python source code and to
9generate Python bytecode from the tree.
10
11The \refmodule{compiler} package is a Python source to bytecode
12translator written in Python. It uses the built-in parser and
13standard \refmodule{parser} module to generated a concrete syntax
14tree. This tree is used to generate an abstract syntax tree (AST) and
15then Python bytecode.
16
17The full functionality of the package duplicates the builtin compiler
18provided with the Python interpreter. It is intended to match its
19behavior almost exactly. Why implement another compiler that does the
20same thing? The package is useful for a variety of purposes. It can
21be modified more easily than the builtin compiler. The AST it
22generates is useful for analyzing Python source code.
23
24This chapter explains how the various components of the
25\refmodule{compiler} package work. It blends reference material with
26a tutorial.
27
28The following modules are part of the \refmodule{compiler} package:
29
30\localmoduletable
31
32
Fred Drake67bd6832001-11-02 19:41:23 +000033\section{The basic interface}
Fred Drakee2f99172001-09-27 20:06:07 +000034
35\declaremodule{}{compiler}
36
37The top-level of the package defines four functions. If you import
38\module{compiler}, you will get these functions and a collection of
39modules contained in the package.
40
41\begin{funcdesc}{parse}{buf}
42Returns an abstract syntax tree for the Python source code in \var{buf}.
43The function raises SyntaxError if there is an error in the source
44code. The return value is a \class{compiler.ast.Module} instance that
45contains the tree.
46\end{funcdesc}
47
48\begin{funcdesc}{parseFile}{path}
49Return an abstract syntax tree for the Python source code in the file
50specified by \var{path}. It is equivalent to
51\code{parse(open(\var{path}).read())}.
52\end{funcdesc}
53
54\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
55Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
56appropriate method on the \var{visitor} instance for each node
57encountered.
58\end{funcdesc}
59
Jeremy Hyltondb5a93c2001-12-04 02:48:52 +000060\begin{funcdesc}{compile}{source, filename, mode, flags=None,
61 dont_inherit=None}
62Compile the string \var{source}, a Python module, statement or
63expression, into a code object that can be executed by the exec
64statement or \function{eval()}. This function is a replacement for the
65built-in \function{compile()} function.
66
67The \var{filename} will be used for run-time error messages.
68
69The \var{mode} must be 'exec' to compile a module, 'single' to compile a
70single (interactive) statement, or 'eval' to compile an expression.
71
72The \var{flags} and \var{dont_inherit} arguments affect future-related
73statements, but are not supported yet.
74\end{funcdesc}
75
76\begin{funcdesc}{compileFile}{source}
77Compiles the file \var{source} and generates a .pyc file.
Fred Drakee2f99172001-09-27 20:06:07 +000078\end{funcdesc}
79
80The \module{compiler} package contains the following modules:
81\refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
82\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
83\module{transformer}, and \refmodule[compiler.visitor]{visitor}.
84
Fred Drake67bd6832001-11-02 19:41:23 +000085\section{Limitations}
Fred Drakee2f99172001-09-27 20:06:07 +000086
87There are some problems with the error checking of the compiler
88package. The interpreter detects syntax errors in two distinct
89phases. One set of errors is detected by the interpreter's parser,
90the other set by the compiler. The compiler package relies on the
91interpreter's parser, so it get the first phases of error checking for
Brett Cannon82b24822003-10-30 05:42:15 +000092free. It implements the second phase itself, and that implementation is
Fred Drakee2f99172001-09-27 20:06:07 +000093incomplete. For example, the compiler package does not raise an error
94if a name appears more than once in an argument list:
95\code{def f(x, x): ...}
96
97A future version of the compiler should fix these problems.
98
99\section{Python Abstract Syntax}
100
101The \module{compiler.ast} module defines an abstract syntax for
102Python. In the abstract syntax tree, each node represents a syntactic
103construct. The root of the tree is \class{Module} object.
104
105The abstract syntax offers a higher level interface to parsed Python
106source code. The \ulink{\module{parser}}
107{http://www.python.org/doc/current/lib/module-parser.html}
108module and the compiler written in C for the Python interpreter use a
109concrete syntax tree. The concrete syntax is tied closely to the
110grammar description used for the Python parser. Instead of a single
111node for a construct, there are often several levels of nested nodes
112that are introduced by Python's precedence rules.
113
114The abstract syntax tree is created by the
115\module{compiler.transformer} module. The transformer relies on the
116builtin Python parser to generate a concrete syntax tree. It
117generates an abstract syntax tree from the concrete tree.
118
119The \module{transformer} module was created by Greg
120Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
121experimental Python-to-C compiler. The current version contains a
122number of modifications and improvements, but the basic form of the
123abstract syntax and of the transformer are due to Stein and Tutt.
124
125\subsection{AST Nodes}
126
127\declaremodule{}{compiler.ast}
128
129The \module{compiler.ast} module is generated from a text file that
130describes each node type and its elements. Each node type is
131represented as a class that inherits from the abstract base class
132\class{compiler.ast.Node} and defines a set of named attributes for
133child nodes.
134
135\begin{classdesc}{Node}{}
136
137 The \class{Node} instances are created automatically by the parser
138 generator. The recommended interface for specific \class{Node}
139 instances is to use the public attributes to access child nodes. A
140 public attribute may be bound to a single node or to a sequence of
141 nodes, depending on the \class{Node} type. For example, the
142 \member{bases} attribute of the \class{Class} node, is bound to a
143 list of base class nodes, and the \member{doc} attribute is bound to
144 a single node.
145
146 Each \class{Node} instance has a \member{lineno} attribute which may
147 be \code{None}. XXX Not sure what the rules are for which nodes
148 will have a useful lineno.
149\end{classdesc}
150
151All \class{Node} objects offer the following methods:
152
153\begin{methoddesc}{getChildren}{}
154 Returns a flattened list of the child nodes and objects in the
155 order they occur. Specifically, the order of the nodes is the
156 order in which they appear in the Python grammar. Not all of the
157 children are \class{Node} instances. The names of functions and
158 classes, for example, are plain strings.
159\end{methoddesc}
160
161\begin{methoddesc}{getChildNodes}{}
162 Returns a flattened list of the child nodes in the order they
163 occur. This method is like \method{getChildren()}, except that it
164 only returns those children that are \class{Node} instances.
165\end{methoddesc}
166
167Two examples illustrate the general structure of \class{Node}
168classes. The \keyword{while} statement is defined by the following
169grammar production:
170
171\begin{verbatim}
172while_stmt: "while" expression ":" suite
173 ["else" ":" suite]
174\end{verbatim}
175
176The \class{While} node has three attributes: \member{test},
177\member{body}, and \member{else_}. (If the natural name for an
178attribute is also a Python reserved word, it can't be used as an
179attribute name. An underscore is appended to the word to make it a
180legal identifier, hence \member{else_} instead of \keyword{else}.)
181
182The \keyword{if} statement is more complicated because it can include
183several tests.
184
185\begin{verbatim}
186if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
187\end{verbatim}
188
189The \class{If} node only defines two attributes: \member{tests} and
190\member{else_}. The \member{tests} attribute is a sequence of test
191expression, consequent body pairs. There is one pair for each
192\keyword{if}/\keyword{elif} clause. The first element of the pair is
193the test expression. The second elements is a \class{Stmt} node that
194contains the code to execute if the test is true.
195
196The \method{getChildren()} method of \class{If} returns a flat list of
197child nodes. If there are three \keyword{if}/\keyword{elif} clauses
198and no \keyword{else} clause, then \method{getChildren()} will return
199a list of six elements: the first test expression, the first
200\class{Stmt}, the second text expression, etc.
201
202The following table lists each of the \class{Node} subclasses defined
203in \module{compiler.ast} and each of the public attributes available
204on their instances. The values of most of the attributes are
205themselves \class{Node} instances or sequences of instances. When the
206value is something other than an instance, the type is noted in the
207comment. The attributes are listed in the order in which they are
208returned by \method{getChildren()} and \method{getChildNodes()}.
209
210\input{asttable}
211
212
213\subsection{Assignment nodes}
214
215There is a collection of nodes used to represent assignments. Each
216assignment statement in the source code becomes a single
217\class{Assign} node in the AST. The \member{nodes} attribute is a
218list that contains a node for each assignment target. This is
219necessary because assignment can be chained, e.g. \code{a = b = 2}.
220Each \class{Node} in the list will be one of the following classes:
221\class{AssAttr}, \class{AssList}, \class{AssName}, or
222\class{AssTuple}.
223
224Each target assignment node will describe the kind of object being
225assigned to: \class{AssName} for a simple name, e.g. \code{a = 1}.
226\class{AssAttr} for an attribute assigned, e.g. \code{a.x = 1}.
227\class{AssList} and \class{AssTuple} for list and tuple expansion
228respectively, e.g. \code{a, b, c = a_tuple}.
229
230The target assignment nodes also have a \member{flags} attribute that
231indicates whether the node is being used for assignment or in a delete
232statement. The \class{AssName} is also used to represent a delete
233statement, e.g. \class{del x}.
234
235When an expression contains several attribute references, an
236assignment or delete statement will contain only one \class{AssAttr}
237node -- for the final attribute reference. The other attribute
238references will be represented as \class{Getattr} nodes in the
239\member{expr} attribute of the \class{AssAttr} instance.
240
241\subsection{Examples}
242
243This section shows several simple examples of ASTs for Python source
244code. The examples demonstrate how to use the \function{parse()}
245function, what the repr of an AST looks like, and how to access
246attributes of an AST node.
247
248The first module defines a single function. Assume it is stored in
249\file{/tmp/doublelib.py}.
250
251\begin{verbatim}
252"""This is an example module.
253
254This is the docstring.
255"""
256
257def double(x):
258 "Return twice the argument"
259 return x * 2
260\end{verbatim}
261
262In the interactive interpreter session below, I have reformatted the
263long AST reprs for readability. The AST reprs use unqualified class
264names. If you want to create an instance from a repr, you must import
265the class names from the \module{compiler.ast} module.
266
267\begin{verbatim}
268>>> import compiler
269>>> mod = compiler.parseFile("/tmp/doublelib.py")
270>>> mod
271Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000272 Stmt([Function(None, 'double', ['x'], [], 0,
273 'Return twice the argument',
274 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000275>>> from compiler.ast import *
276>>> Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000277... Stmt([Function(None, 'double', ['x'], [], 0,
278... 'Return twice the argument',
279... Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000280Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000281 Stmt([Function(None, 'double', ['x'], [], 0,
282 'Return twice the argument',
283 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000284>>> mod.doc
285'This is an example module.\n\nThis is the docstring.\n'
286>>> for node in mod.node.nodes:
287... print node
288...
Edward Lopera7f62812004-09-28 02:53:50 +0000289Function(None, 'double', ['x'], [], 0, 'Return twice the argument',
Fred Drakee2f99172001-09-27 20:06:07 +0000290 Stmt([Return(Mul((Name('x'), Const(2))))]))
291>>> func = mod.node.nodes[0]
292>>> func.code
293Stmt([Return(Mul((Name('x'), Const(2))))])
294\end{verbatim}
295
296\section{Using Visitors to Walk ASTs}
297
298\declaremodule{}{compiler.visitor}
299
300The visitor pattern is ... The \refmodule{compiler} package uses a
301variant on the visitor pattern that takes advantage of Python's
Raymond Hettinger68804312005-01-01 00:28:46 +0000302introspection features to eliminate the need for much of the visitor's
Fred Drakee2f99172001-09-27 20:06:07 +0000303infrastructure.
304
305The classes being visited do not need to be programmed to accept
306visitors. The visitor need only define visit methods for classes it
307is specifically interested in; a default visit method can handle the
308rest.
309
310XXX The magic \method{visit()} method for visitors.
311
312\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
313\end{funcdesc}
314
315\begin{classdesc}{ASTVisitor}{}
316
317The \class{ASTVisitor} is responsible for walking over the tree in the
318correct order. A walk begins with a call to \method{preorder()}. For
319each node, it checks the \var{visitor} argument to \method{preorder()}
320for a method named `visitNodeType,' where NodeType is the name of the
321node's class, e.g. for a \class{While} node a \method{visitWhile()}
322would be called. If the method exists, it is called with the node as
323its first argument.
324
325The visitor method for a particular node type can control how child
326nodes are visited during the walk. The \class{ASTVisitor} modifies
327the visitor argument by adding a visit method to the visitor; this
328method can be used to visit a particular child node. If no visitor is
329found for a particular node type, the \method{default()} method is
330called.
331\end{classdesc}
332
333\class{ASTVisitor} objects have the following methods:
334
335XXX describe extra arguments
336
337\begin{methoddesc}{default}{node\optional{, \moreargs}}
338\end{methoddesc}
339
340\begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
341\end{methoddesc}
342
343\begin{methoddesc}{preorder}{tree, visitor}
344\end{methoddesc}
345
346
347\section{Bytecode Generation}
348
349The code generator is a visitor that emits bytecodes. Each visit method
350can call the \method{emit()} method to emit a new bytecode. The basic
351code generator is specialized for modules, classes, and functions. An
352assembler converts that emitted instructions to the low-level bytecode
353format. It handles things like generator of constant lists of code
354objects and calculation of jump offsets.