blob: d4f4124ef744f880dcce322ac80611c23d7b565b [file] [log] [blame]
Fred Drakee2f99172001-09-27 20:06:07 +00001\chapter{Python compiler package \label{compiler}}
2
3\sectionauthor{Jeremy Hylton}{jeremy@zope.com}
4
5
6The Python compiler package is a tool for analyzing Python source code
7and generating Python bytecode. The compiler contains libraries to
8generate an abstract syntax tree from Python source code and to
9generate Python bytecode from the tree.
10
11The \refmodule{compiler} package is a Python source to bytecode
12translator written in Python. It uses the built-in parser and
13standard \refmodule{parser} module to generated a concrete syntax
14tree. This tree is used to generate an abstract syntax tree (AST) and
15then Python bytecode.
16
17The full functionality of the package duplicates the builtin compiler
18provided with the Python interpreter. It is intended to match its
19behavior almost exactly. Why implement another compiler that does the
20same thing? The package is useful for a variety of purposes. It can
21be modified more easily than the builtin compiler. The AST it
22generates is useful for analyzing Python source code.
23
24This chapter explains how the various components of the
25\refmodule{compiler} package work. It blends reference material with
26a tutorial.
27
28The following modules are part of the \refmodule{compiler} package:
29
30\localmoduletable
31
32
Fred Drake67bd6832001-11-02 19:41:23 +000033\section{The basic interface}
Fred Drakee2f99172001-09-27 20:06:07 +000034
35\declaremodule{}{compiler}
36
37The top-level of the package defines four functions. If you import
38\module{compiler}, you will get these functions and a collection of
39modules contained in the package.
40
41\begin{funcdesc}{parse}{buf}
42Returns an abstract syntax tree for the Python source code in \var{buf}.
Thomas Wouters49fd7fa2006-04-21 10:40:58 +000043The function raises \exception{SyntaxError} if there is an error in the
44source code. The return value is a \class{compiler.ast.Module} instance
45that contains the tree.
Fred Drakee2f99172001-09-27 20:06:07 +000046\end{funcdesc}
47
48\begin{funcdesc}{parseFile}{path}
49Return an abstract syntax tree for the Python source code in the file
50specified by \var{path}. It is equivalent to
51\code{parse(open(\var{path}).read())}.
52\end{funcdesc}
53
54\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
55Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
56appropriate method on the \var{visitor} instance for each node
57encountered.
58\end{funcdesc}
59
Jeremy Hyltondb5a93c2001-12-04 02:48:52 +000060\begin{funcdesc}{compile}{source, filename, mode, flags=None,
61 dont_inherit=None}
62Compile the string \var{source}, a Python module, statement or
63expression, into a code object that can be executed by the exec
64statement or \function{eval()}. This function is a replacement for the
65built-in \function{compile()} function.
66
67The \var{filename} will be used for run-time error messages.
68
69The \var{mode} must be 'exec' to compile a module, 'single' to compile a
70single (interactive) statement, or 'eval' to compile an expression.
71
72The \var{flags} and \var{dont_inherit} arguments affect future-related
73statements, but are not supported yet.
74\end{funcdesc}
75
76\begin{funcdesc}{compileFile}{source}
77Compiles the file \var{source} and generates a .pyc file.
Fred Drakee2f99172001-09-27 20:06:07 +000078\end{funcdesc}
79
80The \module{compiler} package contains the following modules:
81\refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
82\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
83\module{transformer}, and \refmodule[compiler.visitor]{visitor}.
84
Fred Drake67bd6832001-11-02 19:41:23 +000085\section{Limitations}
Fred Drakee2f99172001-09-27 20:06:07 +000086
87There are some problems with the error checking of the compiler
88package. The interpreter detects syntax errors in two distinct
89phases. One set of errors is detected by the interpreter's parser,
90the other set by the compiler. The compiler package relies on the
91interpreter's parser, so it get the first phases of error checking for
Brett Cannon82b24822003-10-30 05:42:15 +000092free. It implements the second phase itself, and that implementation is
Fred Drakee2f99172001-09-27 20:06:07 +000093incomplete. For example, the compiler package does not raise an error
94if a name appears more than once in an argument list:
95\code{def f(x, x): ...}
96
97A future version of the compiler should fix these problems.
98
99\section{Python Abstract Syntax}
100
101The \module{compiler.ast} module defines an abstract syntax for
102Python. In the abstract syntax tree, each node represents a syntactic
103construct. The root of the tree is \class{Module} object.
104
105The abstract syntax offers a higher level interface to parsed Python
Guido van Rossumd8faa362007-04-27 19:54:29 +0000106source code. The \refmodule{parser}
Fred Drakee2f99172001-09-27 20:06:07 +0000107module and the compiler written in C for the Python interpreter use a
108concrete syntax tree. The concrete syntax is tied closely to the
109grammar description used for the Python parser. Instead of a single
110node for a construct, there are often several levels of nested nodes
111that are introduced by Python's precedence rules.
112
113The abstract syntax tree is created by the
114\module{compiler.transformer} module. The transformer relies on the
115builtin Python parser to generate a concrete syntax tree. It
116generates an abstract syntax tree from the concrete tree.
117
118The \module{transformer} module was created by Greg
119Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
120experimental Python-to-C compiler. The current version contains a
121number of modifications and improvements, but the basic form of the
122abstract syntax and of the transformer are due to Stein and Tutt.
123
124\subsection{AST Nodes}
125
126\declaremodule{}{compiler.ast}
127
128The \module{compiler.ast} module is generated from a text file that
129describes each node type and its elements. Each node type is
130represented as a class that inherits from the abstract base class
131\class{compiler.ast.Node} and defines a set of named attributes for
132child nodes.
133
134\begin{classdesc}{Node}{}
135
136 The \class{Node} instances are created automatically by the parser
137 generator. The recommended interface for specific \class{Node}
138 instances is to use the public attributes to access child nodes. A
139 public attribute may be bound to a single node or to a sequence of
140 nodes, depending on the \class{Node} type. For example, the
141 \member{bases} attribute of the \class{Class} node, is bound to a
142 list of base class nodes, and the \member{doc} attribute is bound to
143 a single node.
144
145 Each \class{Node} instance has a \member{lineno} attribute which may
146 be \code{None}. XXX Not sure what the rules are for which nodes
147 will have a useful lineno.
148\end{classdesc}
149
150All \class{Node} objects offer the following methods:
151
152\begin{methoddesc}{getChildren}{}
153 Returns a flattened list of the child nodes and objects in the
154 order they occur. Specifically, the order of the nodes is the
155 order in which they appear in the Python grammar. Not all of the
156 children are \class{Node} instances. The names of functions and
157 classes, for example, are plain strings.
158\end{methoddesc}
159
160\begin{methoddesc}{getChildNodes}{}
161 Returns a flattened list of the child nodes in the order they
162 occur. This method is like \method{getChildren()}, except that it
163 only returns those children that are \class{Node} instances.
164\end{methoddesc}
165
166Two examples illustrate the general structure of \class{Node}
167classes. The \keyword{while} statement is defined by the following
168grammar production:
169
170\begin{verbatim}
171while_stmt: "while" expression ":" suite
172 ["else" ":" suite]
173\end{verbatim}
174
175The \class{While} node has three attributes: \member{test},
176\member{body}, and \member{else_}. (If the natural name for an
177attribute is also a Python reserved word, it can't be used as an
178attribute name. An underscore is appended to the word to make it a
179legal identifier, hence \member{else_} instead of \keyword{else}.)
180
181The \keyword{if} statement is more complicated because it can include
182several tests.
183
184\begin{verbatim}
185if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
186\end{verbatim}
187
188The \class{If} node only defines two attributes: \member{tests} and
189\member{else_}. The \member{tests} attribute is a sequence of test
190expression, consequent body pairs. There is one pair for each
191\keyword{if}/\keyword{elif} clause. The first element of the pair is
192the test expression. The second elements is a \class{Stmt} node that
193contains the code to execute if the test is true.
194
195The \method{getChildren()} method of \class{If} returns a flat list of
196child nodes. If there are three \keyword{if}/\keyword{elif} clauses
197and no \keyword{else} clause, then \method{getChildren()} will return
198a list of six elements: the first test expression, the first
199\class{Stmt}, the second text expression, etc.
200
201The following table lists each of the \class{Node} subclasses defined
202in \module{compiler.ast} and each of the public attributes available
203on their instances. The values of most of the attributes are
204themselves \class{Node} instances or sequences of instances. When the
205value is something other than an instance, the type is noted in the
206comment. The attributes are listed in the order in which they are
207returned by \method{getChildren()} and \method{getChildNodes()}.
208
209\input{asttable}
210
211
212\subsection{Assignment nodes}
213
214There is a collection of nodes used to represent assignments. Each
215assignment statement in the source code becomes a single
216\class{Assign} node in the AST. The \member{nodes} attribute is a
217list that contains a node for each assignment target. This is
218necessary because assignment can be chained, e.g. \code{a = b = 2}.
219Each \class{Node} in the list will be one of the following classes:
220\class{AssAttr}, \class{AssList}, \class{AssName}, or
221\class{AssTuple}.
222
223Each target assignment node will describe the kind of object being
224assigned to: \class{AssName} for a simple name, e.g. \code{a = 1}.
225\class{AssAttr} for an attribute assigned, e.g. \code{a.x = 1}.
226\class{AssList} and \class{AssTuple} for list and tuple expansion
227respectively, e.g. \code{a, b, c = a_tuple}.
228
229The target assignment nodes also have a \member{flags} attribute that
230indicates whether the node is being used for assignment or in a delete
231statement. The \class{AssName} is also used to represent a delete
232statement, e.g. \class{del x}.
233
234When an expression contains several attribute references, an
235assignment or delete statement will contain only one \class{AssAttr}
236node -- for the final attribute reference. The other attribute
237references will be represented as \class{Getattr} nodes in the
238\member{expr} attribute of the \class{AssAttr} instance.
239
240\subsection{Examples}
241
242This section shows several simple examples of ASTs for Python source
243code. The examples demonstrate how to use the \function{parse()}
244function, what the repr of an AST looks like, and how to access
245attributes of an AST node.
246
247The first module defines a single function. Assume it is stored in
248\file{/tmp/doublelib.py}.
249
250\begin{verbatim}
251"""This is an example module.
252
253This is the docstring.
254"""
255
256def double(x):
257 "Return twice the argument"
258 return x * 2
259\end{verbatim}
260
261In the interactive interpreter session below, I have reformatted the
262long AST reprs for readability. The AST reprs use unqualified class
263names. If you want to create an instance from a repr, you must import
264the class names from the \module{compiler.ast} module.
265
266\begin{verbatim}
267>>> import compiler
268>>> mod = compiler.parseFile("/tmp/doublelib.py")
269>>> mod
270Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000271 Stmt([Function(None, 'double', ['x'], [], 0,
272 'Return twice the argument',
273 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000274>>> from compiler.ast import *
275>>> Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000276... Stmt([Function(None, 'double', ['x'], [], 0,
277... 'Return twice the argument',
278... Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000279Module('This is an example module.\n\nThis is the docstring.\n',
Edward Lopera7f62812004-09-28 02:53:50 +0000280 Stmt([Function(None, 'double', ['x'], [], 0,
281 'Return twice the argument',
282 Stmt([Return(Mul((Name('x'), Const(2))))]))]))
Fred Drakee2f99172001-09-27 20:06:07 +0000283>>> mod.doc
284'This is an example module.\n\nThis is the docstring.\n'
285>>> for node in mod.node.nodes:
286... print node
287...
Edward Lopera7f62812004-09-28 02:53:50 +0000288Function(None, 'double', ['x'], [], 0, 'Return twice the argument',
Fred Drakee2f99172001-09-27 20:06:07 +0000289 Stmt([Return(Mul((Name('x'), Const(2))))]))
290>>> func = mod.node.nodes[0]
291>>> func.code
292Stmt([Return(Mul((Name('x'), Const(2))))])
293\end{verbatim}
294
295\section{Using Visitors to Walk ASTs}
296
297\declaremodule{}{compiler.visitor}
298
299The visitor pattern is ... The \refmodule{compiler} package uses a
300variant on the visitor pattern that takes advantage of Python's
Raymond Hettinger68804312005-01-01 00:28:46 +0000301introspection features to eliminate the need for much of the visitor's
Fred Drakee2f99172001-09-27 20:06:07 +0000302infrastructure.
303
304The classes being visited do not need to be programmed to accept
305visitors. The visitor need only define visit methods for classes it
306is specifically interested in; a default visit method can handle the
307rest.
308
309XXX The magic \method{visit()} method for visitors.
310
311\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
312\end{funcdesc}
313
314\begin{classdesc}{ASTVisitor}{}
315
316The \class{ASTVisitor} is responsible for walking over the tree in the
317correct order. A walk begins with a call to \method{preorder()}. For
318each node, it checks the \var{visitor} argument to \method{preorder()}
319for a method named `visitNodeType,' where NodeType is the name of the
320node's class, e.g. for a \class{While} node a \method{visitWhile()}
321would be called. If the method exists, it is called with the node as
322its first argument.
323
324The visitor method for a particular node type can control how child
325nodes are visited during the walk. The \class{ASTVisitor} modifies
326the visitor argument by adding a visit method to the visitor; this
327method can be used to visit a particular child node. If no visitor is
328found for a particular node type, the \method{default()} method is
329called.
330\end{classdesc}
331
332\class{ASTVisitor} objects have the following methods:
333
334XXX describe extra arguments
335
336\begin{methoddesc}{default}{node\optional{, \moreargs}}
337\end{methoddesc}
338
339\begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
340\end{methoddesc}
341
342\begin{methoddesc}{preorder}{tree, visitor}
343\end{methoddesc}
344
345
346\section{Bytecode Generation}
347
348The code generator is a visitor that emits bytecodes. Each visit method
349can call the \method{emit()} method to emit a new bytecode. The basic
350code generator is specialized for modules, classes, and functions. An
351assembler converts that emitted instructions to the low-level bytecode
352format. It handles things like generator of constant lists of code
353objects and calculation of jump offsets.