blob: 824ff089433c5260bde48af9761fe1629d302666 [file] [log] [blame]
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00001% Complete documentation on the extended LaTeX markup used for Python
2% documentation is available in ``Documenting Python'', which is part
3% of the standard documentation for Python. It may be found online
4% at:
5%
6% http://www.python.org/doc/current/doc/doc.html
7
Fred Drake834a85a2001-08-15 17:01:34 +00008\documentclass{howto}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00009
10\title{Python compiler package}
11
12\author{Jeremy Hylton}
13
14% Please at least include a long-lived email address;
15% the rest is at your discretion.
16\authoraddress{
17 PythonLabs \\
Fred Drake834a85a2001-08-15 17:01:34 +000018 Zope Corporation \\
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000019 Email: \email{jeremy@zope.com}
20}
21
22\date{August 15, 2001} % update before release!
23 % Use an explicit date so that reformatting
24 % doesn't cause a new date to be used. Setting
25 % the date to \today can be used during draft
26 % stages to make it easier to handle versions.
27
28\release{2.2} % release version; this is used to define the
29 % \version macro
30
31\makeindex % tell \index to actually write the .idx file
32\makemodindex % If this contains a lot of module sections.
33
34
35\begin{document}
36
37\maketitle
38
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000039\begin{abstract}
40
41\noindent
42The Python compiler package is a tool for analyzing Python source code
43and generating Python bytecode. The compiler contains libraries to
44generate an abstract syntax tree from Python source code and to
45generate Python bytecode from the tree.
46
47\end{abstract}
48
49\tableofcontents
50
Fred Drake834a85a2001-08-15 17:01:34 +000051
52\section{Introduction\label{Introduction}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000053
54XXX Need basic intro
55
56XXX what are the major advantages... the abstract syntax is much
57closer to the python source...
58
Fred Drake834a85a2001-08-15 17:01:34 +000059
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000060\section{The basic interface}
61
Fred Drake834a85a2001-08-15 17:01:34 +000062\declaremodule{}{compiler}
63
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000064The top-level of the package defines four functions.
65
66\begin{funcdesc}{parse}{buf}
67Returns an abstract syntax tree for the Python source code in \var{buf}.
68The function raises SyntaxError if there is an error in the source
69code. The return value is a \class{compiler.ast.Module} instance that
70contains the tree.
71\end{funcdesc}
72
73\begin{funcdesc}{parseFile}{path}
74Return an abstract syntax tree for the Python source code in the file
Fred Drake42caf3f2001-08-15 14:35:13 +000075specified by \var{path}. It is equivalent to
76\code{parse(open(\var{path}).read())}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000077\end{funcdesc}
78
Fred Drake834a85a2001-08-15 17:01:34 +000079\begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000080Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
81appropriate method on the \var{visitor} instance for each node
Fred Drake834a85a2001-08-15 17:01:34 +000082encountered.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000083\end{funcdesc}
84
Fred Drake834a85a2001-08-15 17:01:34 +000085\begin{funcdesc}{compile}{path}
86Compile the file \var{path} and generate the corresponding \file{.pyc}
87file.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000088\end{funcdesc}
89
90The \module{compiler} package contains the following modules:
Fred Drake834a85a2001-08-15 17:01:34 +000091\refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
92\module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
93\module{transformer}, and \refmodule[compiler.visitor]{visitor}.
94
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000095
96\section{Limitations}
97
98There are some problems with the error checking of the compiler
99package. The interpreter detects syntax errors in two distinct
100phases. One set of errors is detected by the interpreter's parser,
101the other set by the compiler. The compiler package relies on the
102interpreter's parser, so it get the first phases of error checking for
103free. It implements the second phase itself, and that implement is
104incomplete. For example, the compiler package does not raise an error
105if a name appears more than once in an argument list:
106\code{def f(x, x): ...}
107
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000108
Fred Drake834a85a2001-08-15 17:01:34 +0000109\section{Python Abstract Syntax}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000110
111The \module{compiler.ast} module defines an abstract syntax for
112Python. In the abstract syntax tree, each node represents a syntactic
113construct. The root of the tree is \class{Module} object.
114
115The abstract syntax offers a higher level interface to parsed Python
Fred Drake834a85a2001-08-15 17:01:34 +0000116source code. The \ulink{\module{parser}}
117{http://www.python.org/doc/current/lib/module-parser.html}
118module and the compiler written in C for the Python interpreter use a
119concrete syntax tree. The concrete syntax is tied closely to the
120grammar description used for the Python parser. Instead of a single
121node for a construct, there are often several levels of nested nodes
122that are introduced by Python's precedence rules.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000123
124The abstract syntax tree is created by the
125\module{compiler.transformer} module. The transformer relies on the
126builtin Python parser to generate a concrete syntax tree. It
127generates an abstract syntax tree from the concrete tree.
128
Fred Drake834a85a2001-08-15 17:01:34 +0000129The \module{transformer} module was created by Greg
130Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
131experimental Python-to-C compiler. The current version contains a
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000132number of modifications and improvements, but the basic form of the
133abstract syntax and of the transformer are due to Stein and Tutt.
134
Fred Drake834a85a2001-08-15 17:01:34 +0000135
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000136\section{AST Nodes}
137
Fred Drake834a85a2001-08-15 17:01:34 +0000138\declaremodule{}{compiler.ast}
139
140The \module{compiler.ast} module is generated from a text file that
141describes each node type and its elements. Each node type is
142represented as a class that inherits from the abstract base class
143\class{compiler.ast.Node} and defines a set of named attributes for
144child nodes.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000145
146\begin{classdesc}{Node}{}
147
148 The \class{Node} instances are created automatically by the parser
149 generator. The recommended interface for specific \class{Node}
150 instances is to use the public attributes to access child nodes. A
151 public attribute may be bound to a single node or to a sequence of
152 nodes, depending on the \class{Node} type. For example, the
153 \member{bases} attribute of the \class{Class} node, is bound to a
154 list of base class nodes, and the \member{doc} attribute is bound to
155 a single node.
156
157 Each \class{Node} instance has a \member{lineno} attribute which may
158 be \code{None}. XXX Not sure what the rules are for which nodes
159 will have a useful lineno.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000160\end{classdesc}
161
Fred Drake834a85a2001-08-15 17:01:34 +0000162All \class{Node} objects offer the following methods:
163
164\begin{methoddesc}{getChildren}{}
165 Returns a flattened list of the child nodes and objects in the
166 order they occur. Specifically, the order of the nodes is the
167 order in which they appear in the Python grammar. Not all of the
168 children are \class{Node} instances. The names of functions and
169 classes, for example, are plain strings.
170\end{methoddesc}
171
172\begin{methoddesc}{getChildNodes}{}
173 Returns a flattened list of the child nodes in the order they
174 occur. This method is like \method{getChildren()}, except that it
175 only returns those children that are \class{Node} instances.
176\end{methoddesc}
177
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000178Two examples illustrate the general structure of \class{Node}
Fred Drake834a85a2001-08-15 17:01:34 +0000179classes. The \keyword{while} statement is defined by the following
180grammar production:
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000181
182\begin{verbatim}
183while_stmt: "while" expression ":" suite
184 ["else" ":" suite]
185\end{verbatim}
186
187The \class{While} node has three attributes: \member{test},
188\member{body}, and \member{else_}. (If the natural name for an
189attribute is also a Python reserved word, it can't be used as an
Fred Drake834a85a2001-08-15 17:01:34 +0000190attribute name. An underscore is appended to the word to make it a
191legal identifier, hence \member{else_} instead of \keyword{else}.)
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000192
Fred Drake834a85a2001-08-15 17:01:34 +0000193The \keyword{if} statement is more complicated because it can include
194several tests.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000195
196\begin{verbatim}
197if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
198\end{verbatim}
199
200The \class{If} node only defines two attributes: \member{tests} and
201\member{else_}. The \member{tests} attribute is a sequence of test
Fred Drake834a85a2001-08-15 17:01:34 +0000202expression, consequent body pairs. There is one pair for each
203\keyword{if}/\keyword{elif} clause. The first element of the pair is
204the test expression. The second elements is a \class{Stmt} node that
205contains the code to execute if the test is true.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000206
207The \method{getChildren()} method of \class{If} returns a flat list of
Fred Drake834a85a2001-08-15 17:01:34 +0000208child nodes. If there are three \keyword{if}/\keyword{elif} clauses
209and no \keyword{else} clause, then \method{getChildren()} will return
210a list of six elements: the first test expression, the first
211\class{Stmt}, the second text expression, etc.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000212
213The following table lists each of the \class{Node} subclasses defined
214in \module{compiler.ast} and each of the public attributes available
215on their instances. The values of most of the attributes are
216themselves \class{Node} instances or sequences of instances. When the
217value is something other than an instance, the type is noted in the
218comment. The attributes are listed in the order in which they are
Fred Drake42caf3f2001-08-15 14:35:13 +0000219returned by \method{getChildren()} and \method{getChildNodes()}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000220
221\input{asttable}
222
Fred Drake834a85a2001-08-15 17:01:34 +0000223
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000224\section{Assignment nodes}
225
226There is a collection of nodes used to represent assignments. Each
227assignment statement in the source code becomes a single
228\class{Assign} node in the AST. The \member{nodes} attribute is a
229list that contains a node for each assignment target. This is
230necessary because assignment can be chained, e.g. \code{a = b = 2}.
231Each \class{Node} in the list will be one of the following classes:
232\class{AssAttr}, \class{AssList}, \class{AssName}, or
233\class{AssTuple}.
234
235XXX Explain what the AssXXX nodes are for. Mention \code{a.b.c = 2}
236as an example. Explain what the flags are for.
237
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000238
Fred Drake834a85a2001-08-15 17:01:34 +0000239\section{Using Visitors to Walk ASTs}
240
241\declaremodule{}{compiler.visitor}
242
243The visitor pattern is ... The \refmodule{compiler} package uses a
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000244variant on the visitor pattern that takes advantage of Python's
245introspection features to elminiate the need for much of the visitor's
246infrastructure.
247
248The classes being visited do not need to be programmed to accept
249visitors. The visitor need only define visit methods for classes it
250is specifically interested in; a default visit method can handle the
251rest.
252
253XXX The magic \method{visit()} method for visitors.
254
Fred Drake834a85a2001-08-15 17:01:34 +0000255\begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
256\end{funcdesc}
257
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000258\begin{classdesc}{ASTVisitor}{}
259
260The \class{ASTVisitor} is responsible for walking over the tree in the
261correct order. A walk begins with a call to \method{preorder()}. For
Fred Drake42caf3f2001-08-15 14:35:13 +0000262each node, it checks the \var{visitor} argument to \method{preorder()}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000263for a method named `visitNodeType,' where NodeType is the name of the
Fred Drake42caf3f2001-08-15 14:35:13 +0000264node's class, e.g. for a \class{While} node a \method{visitWhile()}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000265would be called . If the method exists, it is called with the node as
266its first argument.
267
268The visitor method for a particular node type can control how child
269nodes are visited during the walk. The \class{ASTVisitor} modifies
270the visitor argument by adding a visit method to the visitor; this
271method can be used to visit a particular child node. If no visitor is
Fred Drake42caf3f2001-08-15 14:35:13 +0000272found for a particular node type, the \method{default()} method is
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000273called.
Fred Drake834a85a2001-08-15 17:01:34 +0000274\end{classdesc}
275
276\class{ASTVisitor} objects have the following methods:
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000277
278XXX describe extra arguments
279
Fred Drake834a85a2001-08-15 17:01:34 +0000280\begin{methoddesc}{default}{node\optional{, \moreargs}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000281\end{methoddesc}
282
Fred Drake834a85a2001-08-15 17:01:34 +0000283\begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000284\end{methoddesc}
285
286\begin{methoddesc}{preorder}{tree, visitor}
287\end{methoddesc}
288
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000289
Fred Drake834a85a2001-08-15 17:01:34 +0000290\section{Bytecode Generation}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000291
Fred Drake834a85a2001-08-15 17:01:34 +0000292The code generator is a visitor that emits bytecodes. Each visit method
Fred Drake42caf3f2001-08-15 14:35:13 +0000293can call the \method{emit()} method to emit a new bytecode. The basic
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000294code generator is specialized for modules, classes, and functions. An
295assembler converts that emitted instructions to the low-level bytecode
296format. It handles things like generator of constant lists of code
297objects and calculation of jump offsets.
298
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000299
Fred Drake834a85a2001-08-15 17:01:34 +0000300\input{compiler.ind} % Index
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000301
302\end{document}