blob: 98ac5f27eca41c90a7926f813a295623beae75ec [file] [log] [blame]
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00001% Complete documentation on the extended LaTeX markup used for Python
2% documentation is available in ``Documenting Python'', which is part
3% of the standard documentation for Python. It may be found online
4% at:
5%
6% http://www.python.org/doc/current/doc/doc.html
7
8\documentclass{manual}
9
10\title{Python compiler package}
11
12\author{Jeremy Hylton}
13
14% Please at least include a long-lived email address;
15% the rest is at your discretion.
16\authoraddress{
17 PythonLabs \\
18 Zope Corp. \\
19 Email: \email{jeremy@zope.com}
20}
21
22\date{August 15, 2001} % update before release!
23 % Use an explicit date so that reformatting
24 % doesn't cause a new date to be used. Setting
25 % the date to \today can be used during draft
26 % stages to make it easier to handle versions.
27
28\release{2.2} % release version; this is used to define the
29 % \version macro
30
31\makeindex % tell \index to actually write the .idx file
32\makemodindex % If this contains a lot of module sections.
33
34
35\begin{document}
36
37\maketitle
38
39% This makes the contents more accessible from the front page of the HTML.
40\ifhtml
41\chapter*{Front Matter\label{front}}
42\fi
43
44%\input{copyright}
45
46\begin{abstract}
47
48\noindent
49The Python compiler package is a tool for analyzing Python source code
50and generating Python bytecode. The compiler contains libraries to
51generate an abstract syntax tree from Python source code and to
52generate Python bytecode from the tree.
53
54\end{abstract}
55
56\tableofcontents
57
58\chapter{Introduction\label{Introduction}}
59
60XXX Need basic intro
61
62XXX what are the major advantages... the abstract syntax is much
63closer to the python source...
64
65\section{The basic interface}
66
67The top-level of the package defines four functions.
68
69\begin{funcdesc}{parse}{buf}
70Returns an abstract syntax tree for the Python source code in \var{buf}.
71The function raises SyntaxError if there is an error in the source
72code. The return value is a \class{compiler.ast.Module} instance that
73contains the tree.
74\end{funcdesc}
75
76\begin{funcdesc}{parseFile}{path}
77Return an abstract syntax tree for the Python source code in the file
78specified by \var{path}. It is equivalent to \code{parse(open(path))}.
79\end{funcdesc}
80
81\begin{funcdesc}{walk}{ast, visitor, \optional{verbose=None}}
82Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
83appropriate method on the \var{visitor} instance for each node
84encountered.
85\end{funcdesc}
86
87\begin{funcdesc}{compile}{filename}
88Compile the file \var{filename} and generated \var{filename}.pyc.
89\end{funcdesc}
90
91The \module{compiler} package contains the following modules:
92\module{ast}, \module{consts}, \module{future}, \module{misc},
93\module{pyassem}, \module{pycodegen}, \module{symbols},
94\module{transformer}, and \module{visitor}.
95
96\section{Limitations}
97
98There are some problems with the error checking of the compiler
99package. The interpreter detects syntax errors in two distinct
100phases. One set of errors is detected by the interpreter's parser,
101the other set by the compiler. The compiler package relies on the
102interpreter's parser, so it get the first phases of error checking for
103free. It implements the second phase itself, and that implement is
104incomplete. For example, the compiler package does not raise an error
105if a name appears more than once in an argument list:
106\code{def f(x, x): ...}
107
108\chapter{Python Abstract Syntax}
109
110\section{Introduction}
111
112The \module{compiler.ast} module defines an abstract syntax for
113Python. In the abstract syntax tree, each node represents a syntactic
114construct. The root of the tree is \class{Module} object.
115
116The abstract syntax offers a higher level interface to parsed Python
117source code. The \module{parser} module and the compiler written in C
118for the Python interpreter use a concrete syntax tree. The concrete
119syntax is tied closely to the grammar description used for the Python
120parser. Instead of a single node for a construct, there are often
121several levels of nested nodes that are introduced by Python's
122precedence rules.
123
124The abstract syntax tree is created by the
125\module{compiler.transformer} module. The transformer relies on the
126builtin Python parser to generate a concrete syntax tree. It
127generates an abstract syntax tree from the concrete tree.
128
129The \module{transformer} module was created by Greg Stein and Bill
130Tutt for the Python-to-C compiler. The current version contains a
131number of modifications and improvements, but the basic form of the
132abstract syntax and of the transformer are due to Stein and Tutt.
133
134\section{AST Nodes}
135
136The \module{ast} module is generated from a text file that describes
137each node type and its elements. Each node type is represented as a
138class that inherits from the abstract base class \class{ast.Node} and
139defines a set of named attributes for child nodes.
140
141\begin{classdesc}{Node}{}
142
143 The \class{Node} instances are created automatically by the parser
144 generator. The recommended interface for specific \class{Node}
145 instances is to use the public attributes to access child nodes. A
146 public attribute may be bound to a single node or to a sequence of
147 nodes, depending on the \class{Node} type. For example, the
148 \member{bases} attribute of the \class{Class} node, is bound to a
149 list of base class nodes, and the \member{doc} attribute is bound to
150 a single node.
151
152 Each \class{Node} instance has a \member{lineno} attribute which may
153 be \code{None}. XXX Not sure what the rules are for which nodes
154 will have a useful lineno.
155
156 \begin{methoddesc}{getChildren}{}
157 Returns a flattened list of the child nodes and objects in the
158 order they occur. Specifically, the order of the nodes is the
159 order in which they appear in the Python grammar. Not all of the
160 children are \class{Node} instances. The names of functions and
161 classes, for example, are plain strings.
162 \end{methoddesc}
163
164 \begin{methoddesc}{getChildNodes}{}
165 Returns a flattened list of the child nodes in the order they
166 occur. This method is like \method{getChildNodes}, except that it
167 only returns those children that are \class{Node} instances.
168 \end{methoddesc}
169
170\end{classdesc}
171
172Two examples illustrate the general structure of \class{Node}
173classes. The while statement is defined by the following grammar
174production:
175
176\begin{verbatim}
177while_stmt: "while" expression ":" suite
178 ["else" ":" suite]
179\end{verbatim}
180
181The \class{While} node has three attributes: \member{test},
182\member{body}, and \member{else_}. (If the natural name for an
183attribute is also a Python reserved word, it can't be used as an
184attribute name. An underscore is appended to the word to make it
185legal, hence \code{else_} instead of \code{else}.)
186
187The if statement is more complicated because it can include several
188tests.
189
190\begin{verbatim}
191if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
192\end{verbatim}
193
194The \class{If} node only defines two attributes: \member{tests} and
195\member{else_}. The \member{tests} attribute is a sequence of test
196expression, consequent body pairs. There is one pair of each if/elif
197clause. The first element of the pair is the test expression. The
198second elements is a \class{Stmt} node that contains the code to
199execute if the test is true.
200
201The \method{getChildren()} method of \class{If} returns a flat list of
202child nodes. If there are three if/elif clauses and no else clause,
203then \method{getChildren()} will return a list of six elements: the
204first test expression, the first \class{Stmt}, the second text
205expression, etc.
206
207The following table lists each of the \class{Node} subclasses defined
208in \module{compiler.ast} and each of the public attributes available
209on their instances. The values of most of the attributes are
210themselves \class{Node} instances or sequences of instances. When the
211value is something other than an instance, the type is noted in the
212comment. The attributes are listed in the order in which they are
213returned by \method{getChildren} and \method{getChildNodes}.
214
215\input{asttable}
216
217\section{Assignment nodes}
218
219There is a collection of nodes used to represent assignments. Each
220assignment statement in the source code becomes a single
221\class{Assign} node in the AST. The \member{nodes} attribute is a
222list that contains a node for each assignment target. This is
223necessary because assignment can be chained, e.g. \code{a = b = 2}.
224Each \class{Node} in the list will be one of the following classes:
225\class{AssAttr}, \class{AssList}, \class{AssName}, or
226\class{AssTuple}.
227
228XXX Explain what the AssXXX nodes are for. Mention \code{a.b.c = 2}
229as an example. Explain what the flags are for.
230
231\chapter{Using Visitors to Walk ASTs}
232
233The visitor pattern is ... The \module{compiler} package uses a
234variant on the visitor pattern that takes advantage of Python's
235introspection features to elminiate the need for much of the visitor's
236infrastructure.
237
238The classes being visited do not need to be programmed to accept
239visitors. The visitor need only define visit methods for classes it
240is specifically interested in; a default visit method can handle the
241rest.
242
243XXX The magic \method{visit()} method for visitors.
244
245\begin{classdesc}{ASTVisitor}{}
246
247The \class{ASTVisitor} is responsible for walking over the tree in the
248correct order. A walk begins with a call to \method{preorder()}. For
249each node, it checks the \var{visitor} argument to \method{preorder{}}
250for a method named `visitNodeType,' where NodeType is the name of the
251node's class, e.g. for a \class{While} node a \method{visitWhile}
252would be called . If the method exists, it is called with the node as
253its first argument.
254
255The visitor method for a particular node type can control how child
256nodes are visited during the walk. The \class{ASTVisitor} modifies
257the visitor argument by adding a visit method to the visitor; this
258method can be used to visit a particular child node. If no visitor is
259found for a particular node type, the \method{default} method is
260called.
261
262XXX describe extra arguments
263
264\begin{methoddesc}{default}{node\optional{, *args}}
265\end{methoddesc}
266
267\begin{methoddesc}{dispatch}{node\optional{, *args}}
268\end{methoddesc}
269
270\begin{methoddesc}{preorder}{tree, visitor}
271\end{methoddesc}
272
273\end{classdesc}
274
275\begin{funcdesc}{walk}{tree, visitor\optional{, verbose=None}}
276\end{funcdesc}
277
278\chapter{Bytecode Generation}
279
280The code generator is a visit that emits bytecodes. Each visit method
281can call the \method{emit} method to emit a new bytecode. The basic
282code generator is specialized for modules, classes, and functions. An
283assembler converts that emitted instructions to the low-level bytecode
284format. It handles things like generator of constant lists of code
285objects and calculation of jump offsets.
286
287%
288% The ugly "%begin{latexonly}" pseudo-environments are really just to
289% keep LaTeX2HTML quiet during the \renewcommand{} macros; they're
290% not really valuable.
291%
292% If you don't want the Module Index, you can remove all of this up
293% until the second \input line.
294%
295%begin{latexonly}
296\renewcommand{\indexname}{Module Index}
297%end{latexonly}
298\input{mod\jobname.ind} % Module Index
299
300%begin{latexonly}
301\renewcommand{\indexname}{Index}
302%end{latexonly}
303\input{\jobname.ind} % Index
304
305\end{document}