blob: f39b7342bc22fdbee1c754ad0db3834ad48630d9 [file] [log] [blame]
Jeremy Hylton76f42ac2001-08-14 22:04:44 +00001% Complete documentation on the extended LaTeX markup used for Python
2% documentation is available in ``Documenting Python'', which is part
3% of the standard documentation for Python. It may be found online
4% at:
5%
6% http://www.python.org/doc/current/doc/doc.html
7
8\documentclass{manual}
9
10\title{Python compiler package}
11
12\author{Jeremy Hylton}
13
14% Please at least include a long-lived email address;
15% the rest is at your discretion.
16\authoraddress{
17 PythonLabs \\
18 Zope Corp. \\
19 Email: \email{jeremy@zope.com}
20}
21
22\date{August 15, 2001} % update before release!
23 % Use an explicit date so that reformatting
24 % doesn't cause a new date to be used. Setting
25 % the date to \today can be used during draft
26 % stages to make it easier to handle versions.
27
28\release{2.2} % release version; this is used to define the
29 % \version macro
30
31\makeindex % tell \index to actually write the .idx file
32\makemodindex % If this contains a lot of module sections.
33
34
35\begin{document}
36
37\maketitle
38
39% This makes the contents more accessible from the front page of the HTML.
40\ifhtml
41\chapter*{Front Matter\label{front}}
42\fi
43
44%\input{copyright}
45
46\begin{abstract}
47
48\noindent
49The Python compiler package is a tool for analyzing Python source code
50and generating Python bytecode. The compiler contains libraries to
51generate an abstract syntax tree from Python source code and to
52generate Python bytecode from the tree.
53
54\end{abstract}
55
56\tableofcontents
57
58\chapter{Introduction\label{Introduction}}
59
60XXX Need basic intro
61
62XXX what are the major advantages... the abstract syntax is much
63closer to the python source...
64
65\section{The basic interface}
66
67The top-level of the package defines four functions.
68
69\begin{funcdesc}{parse}{buf}
70Returns an abstract syntax tree for the Python source code in \var{buf}.
71The function raises SyntaxError if there is an error in the source
72code. The return value is a \class{compiler.ast.Module} instance that
73contains the tree.
74\end{funcdesc}
75
76\begin{funcdesc}{parseFile}{path}
77Return an abstract syntax tree for the Python source code in the file
Fred Drake42caf3f2001-08-15 14:35:13 +000078specified by \var{path}. It is equivalent to
79\code{parse(open(\var{path}).read())}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +000080\end{funcdesc}
81
82\begin{funcdesc}{walk}{ast, visitor, \optional{verbose=None}}
83Do a pre-order walk over the abstract syntax tree \var{ast}. Call the
84appropriate method on the \var{visitor} instance for each node
85encountered.
86\end{funcdesc}
87
88\begin{funcdesc}{compile}{filename}
89Compile the file \var{filename} and generated \var{filename}.pyc.
90\end{funcdesc}
91
92The \module{compiler} package contains the following modules:
93\module{ast}, \module{consts}, \module{future}, \module{misc},
94\module{pyassem}, \module{pycodegen}, \module{symbols},
95\module{transformer}, and \module{visitor}.
96
97\section{Limitations}
98
99There are some problems with the error checking of the compiler
100package. The interpreter detects syntax errors in two distinct
101phases. One set of errors is detected by the interpreter's parser,
102the other set by the compiler. The compiler package relies on the
103interpreter's parser, so it get the first phases of error checking for
104free. It implements the second phase itself, and that implement is
105incomplete. For example, the compiler package does not raise an error
106if a name appears more than once in an argument list:
107\code{def f(x, x): ...}
108
109\chapter{Python Abstract Syntax}
110
111\section{Introduction}
112
113The \module{compiler.ast} module defines an abstract syntax for
114Python. In the abstract syntax tree, each node represents a syntactic
115construct. The root of the tree is \class{Module} object.
116
117The abstract syntax offers a higher level interface to parsed Python
118source code. The \module{parser} module and the compiler written in C
119for the Python interpreter use a concrete syntax tree. The concrete
120syntax is tied closely to the grammar description used for the Python
121parser. Instead of a single node for a construct, there are often
122several levels of nested nodes that are introduced by Python's
123precedence rules.
124
125The abstract syntax tree is created by the
126\module{compiler.transformer} module. The transformer relies on the
127builtin Python parser to generate a concrete syntax tree. It
128generates an abstract syntax tree from the concrete tree.
129
130The \module{transformer} module was created by Greg Stein and Bill
131Tutt for the Python-to-C compiler. The current version contains a
132number of modifications and improvements, but the basic form of the
133abstract syntax and of the transformer are due to Stein and Tutt.
134
135\section{AST Nodes}
136
137The \module{ast} module is generated from a text file that describes
138each node type and its elements. Each node type is represented as a
139class that inherits from the abstract base class \class{ast.Node} and
140defines a set of named attributes for child nodes.
141
142\begin{classdesc}{Node}{}
143
144 The \class{Node} instances are created automatically by the parser
145 generator. The recommended interface for specific \class{Node}
146 instances is to use the public attributes to access child nodes. A
147 public attribute may be bound to a single node or to a sequence of
148 nodes, depending on the \class{Node} type. For example, the
149 \member{bases} attribute of the \class{Class} node, is bound to a
150 list of base class nodes, and the \member{doc} attribute is bound to
151 a single node.
152
153 Each \class{Node} instance has a \member{lineno} attribute which may
154 be \code{None}. XXX Not sure what the rules are for which nodes
155 will have a useful lineno.
156
157 \begin{methoddesc}{getChildren}{}
158 Returns a flattened list of the child nodes and objects in the
159 order they occur. Specifically, the order of the nodes is the
160 order in which they appear in the Python grammar. Not all of the
161 children are \class{Node} instances. The names of functions and
162 classes, for example, are plain strings.
163 \end{methoddesc}
164
165 \begin{methoddesc}{getChildNodes}{}
166 Returns a flattened list of the child nodes in the order they
Fred Drake42caf3f2001-08-15 14:35:13 +0000167 occur. This method is like \method{getChildNodes()}, except that it
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000168 only returns those children that are \class{Node} instances.
169 \end{methoddesc}
170
171\end{classdesc}
172
173Two examples illustrate the general structure of \class{Node}
174classes. The while statement is defined by the following grammar
175production:
176
177\begin{verbatim}
178while_stmt: "while" expression ":" suite
179 ["else" ":" suite]
180\end{verbatim}
181
182The \class{While} node has three attributes: \member{test},
183\member{body}, and \member{else_}. (If the natural name for an
184attribute is also a Python reserved word, it can't be used as an
185attribute name. An underscore is appended to the word to make it
186legal, hence \code{else_} instead of \code{else}.)
187
188The if statement is more complicated because it can include several
189tests.
190
191\begin{verbatim}
192if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
193\end{verbatim}
194
195The \class{If} node only defines two attributes: \member{tests} and
196\member{else_}. The \member{tests} attribute is a sequence of test
197expression, consequent body pairs. There is one pair of each if/elif
198clause. The first element of the pair is the test expression. The
199second elements is a \class{Stmt} node that contains the code to
200execute if the test is true.
201
202The \method{getChildren()} method of \class{If} returns a flat list of
203child nodes. If there are three if/elif clauses and no else clause,
204then \method{getChildren()} will return a list of six elements: the
205first test expression, the first \class{Stmt}, the second text
206expression, etc.
207
208The following table lists each of the \class{Node} subclasses defined
209in \module{compiler.ast} and each of the public attributes available
210on their instances. The values of most of the attributes are
211themselves \class{Node} instances or sequences of instances. When the
212value is something other than an instance, the type is noted in the
213comment. The attributes are listed in the order in which they are
Fred Drake42caf3f2001-08-15 14:35:13 +0000214returned by \method{getChildren()} and \method{getChildNodes()}.
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000215
216\input{asttable}
217
218\section{Assignment nodes}
219
220There is a collection of nodes used to represent assignments. Each
221assignment statement in the source code becomes a single
222\class{Assign} node in the AST. The \member{nodes} attribute is a
223list that contains a node for each assignment target. This is
224necessary because assignment can be chained, e.g. \code{a = b = 2}.
225Each \class{Node} in the list will be one of the following classes:
226\class{AssAttr}, \class{AssList}, \class{AssName}, or
227\class{AssTuple}.
228
229XXX Explain what the AssXXX nodes are for. Mention \code{a.b.c = 2}
230as an example. Explain what the flags are for.
231
232\chapter{Using Visitors to Walk ASTs}
233
234The visitor pattern is ... The \module{compiler} package uses a
235variant on the visitor pattern that takes advantage of Python's
236introspection features to elminiate the need for much of the visitor's
237infrastructure.
238
239The classes being visited do not need to be programmed to accept
240visitors. The visitor need only define visit methods for classes it
241is specifically interested in; a default visit method can handle the
242rest.
243
244XXX The magic \method{visit()} method for visitors.
245
246\begin{classdesc}{ASTVisitor}{}
247
248The \class{ASTVisitor} is responsible for walking over the tree in the
249correct order. A walk begins with a call to \method{preorder()}. For
Fred Drake42caf3f2001-08-15 14:35:13 +0000250each node, it checks the \var{visitor} argument to \method{preorder()}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000251for a method named `visitNodeType,' where NodeType is the name of the
Fred Drake42caf3f2001-08-15 14:35:13 +0000252node's class, e.g. for a \class{While} node a \method{visitWhile()}
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000253would be called . If the method exists, it is called with the node as
254its first argument.
255
256The visitor method for a particular node type can control how child
257nodes are visited during the walk. The \class{ASTVisitor} modifies
258the visitor argument by adding a visit method to the visitor; this
259method can be used to visit a particular child node. If no visitor is
Fred Drake42caf3f2001-08-15 14:35:13 +0000260found for a particular node type, the \method{default()} method is
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000261called.
262
263XXX describe extra arguments
264
265\begin{methoddesc}{default}{node\optional{, *args}}
266\end{methoddesc}
267
268\begin{methoddesc}{dispatch}{node\optional{, *args}}
269\end{methoddesc}
270
271\begin{methoddesc}{preorder}{tree, visitor}
272\end{methoddesc}
273
274\end{classdesc}
275
276\begin{funcdesc}{walk}{tree, visitor\optional{, verbose=None}}
277\end{funcdesc}
278
279\chapter{Bytecode Generation}
280
281The code generator is a visit that emits bytecodes. Each visit method
Fred Drake42caf3f2001-08-15 14:35:13 +0000282can call the \method{emit()} method to emit a new bytecode. The basic
Jeremy Hylton76f42ac2001-08-14 22:04:44 +0000283code generator is specialized for modules, classes, and functions. An
284assembler converts that emitted instructions to the low-level bytecode
285format. It handles things like generator of constant lists of code
286objects and calculation of jump offsets.
287
288%
289% The ugly "%begin{latexonly}" pseudo-environments are really just to
290% keep LaTeX2HTML quiet during the \renewcommand{} macros; they're
291% not really valuable.
292%
293% If you don't want the Module Index, you can remove all of this up
294% until the second \input line.
295%
296%begin{latexonly}
297\renewcommand{\indexname}{Module Index}
298%end{latexonly}
299\input{mod\jobname.ind} % Module Index
300
301%begin{latexonly}
302\renewcommand{\indexname}{Index}
303%end{latexonly}
304\input{\jobname.ind} % Index
305
306\end{document}