Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 1 | % Complete documentation on the extended LaTeX markup used for Python |
| 2 | % documentation is available in ``Documenting Python'', which is part |
| 3 | % of the standard documentation for Python. It may be found online |
| 4 | % at: |
| 5 | % |
| 6 | % http://www.python.org/doc/current/doc/doc.html |
| 7 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 8 | \documentclass{howto} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 9 | |
| 10 | \title{Python compiler package} |
| 11 | |
| 12 | \author{Jeremy Hylton} |
| 13 | |
| 14 | % Please at least include a long-lived email address; |
| 15 | % the rest is at your discretion. |
| 16 | \authoraddress{ |
| 17 | PythonLabs \\ |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 18 | Zope Corporation \\ |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 19 | Email: \email{jeremy@zope.com} |
| 20 | } |
| 21 | |
| 22 | \date{August 15, 2001} % update before release! |
| 23 | % Use an explicit date so that reformatting |
| 24 | % doesn't cause a new date to be used. Setting |
| 25 | % the date to \today can be used during draft |
| 26 | % stages to make it easier to handle versions. |
| 27 | |
| 28 | \release{2.2} % release version; this is used to define the |
| 29 | % \version macro |
| 30 | |
| 31 | \makeindex % tell \index to actually write the .idx file |
| 32 | \makemodindex % If this contains a lot of module sections. |
| 33 | |
| 34 | |
| 35 | \begin{document} |
| 36 | |
| 37 | \maketitle |
| 38 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 39 | \begin{abstract} |
| 40 | |
| 41 | \noindent |
| 42 | The Python compiler package is a tool for analyzing Python source code |
| 43 | and generating Python bytecode. The compiler contains libraries to |
| 44 | generate an abstract syntax tree from Python source code and to |
| 45 | generate Python bytecode from the tree. |
| 46 | |
| 47 | \end{abstract} |
| 48 | |
| 49 | \tableofcontents |
| 50 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 51 | |
| 52 | \section{Introduction\label{Introduction}} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 53 | |
| 54 | XXX Need basic intro |
| 55 | |
| 56 | XXX what are the major advantages... the abstract syntax is much |
| 57 | closer to the python source... |
| 58 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 59 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 60 | \section{The basic interface} |
| 61 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 62 | \declaremodule{}{compiler} |
| 63 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 64 | The top level of the package defines four functions. |
| 65 | |
| 66 | \begin{funcdesc}{parse}{buf} |
| 67 | Return an abstract syntax tree for the Python source code in \var{buf}. |
| 68 | The function raises \exception{SyntaxError} if there is an error in the source |
| 69 | code. The return value is a \class{compiler.ast.Module} instance that |
| 70 | contains the tree. |
| 71 | \end{funcdesc} |
| 72 | |
| 73 | \begin{funcdesc}{parseFile}{path} |
| 74 | Return an abstract syntax tree for the Python source code in the file |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 75 | specified by \var{path}. It is equivalent to |
| 76 | \code{parse(open(\var{path}).read())}. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 77 | \end{funcdesc} |
| 78 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 79 | \begin{funcdesc}{walk}{ast, visitor\optional{, verbose}} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 80 | Do a pre-order walk over the abstract syntax tree \var{ast}. Call the |
| 81 | appropriate method on the \var{visitor} instance for each node |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 82 | encountered. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 83 | \end{funcdesc} |
| 84 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 85 | \begin{funcdesc}{compile}{path} |
| 86 | Compile the file \var{path} and generate the corresponding \file{.pyc} |
| 87 | file. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 88 | \end{funcdesc} |
| 89 | |
| 90 | The \module{compiler} package contains the following modules: |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 91 | \refmodule[compiler.ast]{ast}, \module{consts}, \module{future}, |
| 92 | \module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols}, |
| 93 | \module{transformer}, and \refmodule[compiler.visitor]{visitor}. |
| 94 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 95 | |
| 96 | \section{Limitations} |
| 97 | |
| 98 | There are some problems with the error checking of the compiler |
| 99 | package. The interpreter detects syntax errors in two distinct |
| 100 | phases. One set of errors is detected by the interpreter's parser, |
| 101 | the other set by the compiler. The compiler package relies on the |
| 102 | interpreter's parser, so it gets the first phase of error checking for |
| 103 | free. It implements the second phase itself, and that implementation is |
| 104 | incomplete. For example, the compiler package does not raise an error |
| 105 | if a name appears more than once in an argument list: |
| 106 | \code{def f(x, x): ...} |
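For comparison, the interpreter's own compiler does reject this definition. The following is a minimal check that uses only the built-in \function{compile()} function; the helper name \code{builtin_compiler_rejects} is hypothetical and is not part of the compiler package.

```python
def builtin_compiler_rejects(source):
    """Return True if the interpreter's built-in compiler raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# A duplicate argument name is an error caught by the built-in compiler.
duplicate_arg = "def f(x, x): pass\n"
# The same definition with distinct names compiles without complaint.
valid_def = "def f(x, y): pass\n"
```

A test suite for the compiler package could use a check of this kind to find inputs where its second phase of error checking disagrees with the interpreter.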
| 107 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 108 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 109 | \section{Python Abstract Syntax} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 110 | |
| 111 | The \module{compiler.ast} module defines an abstract syntax for |
| 112 | Python. In the abstract syntax tree, each node represents a syntactic |
| 113 | construct. The root of the tree is a \class{Module} object. |
| 114 | |
| 115 | The abstract syntax offers a higher-level interface to parsed Python |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 116 | source code. The \ulink{\module{parser}} |
| 117 | {http://www.python.org/doc/current/lib/module-parser.html} |
| 118 | module and the compiler written in C for the Python interpreter use a |
| 119 | concrete syntax tree. The concrete syntax is tied closely to the |
| 120 | grammar description used for the Python parser. Instead of a single |
| 121 | node for a construct, there are often several levels of nested nodes |
| 122 | that are introduced by Python's precedence rules. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 123 | |
| 124 | The abstract syntax tree is created by the |
| 125 | \module{compiler.transformer} module. The transformer relies on the |
| 126 | built-in Python parser to generate a concrete syntax tree. It |
| 127 | generates an abstract syntax tree from the concrete tree. |
| 128 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 129 | The \module{transformer} module was created by Greg |
| 130 | Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an |
| 131 | experimental Python-to-C compiler. The current version contains a |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 132 | number of modifications and improvements, but the basic form of the |
| 133 | abstract syntax and of the transformer is due to Stein and Tutt. |
| 134 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 135 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 136 | \section{AST Nodes} |
| 137 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 138 | \declaremodule{}{compiler.ast} |
| 139 | |
| 140 | The \module{compiler.ast} module is generated from a text file that |
| 141 | describes each node type and its elements. Each node type is |
| 142 | represented as a class that inherits from the abstract base class |
| 143 | \class{compiler.ast.Node} and defines a set of named attributes for |
| 144 | child nodes. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 145 | |
| 146 | \begin{classdesc}{Node}{} |
| 147 | |
| 148 | The \class{Node} instances are created automatically by the parser |
| 149 | generator. The recommended interface for specific \class{Node} |
| 150 | instances is to use the public attributes to access child nodes. A |
| 151 | public attribute may be bound to a single node or to a sequence of |
| 152 | nodes, depending on the \class{Node} type. For example, the |
| 153 | \member{bases} attribute of the \class{Class} node is bound to a |
| 154 | list of base class nodes, and the \member{doc} attribute is bound to |
| 155 | a single node. |
| 156 | |
| 157 | Each \class{Node} instance has a \member{lineno} attribute which may |
| 158 | be \code{None}. XXX Not sure what the rules are for which nodes |
| 159 | will have a useful lineno. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 160 | \end{classdesc} |
| 161 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 162 | All \class{Node} objects offer the following methods: |
| 163 | |
| 164 | \begin{methoddesc}{getChildren}{} |
| 165 | Returns a flattened list of the child nodes and objects in the |
| 166 | order they occur. Specifically, the order of the nodes is the |
| 167 | order in which they appear in the Python grammar. Not all of the |
| 168 | children are \class{Node} instances. The names of functions and |
| 169 | classes, for example, are plain strings. |
| 170 | \end{methoddesc} |
| 171 | |
| 172 | \begin{methoddesc}{getChildNodes}{} |
| 173 | Returns a flattened list of the child nodes in the order they |
| 174 | occur. This method is like \method{getChildren()}, except that it |
| 175 | only returns those children that are \class{Node} instances. |
| 176 | \end{methoddesc} |
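The difference between the two methods can be sketched with stand-in classes. Everything below is hypothetical and written for illustration only: the \class{Node}, \class{Name}, and \class{Function} classes are simplified imitations, not the generated classes in \module{compiler.ast}, and the real \class{Function} node has more attributes than the two shown here.

```python
class Node(object):
    """Stand-in for compiler.ast.Node (hypothetical, illustration only)."""

    def getChildren(self):
        raise NotImplementedError

    def getChildNodes(self):
        # Keep only the children that are themselves Node instances.
        return [c for c in self.getChildren() if isinstance(c, Node)]


class Name(Node):
    def __init__(self, name):
        self.name = name

    def getChildren(self):
        # The identifier is a plain string, not a Node.
        return [self.name]


class Function(Node):
    """Simplified function node: a name (plain string) and a body node."""

    def __init__(self, name, code):
        self.name = name
        self.code = code

    def getChildren(self):
        return [self.name, self.code]


func = Function("f", Name("x"))
all_children = func.getChildren()     # the string "f" and the body node
node_children = func.getChildNodes()  # only the body node
```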
| 177 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 178 | Two examples illustrate the general structure of \class{Node} |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 179 | classes. The \keyword{while} statement is defined by the following |
| 180 | grammar production: |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 181 | |
| 182 | \begin{verbatim} |
| 183 | while_stmt: "while" expression ":" suite |
| 184 | ["else" ":" suite] |
| 185 | \end{verbatim} |
| 186 | |
| 187 | The \class{While} node has three attributes: \member{test}, |
| 188 | \member{body}, and \member{else_}. (If the natural name for an |
| 189 | attribute is also a Python reserved word, it can't be used as an |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 190 | attribute name. An underscore is appended to the word to make it a |
| 191 | legal identifier, hence \member{else_} instead of \keyword{else}.) |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 192 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 193 | The \keyword{if} statement is more complicated because it can include |
| 194 | several tests. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 195 | |
| 196 | \begin{verbatim} |
| 197 | if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite] |
| 198 | \end{verbatim} |
| 199 | |
| 200 | The \class{If} node only defines two attributes: \member{tests} and |
| 201 | \member{else_}. The \member{tests} attribute is a sequence of (test |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 202 | expression, consequent body) pairs. There is one pair for each |
| 203 | \keyword{if}/\keyword{elif} clause. The first element of the pair is |
| 204 | the test expression. The second element is a \class{Stmt} node that |
| 205 | contains the code to execute if the test is true. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 206 | |
| 207 | The \method{getChildren()} method of \class{If} returns a flat list of |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 208 | child nodes. If there are three \keyword{if}/\keyword{elif} clauses |
| 209 | and no \keyword{else} clause, then \method{getChildren()} will return |
| 210 | a list of six elements: the first test expression, the first |
| 211 | \class{Stmt}, the second test expression, etc. |
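The flattening described above can be sketched as follows. The classes are hypothetical stand-ins that mirror only the behavior stated in the text; the generated \class{If} class in \module{compiler.ast} may differ in detail, for example in how it treats a missing \keyword{else} clause.

```python
class Node(object):
    """Stand-in for compiler.ast.Node (hypothetical, illustration only)."""


class Name(Node):
    def __init__(self, name):
        self.name = name


class Stmt(Node):
    def __init__(self, nodes):
        self.nodes = nodes


class If(Node):
    def __init__(self, tests, else_=None):
        self.tests = tests    # sequence of (test expression, Stmt) pairs
        self.else_ = else_    # Stmt for the else clause, or None

    def getChildren(self):
        # Flatten the pairs: test, body, test, body, ...
        children = []
        for test, body in self.tests:
            children.extend([test, body])
        # Assumption: an absent else clause contributes no child.
        if self.else_ is not None:
            children.append(self.else_)
        return children


# Three if/elif clauses and no else clause.
tests = [(Name("a"), Stmt([])), (Name("b"), Stmt([])), (Name("c"), Stmt([]))]
node = If(tests)
flat = node.getChildren()
```

With three clauses and no \keyword{else}, \code{flat} holds six elements, alternating test expressions and \class{Stmt} bodies.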
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 212 | |
| 213 | The following table lists each of the \class{Node} subclasses defined |
| 214 | in \module{compiler.ast} and each of the public attributes available |
| 215 | on their instances. The values of most of the attributes are |
| 216 | themselves \class{Node} instances or sequences of instances. When the |
| 217 | value is something other than an instance, the type is noted in the |
| 218 | comment. The attributes are listed in the order in which they are |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 219 | returned by \method{getChildren()} and \method{getChildNodes()}. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 220 | |
| 221 | \input{asttable} |
| 222 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 223 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 224 | \section{Assignment nodes} |
| 225 | |
| 226 | There is a collection of nodes used to represent assignments. Each |
| 227 | assignment statement in the source code becomes a single |
| 228 | \class{Assign} node in the AST. The \member{nodes} attribute is a |
| 229 | list that contains a node for each assignment target. This is |
| 230 | necessary because assignments can be chained, e.g. \code{a = b = 2}. |
| 231 | Each \class{Node} in the list will be an instance of one of the following classes: |
| 232 | \class{AssAttr}, \class{AssList}, \class{AssName}, or |
| 233 | \class{AssTuple}. |
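As a rough picture of the tree shape for \code{a = b = 2}, the sketch below builds the nodes by hand from stand-in classes. All of it is hypothetical: the constructors are simplified, the \member{expr} attribute name and the \class{Const} class are assumptions not stated in this document, and the flags carried by the real nodes are left out.

```python
class Node(object):
    """Stand-in for compiler.ast.Node (hypothetical, illustration only)."""


class Const(Node):
    def __init__(self, value):
        self.value = value


class AssName(Node):
    def __init__(self, name):
        self.name = name   # target name as a plain string


class Assign(Node):
    def __init__(self, nodes, expr):
        self.nodes = nodes   # one node per assignment target
        self.expr = expr     # assumed name for the right-hand side


def target_names(assign):
    """Collect the names of the simple-name targets of an Assign node."""
    return [t.name for t in assign.nodes if isinstance(t, AssName)]


# Hand-built tree for the chained assignment ``a = b = 2``.
chained = Assign([AssName("a"), AssName("b")], Const(2))
names = target_names(chained)
```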
| 234 | |
| 235 | XXX Explain what the AssXXX nodes are for. Mention \code{a.b.c = 2} |
| 236 | as an example. Explain what the flags are for. |
| 237 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 238 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 239 | \section{Using Visitors to Walk ASTs} |
| 240 | |
| 241 | \declaremodule{}{compiler.visitor} |
| 242 | |
| 243 | The visitor pattern is ... The \refmodule{compiler} package uses a |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 244 | variant on the visitor pattern that takes advantage of Python's |
| 245 | introspection features to eliminate the need for much of the visitor's |
| 246 | infrastructure. |
| 247 | |
| 248 | The classes being visited do not need to be programmed to accept |
| 249 | visitors. The visitor need only define visit methods for classes it |
| 250 | is specifically interested in; a default visit method can handle the |
| 251 | rest. |
| 252 | |
| 253 | XXX The magic \method{visit()} method for visitors. |
| 254 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 255 | \begin{funcdesc}{walk}{tree, visitor\optional{, verbose}} |
| 256 | \end{funcdesc} |
| 257 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 258 | \begin{classdesc}{ASTVisitor}{} |
| 259 | |
| 260 | The \class{ASTVisitor} is responsible for walking over the tree in the |
| 261 | correct order. A walk begins with a call to \method{preorder()}. For |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 262 | each node, it checks the \var{visitor} argument to \method{preorder()} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 263 | for a method named `visitNodeType,' where NodeType is the name of the |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 264 | node's class, e.g. for a \class{While} node a \method{visitWhile()} |
Fred Drake | 4e6a3fe | 2001-08-15 18:48:10 +0000 | [diff] [blame^] | 265 | would be called. If the method exists, it is called with the node as |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 266 | its first argument. |
| 267 | |
| 268 | The visitor method for a particular node type can control how child |
| 269 | nodes are visited during the walk. The \class{ASTVisitor} modifies |
| 270 | the visitor argument by adding a visit method to the visitor; this |
| 271 | method can be used to visit a particular child node. If no visitor is |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 272 | found for a particular node type, the \method{default()} method is |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 273 | called. |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 274 | \end{classdesc} |
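The dispatch scheme described above can be sketched in a few lines. This is not the \class{ASTVisitor} implementation: \class{SketchWalker}, \class{WhileCounter}, and the node classes are hypothetical stand-ins, and the extra arguments accepted by the real methods are omitted.

```python
class Node(object):
    """Stand-in node with a plain list of child nodes (hypothetical)."""

    def __init__(self, *children):
        self.children = list(children)

    def getChildNodes(self):
        return self.children


class While(Node):
    pass


class Name(Node):
    pass


class SketchWalker(object):
    """Hypothetical walker that mimics the dispatch described above."""

    def preorder(self, tree, visitor):
        self.visitor = visitor
        # Give the visitor a visit() method for visiting child nodes.
        visitor.visit = self.dispatch
        self.dispatch(tree)

    def default(self, node):
        # No specific visit method: just walk the children.
        for child in node.getChildNodes():
            self.dispatch(child)

    def dispatch(self, node):
        # Look up visitNodeType by introspection; fall back to default().
        name = "visit" + node.__class__.__name__
        method = getattr(self.visitor, name, self.default)
        return method(node)


class WhileCounter(object):
    """Visitor that only cares about While nodes."""

    def __init__(self):
        self.count = 0

    def visitWhile(self, node):
        self.count += 1
        # The visit method decides how its children are visited.
        for child in node.getChildNodes():
            self.visit(child)


tree = Node(While(Name(), While(Name())), Name())
counter = WhileCounter()
SketchWalker().preorder(tree, counter)
```

Because \class{WhileCounter} defines no method for \class{Node} or \class{Name}, those nodes fall through to \method{default()}, which is why the visitor needs to name only the node types it cares about.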
| 275 | |
| 276 | \class{ASTVisitor} objects have the following methods: |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 277 | |
| 278 | XXX describe extra arguments |
| 279 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 280 | \begin{methoddesc}{default}{node\optional{, \moreargs}} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 281 | \end{methoddesc} |
| 282 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 283 | \begin{methoddesc}{dispatch}{node\optional{, \moreargs}} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 284 | \end{methoddesc} |
| 285 | |
| 286 | \begin{methoddesc}{preorder}{tree, visitor} |
| 287 | \end{methoddesc} |
| 288 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 289 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 290 | \section{Bytecode Generation} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 291 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 292 | The code generator is a visitor that emits bytecodes. Each visit method |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 293 | can call the \method{emit()} method to emit a new bytecode. The basic |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 294 | code generator is specialized for modules, classes, and functions. An |
| 295 | assembler converts the emitted instructions to the low-level bytecode |
| 296 | format. It handles things like generation of constant lists of code |
| 297 | objects and calculation of jump offsets. |
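The split between emitting instructions and assembling them can be illustrated with a toy example. The class below is a hypothetical sketch, not the \module{pyassem} or \module{pycodegen} API: the \method{mark()} and \method{assemble()} methods are invented for illustration, the opcode names are only suggestive, and jump targets are resolved to instruction indices rather than real byte offsets.

```python
class SketchEmitter(object):
    """Toy emitter/assembler pair (hypothetical, illustration only)."""

    def __init__(self):
        self.insts = []    # (opname, arg) tuples; arg may be a label name
        self.labels = {}   # label name -> index of the next instruction

    def emit(self, opname, arg=None):
        self.insts.append((opname, arg))

    def mark(self, label):
        # Record where the label points: the next instruction to be emitted.
        self.labels[label] = len(self.insts)

    def assemble(self):
        # Second pass: replace symbolic jump targets with resolved indices.
        resolved = []
        for opname, arg in self.insts:
            if opname.startswith("JUMP"):
                arg = self.labels[arg]
            resolved.append((opname, arg))
        return resolved


# Instructions shaped like a simple while loop.
gen = SketchEmitter()
gen.mark("top")
gen.emit("LOAD_NAME", "x")
gen.emit("JUMP_IF_FALSE", "end")   # forward jump: target not yet known
gen.emit("LOAD_NAME", "body")
gen.emit("JUMP_ABSOLUTE", "top")   # backward jump
gen.mark("end")
gen.emit("RETURN")
code = gen.assemble()
```

Deferring label resolution to a separate pass is what lets a visit method emit a forward jump before the position of its target is known.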
| 298 | |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 299 | |
Fred Drake | 834a85a | 2001-08-15 17:01:34 +0000 | [diff] [blame] | 300 | \input{compiler.ind} % Index |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 301 | |
| 302 | \end{document} |