Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 1 | % Complete documentation on the extended LaTeX markup used for Python |
| 2 | % documentation is available in ``Documenting Python'', which is part |
| 3 | % of the standard documentation for Python. It may be found online |
| 4 | % at: |
| 5 | % |
| 6 | % http://www.python.org/doc/current/doc/doc.html |
| 7 | |
| 8 | \documentclass{manual} |
| 9 | |
| 10 | \title{Python compiler package} |
| 11 | |
| 12 | \author{Jeremy Hylton} |
| 13 | |
| 14 | % Please at least include a long-lived email address; |
| 15 | % the rest is at your discretion. |
| 16 | \authoraddress{ |
| 17 | PythonLabs \\ |
| 18 | Zope Corp. \\ |
| 19 | Email: \email{jeremy@zope.com} |
| 20 | } |
| 21 | |
| 22 | \date{August 15, 2001} % update before release! |
| 23 | % Use an explicit date so that reformatting |
| 24 | % doesn't cause a new date to be used. Setting |
| 25 | % the date to \today can be used during draft |
| 26 | % stages to make it easier to handle versions. |
| 27 | |
| 28 | \release{2.2} % release version; this is used to define the |
| 29 | % \version macro |
| 30 | |
| 31 | \makeindex % tell \index to actually write the .idx file |
| 32 | \makemodindex % If this contains a lot of module sections. |
| 33 | |
| 34 | |
| 35 | \begin{document} |
| 36 | |
| 37 | \maketitle |
| 38 | |
| 39 | % This makes the contents more accessible from the front page of the HTML. |
| 40 | \ifhtml |
| 41 | \chapter*{Front Matter\label{front}} |
| 42 | \fi |
| 43 | |
| 44 | %\input{copyright} |
| 45 | |
| 46 | \begin{abstract} |
| 47 | |
| 48 | \noindent |
| 49 | The Python compiler package is a tool for analyzing Python source code |
| 50 | and generating Python bytecode. The compiler contains libraries to |
| 51 | generate an abstract syntax tree from Python source code and to |
| 52 | generate Python bytecode from the tree. |
| 53 | |
| 54 | \end{abstract} |
| 55 | |
| 56 | \tableofcontents |
| 57 | |
| 58 | \chapter{Introduction\label{Introduction}} |
| 59 | |
| 60 | XXX Need basic intro |
| 61 | |
| 62 | XXX what are the major advantages... the abstract syntax is much |
| 63 | closer to the python source... |
| 64 | |
| 65 | \section{The basic interface} |
| 66 | |
| 67 | The top-level of the package defines four functions. |
| 68 | |
| 69 | \begin{funcdesc}{parse}{buf} |
| 70 | Returns an abstract syntax tree for the Python source code in \var{buf}. |
| 71 | The function raises SyntaxError if there is an error in the source |
| 72 | code. The return value is a \class{compiler.ast.Module} instance that |
| 73 | contains the tree. |
| 74 | \end{funcdesc} |
| 75 | |
| 76 | \begin{funcdesc}{parseFile}{path} |
| 77 | Return an abstract syntax tree for the Python source code in the file |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 78 | specified by \var{path}. It is equivalent to |
| 79 | \code{parse(open(\var{path}).read())}. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 80 | \end{funcdesc} |
| 81 | |
| 82 | \begin{funcdesc}{walk}{ast, visitor, \optional{verbose=None}} |
| 83 | Do a pre-order walk over the abstract syntax tree \var{ast}. Call the |
| 84 | appropriate method on the \var{visitor} instance for each node |
| 85 | encountered. |
| 86 | \end{funcdesc} |
| 87 | |
| 88 | \begin{funcdesc}{compile}{filename} |
| 89 | Compile the file \var{filename} and generated \var{filename}.pyc. |
| 90 | \end{funcdesc} |
| 91 | |
| 92 | The \module{compiler} package contains the following modules: |
| 93 | \module{ast}, \module{consts}, \module{future}, \module{misc}, |
| 94 | \module{pyassem}, \module{pycodegen}, \module{symbols}, |
| 95 | \module{transformer}, and \module{visitor}. |
| 96 | |
| 97 | \section{Limitations} |
| 98 | |
| 99 | There are some problems with the error checking of the compiler |
| 100 | package. The interpreter detects syntax errors in two distinct |
| 101 | phases. One set of errors is detected by the interpreter's parser, |
| 102 | the other set by the compiler. The compiler package relies on the |
| 103 | interpreter's parser, so it get the first phases of error checking for |
| 104 | free. It implements the second phase itself, and that implement is |
| 105 | incomplete. For example, the compiler package does not raise an error |
| 106 | if a name appears more than once in an argument list: |
| 107 | \code{def f(x, x): ...} |
| 108 | |
| 109 | \chapter{Python Abstract Syntax} |
| 110 | |
| 111 | \section{Introduction} |
| 112 | |
| 113 | The \module{compiler.ast} module defines an abstract syntax for |
| 114 | Python. In the abstract syntax tree, each node represents a syntactic |
| 115 | construct. The root of the tree is \class{Module} object. |
| 116 | |
| 117 | The abstract syntax offers a higher level interface to parsed Python |
| 118 | source code. The \module{parser} module and the compiler written in C |
| 119 | for the Python interpreter use a concrete syntax tree. The concrete |
| 120 | syntax is tied closely to the grammar description used for the Python |
| 121 | parser. Instead of a single node for a construct, there are often |
| 122 | several levels of nested nodes that are introduced by Python's |
| 123 | precedence rules. |
| 124 | |
| 125 | The abstract syntax tree is created by the |
| 126 | \module{compiler.transformer} module. The transformer relies on the |
| 127 | builtin Python parser to generate a concrete syntax tree. It |
| 128 | generates an abstract syntax tree from the concrete tree. |
| 129 | |
| 130 | The \module{transformer} module was created by Greg Stein and Bill |
| 131 | Tutt for the Python-to-C compiler. The current version contains a |
| 132 | number of modifications and improvements, but the basic form of the |
| 133 | abstract syntax and of the transformer are due to Stein and Tutt. |
| 134 | |
| 135 | \section{AST Nodes} |
| 136 | |
| 137 | The \module{ast} module is generated from a text file that describes |
| 138 | each node type and its elements. Each node type is represented as a |
| 139 | class that inherits from the abstract base class \class{ast.Node} and |
| 140 | defines a set of named attributes for child nodes. |
| 141 | |
| 142 | \begin{classdesc}{Node}{} |
| 143 | |
| 144 | The \class{Node} instances are created automatically by the parser |
| 145 | generator. The recommended interface for specific \class{Node} |
| 146 | instances is to use the public attributes to access child nodes. A |
| 147 | public attribute may be bound to a single node or to a sequence of |
| 148 | nodes, depending on the \class{Node} type. For example, the |
| 149 | \member{bases} attribute of the \class{Class} node, is bound to a |
| 150 | list of base class nodes, and the \member{doc} attribute is bound to |
| 151 | a single node. |
| 152 | |
| 153 | Each \class{Node} instance has a \member{lineno} attribute which may |
| 154 | be \code{None}. XXX Not sure what the rules are for which nodes |
| 155 | will have a useful lineno. |
| 156 | |
| 157 | \begin{methoddesc}{getChildren}{} |
| 158 | Returns a flattened list of the child nodes and objects in the |
| 159 | order they occur. Specifically, the order of the nodes is the |
| 160 | order in which they appear in the Python grammar. Not all of the |
| 161 | children are \class{Node} instances. The names of functions and |
| 162 | classes, for example, are plain strings. |
| 163 | \end{methoddesc} |
| 164 | |
| 165 | \begin{methoddesc}{getChildNodes}{} |
| 166 | Returns a flattened list of the child nodes in the order they |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 167 | occur. This method is like \method{getChildNodes()}, except that it |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 168 | only returns those children that are \class{Node} instances. |
| 169 | \end{methoddesc} |
| 170 | |
| 171 | \end{classdesc} |
| 172 | |
| 173 | Two examples illustrate the general structure of \class{Node} |
| 174 | classes. The while statement is defined by the following grammar |
| 175 | production: |
| 176 | |
| 177 | \begin{verbatim} |
| 178 | while_stmt: "while" expression ":" suite |
| 179 | ["else" ":" suite] |
| 180 | \end{verbatim} |
| 181 | |
| 182 | The \class{While} node has three attributes: \member{test}, |
| 183 | \member{body}, and \member{else_}. (If the natural name for an |
| 184 | attribute is also a Python reserved word, it can't be used as an |
| 185 | attribute name. An underscore is appended to the word to make it |
| 186 | legal, hence \code{else_} instead of \code{else}.) |
| 187 | |
| 188 | The if statement is more complicated because it can include several |
| 189 | tests. |
| 190 | |
| 191 | \begin{verbatim} |
| 192 | if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite] |
| 193 | \end{verbatim} |
| 194 | |
| 195 | The \class{If} node only defines two attributes: \member{tests} and |
| 196 | \member{else_}. The \member{tests} attribute is a sequence of test |
| 197 | expression, consequent body pairs. There is one pair of each if/elif |
| 198 | clause. The first element of the pair is the test expression. The |
| 199 | second elements is a \class{Stmt} node that contains the code to |
| 200 | execute if the test is true. |
| 201 | |
| 202 | The \method{getChildren()} method of \class{If} returns a flat list of |
| 203 | child nodes. If there are three if/elif clauses and no else clause, |
| 204 | then \method{getChildren()} will return a list of six elements: the |
| 205 | first test expression, the first \class{Stmt}, the second text |
| 206 | expression, etc. |
| 207 | |
| 208 | The following table lists each of the \class{Node} subclasses defined |
| 209 | in \module{compiler.ast} and each of the public attributes available |
| 210 | on their instances. The values of most of the attributes are |
| 211 | themselves \class{Node} instances or sequences of instances. When the |
| 212 | value is something other than an instance, the type is noted in the |
| 213 | comment. The attributes are listed in the order in which they are |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 214 | returned by \method{getChildren()} and \method{getChildNodes()}. |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 215 | |
| 216 | \input{asttable} |
| 217 | |
| 218 | \section{Assignment nodes} |
| 219 | |
| 220 | There is a collection of nodes used to represent assignments. Each |
| 221 | assignment statement in the source code becomes a single |
| 222 | \class{Assign} node in the AST. The \member{nodes} attribute is a |
| 223 | list that contains a node for each assignment target. This is |
| 224 | necessary because assignment can be chained, e.g. \code{a = b = 2}. |
| 225 | Each \class{Node} in the list will be one of the following classes: |
| 226 | \class{AssAttr}, \class{AssList}, \class{AssName}, or |
| 227 | \class{AssTuple}. |
| 228 | |
| 229 | XXX Explain what the AssXXX nodes are for. Mention \code{a.b.c = 2} |
| 230 | as an example. Explain what the flags are for. |
| 231 | |
| 232 | \chapter{Using Visitors to Walk ASTs} |
| 233 | |
| 234 | The visitor pattern is ... The \module{compiler} package uses a |
| 235 | variant on the visitor pattern that takes advantage of Python's |
| 236 | introspection features to elminiate the need for much of the visitor's |
| 237 | infrastructure. |
| 238 | |
| 239 | The classes being visited do not need to be programmed to accept |
| 240 | visitors. The visitor need only define visit methods for classes it |
| 241 | is specifically interested in; a default visit method can handle the |
| 242 | rest. |
| 243 | |
| 244 | XXX The magic \method{visit()} method for visitors. |
| 245 | |
| 246 | \begin{classdesc}{ASTVisitor}{} |
| 247 | |
| 248 | The \class{ASTVisitor} is responsible for walking over the tree in the |
| 249 | correct order. A walk begins with a call to \method{preorder()}. For |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 250 | each node, it checks the \var{visitor} argument to \method{preorder()} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 251 | for a method named `visitNodeType,' where NodeType is the name of the |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 252 | node's class, e.g. for a \class{While} node a \method{visitWhile()} |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 253 | would be called . If the method exists, it is called with the node as |
| 254 | its first argument. |
| 255 | |
| 256 | The visitor method for a particular node type can control how child |
| 257 | nodes are visited during the walk. The \class{ASTVisitor} modifies |
| 258 | the visitor argument by adding a visit method to the visitor; this |
| 259 | method can be used to visit a particular child node. If no visitor is |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 260 | found for a particular node type, the \method{default()} method is |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 261 | called. |
| 262 | |
| 263 | XXX describe extra arguments |
| 264 | |
| 265 | \begin{methoddesc}{default}{node\optional{, *args}} |
| 266 | \end{methoddesc} |
| 267 | |
| 268 | \begin{methoddesc}{dispatch}{node\optional{, *args}} |
| 269 | \end{methoddesc} |
| 270 | |
| 271 | \begin{methoddesc}{preorder}{tree, visitor} |
| 272 | \end{methoddesc} |
| 273 | |
| 274 | \end{classdesc} |
| 275 | |
| 276 | \begin{funcdesc}{walk}{tree, visitor\optional{, verbose=None}} |
| 277 | \end{funcdesc} |
| 278 | |
| 279 | \chapter{Bytecode Generation} |
| 280 | |
| 281 | The code generator is a visit that emits bytecodes. Each visit method |
Fred Drake | 42caf3f | 2001-08-15 14:35:13 +0000 | [diff] [blame] | 282 | can call the \method{emit()} method to emit a new bytecode. The basic |
Jeremy Hylton | 76f42ac | 2001-08-14 22:04:44 +0000 | [diff] [blame] | 283 | code generator is specialized for modules, classes, and functions. An |
| 284 | assembler converts that emitted instructions to the low-level bytecode |
| 285 | format. It handles things like generator of constant lists of code |
| 286 | objects and calculation of jump offsets. |
| 287 | |
| 288 | % |
| 289 | % The ugly "%begin{latexonly}" pseudo-environments are really just to |
| 290 | % keep LaTeX2HTML quiet during the \renewcommand{} macros; they're |
| 291 | % not really valuable. |
| 292 | % |
| 293 | % If you don't want the Module Index, you can remove all of this up |
| 294 | % until the second \input line. |
| 295 | % |
| 296 | %begin{latexonly} |
| 297 | \renewcommand{\indexname}{Module Index} |
| 298 | %end{latexonly} |
| 299 | \input{mod\jobname.ind} % Module Index |
| 300 | |
| 301 | %begin{latexonly} |
| 302 | \renewcommand{\indexname}{Index} |
| 303 | %end{latexonly} |
| 304 | \input{\jobname.ind} % Index |
| 305 | |
| 306 | \end{document} |