% XXX Label can't be _ast?
% XXX Where should this section/chapter go?
\chapter{Abstract Syntax Trees\label{ast}}
\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de}
\versionadded{2.5}
The \code{_ast} module helps Python applications to process
trees of the Python abstract syntax grammar. The Python compiler
currently provides read-only access to such trees, meaning that
applications can only create a tree for a given piece of Python
source code; generating byte code from a (potentially modified)
tree is not supported. The abstract syntax itself might change with
each Python release; this module helps to find out programmatically
what the current grammar looks like.
An abstract syntax tree can be generated by passing \code{_ast.PyCF_ONLY_AST}
as a flag to the built-in \function{compile} function. The result will be a tree
of objects whose classes all inherit from \code{_ast.AST}.
The actual classes are derived from the \code{Parser/Python.asdl} file,
which is reproduced below. There is one class defined for each left-hand
side symbol in the abstract grammar (for example, \code{_ast.stmt} or \code{_ast.expr}).
In addition, there is one class defined for each constructor on the
right-hand side; these classes inherit from the classes for the left-hand
side trees. For example, \code{_ast.BinOp} inherits from \code{_ast.expr}.
For production rules with alternatives (also known as ``sums''), the left-hand side
class is abstract: only instances of specific constructor nodes are ever
created.
Each concrete class has an attribute \code{_fields} which gives the
names of all child nodes.
Each instance of a concrete class has one attribute for each child node,
of the type as defined in the grammar. For example, \code{_ast.BinOp}
instances have an attribute \code{left} of type \code{_ast.expr}.
Instances of \code{_ast.expr} and \code{_ast.stmt} subclasses also
have \code{lineno} and \code{col_offset} attributes. The \code{lineno} is the line number
of the source text (1-indexed, so the first line is line 1), and the
\code{col_offset} is the UTF-8 byte offset of the first token that generated
the node. The UTF-8 offset is recorded because the parser uses UTF-8
internally.
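
The position attributes can be inspected as in the following sketch, which
uses a two-line, ASCII-only source string (an arbitrary example) so that
byte offsets and character offsets coincide.

```python
import _ast

source = "x = 1\ny = x + 2\n"
tree = compile(source, "<example>", "exec", _ast.PyCF_ONLY_AST)

# The second statement starts at the beginning of line 2.
second_stmt = tree.body[1]
stmt_pos = (second_stmt.lineno, second_stmt.col_offset)

# Its right-hand side begins with the token "x", four bytes into the line.
value_pos = (second_stmt.value.lineno, second_stmt.value.col_offset)
```

For source text containing non-ASCII characters, \code{col_offset} counts
bytes of the UTF-8 encoding, so it can be larger than the character index.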
If a child attribute is marked as optional in the grammar (using a
question mark), its value might be \code{None}. If an attribute
can have zero-or-more values (marked with an asterisk), the
values are represented as a Python list.
\subsection{Abstract Grammar}
The module defines a string constant \code{__version__} which
is the decimal Subversion revision number of the file shown below.
The abstract grammar is currently defined as follows:
\verbatiminput{../../Parser/Python.asdl}