% libparser.tex
%
% Introductory documentation for the new parser built-in module.
%
% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr. This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution. No fee may be charged for this document
% in any representation, either on paper or electronically. This
% restriction does not affect other elements in a distributed package
% in any way.
%

\section{Built-in Module \sectcode{parser}}
\bimodindex{parser}


% ==== 2. ====
% Give a short overview of what the module does.
% If it is platform specific, mention this.
% Mention other important restrictions or general operating principles.

The \code{parser} module provides an interface to Python's internal
parser and byte-code compiler. The primary purpose of this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from it. This is more reliable than trying
to parse and modify an arbitrary Python code fragment as a string,
because the parsing is performed in a manner identical to the code
forming the application. It is also faster than parsing the fragment
with code written in Python.

A few points about this module are important for making use of the
data structures it creates. This section is not a tutorial on editing
the parse trees for Python code.

Most importantly, a good understanding of the Python grammar processed
by the internal parser is required. For full information on the
language syntax, refer to the Language Reference. The parser itself
is created from a grammar specification defined in the file
\code{Grammar/Grammar} in the standard Python distribution. The parse
trees stored in the ``AST objects'' created by this module are the
actual output from the internal parser when created by the
\code{expr()} or \code{suite()} functions, described below. The AST
objects created by \code{tuple2ast()} faithfully simulate those
structures.

Each element of the tuples returned by \code{ast2tuple()} has a simple
form. Tuples representing non-terminal elements in the grammar always
have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given
symbolic names in the C header file \code{Include/graminit.h} and the
Python module \code{Lib/symbol.py}. Each additional element of the
tuple represents a component of the production as recognized in the
input string; these are always tuples which have the same form as the
parent. An important aspect of this structure is that keywords used
to identify the parent node type, such as the keyword \code{if} in an
\emph{if\_stmt}, are included in the node tree without any special
treatment. For example, the \code{if} keyword is represented by the
tuple \code{(1, 'if')}, where \code{1} is the numeric value associated
with all \code{NAME} elements, including variable and function names
defined by the user.
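
The nesting described above can be walked with a short recursive
function. The following is only a sketch: the tree is written by hand
rather than produced by \code{ast2tuple()}, the non-terminal numbers
\code{300} and \code{301} are made up for illustration, and the helper
\code{terminals()} is not part of the module.

```python
import token

# Hand-written stand-in for a fragment of an ast2tuple() result.
# The non-terminal numbers 300 and 301 are invented for this sketch;
# real values are listed in Lib/symbol.py.
tree = (300,
        (token.NAME, 'if'),
        (301, (token.NAME, 'x')),
        (token.NAME, 'pass'))


def terminals(node):
    """Return the (type, text) pairs found in a tuple tree, left to right."""
    if isinstance(node[1], str):
        # A terminal: the second element is the source text.
        return [node]
    found = []
    for child in node[1:]:
        # Every additional element is a tuple with the same form.
        found.extend(terminals(child))
    return found


leaves = terminals(tree)
print(leaves)
```

Note that the keyword \code{if} and the user-defined name \code{x} come
back with the same numeric type, as the text explains.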

Terminal elements are represented in much the same way, but without
any child elements and with the addition of the source text which was
identified. The example of the \code{if} keyword above is
representative. The various types of terminal symbols are defined in
the C header file \code{Include/token.h} and the Python module
\code{Lib/token.py}.
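
The numeric type of a terminal can be turned back into its symbolic
name with the \code{tok\_name} dictionary of the \code{token} module.
The helper \code{describe()} below is a hypothetical convenience
function, shown only to illustrate the lookup.

```python
import token


def describe(terminal):
    """Render a (type, text) terminal tuple with its symbolic token name."""
    kind, text = terminal
    return '%s %r' % (token.tok_name[kind], text)


print(describe((token.NAME, 'if')))
print(describe((token.NUMBER, '5')))
```

Only the value of \code{NAME} is relied upon here; the numbers assigned
to other token types may differ between Python versions, so it is
safer to use the symbolic names.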

The AST objects are not actually required to support the functionality
of this module, but are provided for three purposes: to allow an
application to amortize the cost of processing complex parse trees, to
provide a parse tree representation which conserves memory space when
compared to the Python tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees. A simple
``wrapper'' module may be created in Python if desired to hide the use
of AST objects.


% ==== 3. ====
% List the public functions defined by the module. Begin with a
% standard phrase. You may also list the exceptions and other data
% items defined in the module, insofar as they are important for the
% user.

The \code{parser} module defines the following functions:

% ---- 3.1. ----
% Redefine the ``indexsubitem'' macro to point to this module
% (alternatively, you can put this at the top of the file):

\renewcommand{\indexsubitem}{(in module parser)}

% ---- 3.2. ----
% For each function, use a ``funcdesc'' block. This has exactly two
% parameters (each parameter is contained in a set of curly braces):
% the first parameter is the function name (this automatically
% generates an index entry); the second parameter is the function's
% argument list. If there are no arguments, use an empty pair of
% curly braces. If there is more than one argument, separate the
% arguments with backslash-comma. Optional parts of the parameter
% list are contained in \optional{...} (this generates a set of square
% brackets around its parameter). Arguments are automatically set in
% italics in the parameter list. Each argument should be mentioned at
% least once in the description; each usage (even inside \code{...})
% should be enclosed in \var{...}.

\begin{funcdesc}{ast2tuple}{ast}
This function accepts an AST object from the caller in
\code{\var{ast}} and returns a Python tuple representing the
equivalent parse tree. The resulting tuple representation can be used
for inspection or the creation of a new parse tree in tuple form.
This function does not fail so long as memory is available to build
the tuple representation.
\end{funcdesc}


\begin{funcdesc}{compileast}{ast\optional{\, filename \code{= '<ast>'}}}
The Python byte compiler can be invoked on an AST object to produce
code objects which can be used as part of an \code{exec} statement or
a call to the built-in \code{eval()} function. This function provides
the interface to the compiler: it passes the internal parse tree from
\code{\var{ast}} to the byte compiler, using the source file name
specified by the \code{\var{filename}} parameter. The default value
supplied for \code{\var{filename}} indicates that the source was an
AST object.
\end{funcdesc}
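
The two uses of a code object mentioned for \code{compileast()} can be
sketched as follows. This sketch makes two assumptions: the built-in
\code{compile()} stands in for \code{compileast()} so that it runs
without the \code{parser} module, and it is written for a current
Python, in which \code{exec} is called as a function rather than used
as a statement.

```python
# compile() stands in for compileast(); its third argument selects the form.
expression = compile('a + 5', '<ast>', 'eval')
statements = compile('b = a * 2', '<ast>', 'exec')

namespace = {'a': 5}
result = eval(expression, namespace)   # evaluate the expression form
exec(statements, namespace)            # run the statement form

print(result, namespace['b'])
```

The file name given at compile time is kept on the code object as
\code{co\_filename}, which is where a value such as \code{'<ast>'}
shows up in tracebacks.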


\begin{funcdesc}{expr}{string}
The \code{expr()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'eval')}. If
the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}


\begin{funcdesc}{isexpr}{ast}
When \code{\var{ast}} represents an \code{'eval'} form, this function
returns a true value (\code{1}); otherwise it returns false
(\code{0}). This is useful, since code objects normally cannot be
queried for this information using existing built-in functions. Note
that the code objects created by \code{compileast()} cannot be queried
like this either, and are identical to those created by the built-in
\code{compile()} function.
\end{funcdesc}


\begin{funcdesc}{issuite}{ast}
This function mirrors \code{isexpr()} in that it reports whether an
AST object represents a suite of statements. It is not safe to assume
that this function is equivalent to \code{not isexpr(\var{ast})}, as
additional syntactic fragments may be supported in the future.
\end{funcdesc}


\begin{funcdesc}{suite}{string}
The \code{suite()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'exec')}. If
the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}
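
The difference between the \code{'eval'} and \code{'exec'} forms that
\code{expr()} and \code{suite()} correspond to can be observed with the
built-in \code{compile()} alone. The helper \code{accepted\_modes()} is
a hypothetical name used only for this sketch; note that
\code{compile()} also takes a file name as its second argument.

```python
def accepted_modes(source):
    """Return the compile() modes, among 'eval' and 'exec', that accept source."""
    modes = []
    for mode in ('eval', 'exec'):
        try:
            compile(source, '<string>', mode)
        except SyntaxError:
            continue
        modes.append(mode)
    return modes


print(accepted_modes('a + 5'))
print(accepted_modes('a = 5'))
```

An expression is also a valid statement, so it is accepted in both
forms, while an assignment is accepted only as a suite.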


\begin{funcdesc}{tuple2ast}{tuple}
This function accepts a parse tree represented as a tuple and builds
an internal representation if possible. If it can validate that the
tree conforms to the Python syntax and all nodes are valid node types
in the host version of Python, an AST object is created from the
internal representation and returned to the caller. If there is a
problem creating the internal representation, or if the tree cannot be
validated, a \code{ParserError} exception is raised. An AST object
created this way should not be assumed to compile correctly; normal
exceptions raised by compilation may still occur when the AST
object is passed to \code{compileast()}. This will normally indicate
problems not related to syntax (such as a \code{MemoryError}
exception).
\end{funcdesc}


% --- 3.4. ---
% Exceptions are described using an ``excdesc'' block. This has only
% one parameter: the exception name.

\subsection{Exceptions and Error Handling}

The parser module defines a single exception, but may also pass on
built-in exceptions from other portions of the Python runtime
environment. See each function for information about the exceptions
it can raise.

\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the parser module. This
is generally produced for validation failures rather than the built-in
\code{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for the
failure or a tuple containing a tuple causing the failure from a parse
tree passed to \code{tuple2ast()} and an explanatory string. Calls to
\code{tuple2ast()} need to be able to handle either form of the
argument, while calls to other functions in the module will only need
to be aware of the simple string values.
\end{excdesc}
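
A handler around \code{tuple2ast()} therefore has to accept both forms
of the exception argument. The following sketch shows one way to
normalize them. The helper \code{describe\_failure()} and the sample
messages are hypothetical, and the order of the two tuple members
(offending tuple first, explanatory string second) is an assumption
based on the wording above.

```python
def describe_failure(argument):
    """Format a ParserError argument, which may take either documented form."""
    if isinstance(argument, tuple):
        # Assumed order: (offending tuple, explanatory string).
        offending, message = argument
        return '%s (at %r)' % (message, offending)
    # Otherwise the argument is the plain explanatory string.
    return str(argument)


print(describe_failure('bad tree'))
print(describe_failure(((1, 'if'), 'illegal node')))
```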

Note that the functions \code{compileast()}, \code{expr()}, and
\code{suite()} may raise exceptions which are normally raised by the
parsing and compilation process. These include the built-in
exceptions \code{MemoryError}, \code{OverflowError},
\code{SyntaxError}, and \code{SystemError}. In these cases, the
exceptions carry all the meaning normally associated with them. Refer
to the descriptions of each function for detailed information.

% ---- 3.5. ----
% There is no standard block type for classes. I generally use
% ``funcdesc'' blocks, since class instantiation looks very much like
% a function call.


% ==== 4. ====
% Now is probably a good time for a complete example. (Alternatively,
% an example giving the flavor of the module may be given before the
% detailed list of functions.)

\subsection{Example}

A simple example:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = parser.compileast(ast)
>>> a = 5
>>> eval(code)
10
\end{verbatim}


\subsection{AST Objects}

AST objects (returned by \code{expr()}, \code{suite()}, and
\code{tuple2ast()}, described above) have no methods of their own.
Some of the functions defined which accept an AST object as their
first argument may change to object methods in the future.

Ordering and equality comparisons are supported between AST objects.

\renewcommand{\indexsubitem}{(ast method)}

%\begin{funcdesc}{empty}{}
%Empty the can into the trash.
%\end{funcdesc}