% libparser.tex
%
% Introductory documentation for the new parser built-in module.
%
% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr. This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution. No fee may be charged for this document
% in any representation, either on paper or electronically. This
% restriction does not affect other elements in a distributed package
% in any way.
%

\section{Built-in Module \sectcode{parser}}
\bimodindex{parser}


% ==== 2. ====
% Give a short overview of what the module does.
% If it is platform specific, mention this.
% Mention other important restrictions or general operating principles.

The \code{parser} module provides an interface to Python's internal
parser and byte-code compiler. The primary purpose of this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from it. This can be more reliable than
trying to parse and modify an arbitrary Python code fragment as a
string, because the parsing is performed in a manner identical to that
applied to the code forming the application. It is also faster.

A few points about this module are important to making use of the data
structures it creates. This is not a tutorial on editing the parse
trees for Python code.

Most importantly, a good understanding of the Python grammar processed
by the internal parser is required. For full information on the
language syntax, refer to the Language Reference. The parser itself
is created from a grammar specification defined in the file
\code{Grammar/Grammar} in the standard Python distribution. The parse
trees stored in the ``AST objects'' created by this module are the
actual output from the internal parser when created by the
\code{expr()} or \code{suite()} functions, described below. The AST
objects created by \code{tuple2ast()} faithfully simulate those
structures.

Each element of the tuples returned by \code{ast2tuple()} has a simple
form. Tuples representing non-terminal elements in the grammar always
have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given
symbolic names in the C header file \code{Include/graminit.h} and the
Python module \code{Lib/symbol.py}. Each additional element of the
tuple represents a component of the production as recognized in the
input string: these are always tuples which have the same form as the
parent. Note that keywords used to identify the parent node type, such
as the keyword \code{if} in an \emph{if\_stmt}, are included in the
node tree without any special treatment. For example, the \code{if}
keyword is represented by the tuple \code{(1, 'if')}, where \code{1}
is the numeric value associated with all \code{NAME} elements,
including variable and function names defined by the user.

Terminal elements are represented in much the same way, but without
any child elements and with the addition of the source text which was
identified. The example of the \code{if} keyword above is
representative. The various types of terminal symbols are defined in
the C header file \code{Include/token.h} and the Python module
\code{Lib/token.py}.
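
The tuple form described above can be explored with a short recursive
walk. The following is only a sketch, written for a current Python
interpreter rather than the release this document describes. It
assumes the \code{token} module's \code{ISTERMINAL()} helper and
\code{NT\_OFFSET} constant to tell terminals from non-terminals;
\code{SOME\_PRODUCTION} and the sample tree are hypothetical stand-ins
written by hand, not real parser output.

```python
import token

# Hypothetical production number standing in for a real entry from
# Include/graminit.h; values at or above token.NT_OFFSET are treated
# as non-terminals by the token module.
SOME_PRODUCTION = token.NT_OFFSET + 1


def terminals(tree):
    """Return the (token number, source text) pairs of a tuple parse
    tree, in left-to-right order."""
    if token.ISTERMINAL(tree[0]):
        # Terminal: (token number, source text), with no children.
        return [(tree[0], tree[1])]
    leaves = []
    # Non-terminal: (production number, child, child, ...).
    for child in tree[1:]:
        leaves.extend(terminals(child))
    return leaves


# Hand-written fragment in the documented form (an assumption made for
# illustration only).
sample = (SOME_PRODUCTION,
          (token.NAME, 'if'),
          (SOME_PRODUCTION, (token.NAME, 'a')),
          (token.COLON, ':'))
leaves = terminals(sample)
```

The keyword \code{if} and the user-defined name \code{a} both come
back as \code{NAME} terminals, which is the lack of special treatment
noted above.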

The AST objects are not actually required to support the functionality
of this module, but are provided for three purposes: to allow an
application to amortize the cost of processing complex parse trees, to
provide a parse tree representation which conserves memory space when
compared to the Python tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees. A simple
``wrapper'' module may be created in Python to hide the use of AST
objects.


The \code{parser} module defines the following functions:

\renewcommand{\indexsubitem}{(in module parser)}

Guido van Rossum | 4b73a06 | 1995-10-11 17:30:04 +0000 | [diff] [blame] | 83 | \begin{funcdesc}{ast2tuple}{ast} |
| 84 | This function accepts an AST object from the caller in |
| 85 | \code{\var{ast}} and returns a Python tuple representing the |
| 86 | equivelent parse tree. The resulting tuple representation can be used |
| 87 | for inspection or the creation of a new parse tree in tuple form. |
| 88 | This function does not fail so long as memory is available to build |
| 89 | the tuple representation. |
| 90 | \end{funcdesc} |
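
As an illustration of creating a new parse tree in tuple form, the
sketch below rebuilds a tuple tree with one \code{NAME} terminal
respelled. It is written for a current Python interpreter and does not
call this module; the helper \code{rename()}, the production number
\code{SOME\_PRODUCTION}, and the sample tree are all hypothetical. A
tree produced this way would still need to be validated, for instance
by \code{tuple2ast()}.

```python
import token

# Hypothetical production number; real values come from
# Include/graminit.h or Lib/symbol.py.
SOME_PRODUCTION = token.NT_OFFSET + 1


def rename(tree, old, new):
    """Return a copy of a tuple parse tree in which every NAME
    terminal spelled `old` is respelled `new`."""
    if token.ISTERMINAL(tree[0]):
        if tree[0] == token.NAME and tree[1] == old:
            return (token.NAME, new)
        return tree
    # Rebuild the non-terminal around its rewritten children.
    return (tree[0],) + tuple(rename(child, old, new) for child in tree[1:])


# Hand-written tree roughly in the shape of 'a + 5' (illustration only).
original = (SOME_PRODUCTION,
            (token.NAME, 'a'),
            (token.PLUS, '+'),
            (token.NUMBER, '5'))
renamed = rename(original, 'a', 'b')
```

Because keywords are stored as ordinary \code{NAME} terminals, a
helper like this should not be given a keyword as the name to replace.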


\begin{funcdesc}{compileast}{ast\optional{\, filename \code{= '<ast>'}}}
The Python byte compiler can be invoked on an AST object to produce
code objects which can be used as part of an \code{exec} statement or
a call to the built-in \code{eval()} function. This function provides
the interface to the compiler, passing the internal parse tree from
\code{\var{ast}} to the compiler, using the source file name specified
by the \code{\var{filename}} parameter. The default value supplied
for \code{\var{filename}} indicates that the source was an AST object.
\end{funcdesc}


\begin{funcdesc}{expr}{string}
The \code{expr()} function parses the parameter \code{\var{string}}
as if it were an input to
\code{compile(\var{string}, \var{filename}, 'eval')}. If the parse
succeeds, an AST object is created to hold the internal parse tree
representation; otherwise an appropriate exception is raised.
\end{funcdesc}


\begin{funcdesc}{isexpr}{ast}
When \code{\var{ast}} represents an \code{'eval'} form, this function
returns a true value (\code{1}), otherwise it returns false
(\code{0}). This is useful, since code objects normally cannot be
queried for this information using existing built-in functions. Note
that the code objects created by \code{compileast()} cannot be queried
like this either, and are identical to those created by the built-in
\code{compile()} function.
\end{funcdesc}


\begin{funcdesc}{issuite}{ast}
This function mirrors \code{isexpr()} in that it reports whether an
AST object represents a suite of statements. It is not safe to assume
that this function is equivalent to \code{not isexpr(\var{ast})}, as
additional syntactic fragments may be supported in the future.
\end{funcdesc}


\begin{funcdesc}{suite}{string}
The \code{suite()} function parses the parameter \code{\var{string}}
as if it were an input to
\code{compile(\var{string}, \var{filename}, 'exec')}. If the parse
succeeds, an AST object is created to hold the internal parse tree
representation; otherwise an appropriate exception is raised.
\end{funcdesc}


\begin{funcdesc}{tuple2ast}{tuple}
This function accepts a parse tree represented as a tuple and builds
an internal representation if possible. If it can validate that the
tree conforms to the Python syntax and all nodes are valid node types
in the host version of Python, an AST object is created from the
internal representation and returned to the caller. If there is a
problem creating the internal representation, or if the tree cannot be
validated, a \code{ParserError} exception is raised. An AST object
created this way should not be assumed to compile correctly; normal
exceptions raised by compilation may still occur when the AST object
is passed to \code{compileast()}. This will normally indicate
problems not related to syntax (such as a \code{MemoryError}
exception).
\end{funcdesc}


\subsection{Exceptions and Error Handling}

The parser module defines a single exception, but may also pass along
built-in exceptions from other portions of the Python runtime
environment. See each function for information about the exceptions
it can raise.

\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the parser module. This
is generally produced for validation failures rather than the built-in
\code{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for
the failure or a tuple containing the tuple which caused the failure,
taken from the parse tree passed to \code{tuple2ast()}, and an
explanatory string. Calls to \code{tuple2ast()} need to be able to
handle either type of argument, while calls to other functions in the
module will only need to be aware of the simple string values.
\end{excdesc}
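
A handler can accept both argument forms with a small helper. The
sketch below is written for a current Python interpreter and works on
the exception argument alone, so it does not call this module; the
helper name \code{describe\_failure()} and the sample messages are
hypothetical and are not taken from the module's actual error text.

```python
def describe_failure(arg):
    """Turn a ParserError argument into a one-line message."""
    if isinstance(arg, tuple):
        # Form used for tuple2ast() failures:
        # (offending tuple, explanatory string).
        offending, message = arg
        return '%s (at %r)' % (message, offending)
    # Plain string form used by the other functions.
    return str(arg)


# Hypothetical arguments, one in each documented form.
plain = describe_failure('could not build the tree')
detailed = describe_failure(((1, 'if', 'extra'), 'too many elements'))
```

The helper would typically be called from an \code{except} clause
that catches \code{ParserError} and is passed the exception argument.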

Note that the functions \code{compileast()}, \code{expr()}, and
\code{suite()} may raise exceptions which are normally raised by the
parsing and compilation process. These include the built-in
exceptions \code{MemoryError}, \code{OverflowError},
\code{SyntaxError}, and \code{SystemError}. In these cases, these
exceptions carry all the meaning normally associated with them. Refer
to the descriptions of each function for detailed information.


\subsection{Example}

A simple example:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = parser.compileast(ast)
>>> a = 5
>>> eval(code)
10
\end{verbatim}


\subsection{AST Objects}

AST objects (returned by \code{expr()}, \code{suite()}, and
\code{tuple2ast()}, described above) have no methods of their own.
Some of the functions defined which accept an AST object as their
first argument may change to object methods in the future.

Ordered and equality comparisons are supported between AST objects.

\renewcommand{\indexsubitem}{(ast method)}

%\begin{funcdesc}{empty}{}
%Empty the can into the trash.
%\end{funcdesc}