| % libparser.tex | 
 | % | 
 | % Introductory documentation for the new parser built-in module. | 
 | % | 
 | % Copyright 1995 Virginia Polytechnic Institute and State University | 
 | % and Fred L. Drake, Jr.  This copyright notice must be distributed on | 
 | % all copies, but this document otherwise may be distributed as part | 
 | % of the Python distribution.  No fee may be charged for this document | 
 | % in any representation, either on paper or electronically.  This | 
 | % restriction does not affect other elements in a distributed package | 
 | % in any way. | 
 | % | 
 |  | 
 | \section{Built-in Module \sectcode{parser}} | 
 | \bimodindex{parser} | 
 |  | 
 | The \code{parser} module provides an interface to Python's internal | 
 | parser and byte-code compiler.  The primary purpose for this interface | 
 | is to allow Python code to edit the parse tree of a Python expression | 
 | and create executable code from this.  This is better than trying | 
 | to parse and modify an arbitrary Python code fragment as a string | 
 | because parsing is performed in a manner identical to the code | 
 | forming the application.  It is also faster. | 
 |  | 
 | There are a few things to note about this module which are important | 
 | to making use of the data structures created.  This is not a tutorial | 
 | on editing the parse trees for Python code, but some examples of using | 
 | the \code{parser} module are presented. | 
 |  | 
 | Most importantly, a good understanding of the Python grammar processed | 
 | by the internal parser is required.  For full information on the | 
 | language syntax, refer to the Language Reference.  The parser itself | 
 | is created from a grammar specification defined in the file | 
 | \file{Grammar/Grammar} in the standard Python distribution.  The parse | 
 | trees stored in the ``AST objects'' created by this module are the | 
 | actual output from the internal parser when created by the | 
 | \code{expr()} or \code{suite()} functions, described below.  The AST | 
 | objects created by \code{sequence2ast()} faithfully simulate those | 
 | structures.  Be aware that the values of the sequences which are | 
 | considered ``correct'' will vary from one version of Python to another | 
 | as the formal grammar for the language is revised.  However, | 
 | transporting code from one Python version to another as source text | 
 | will always allow correct parse trees to be created in the target | 
 | version, with the only restriction being that migrating to an older | 
 | version of the interpreter will not support more recent language | 
 | constructs.  The parse trees are not typically compatible from one | 
 | version to another, whereas source code has always been | 
 | forward-compatible. | 
 |  | 
 | Each element of the sequences returned by \code{ast2list} or | 
 | \code{ast2tuple()} has a simple form.  Sequences representing | 
 | non-terminal elements in the grammar always have a length greater than | 
 | one.  The first element is an integer which identifies a production in | 
 | the grammar.  These integers are given symbolic names in the C header | 
 | file \file{Include/graminit.h} and the Python module | 
 | \file{Lib/symbol.py}.  Each additional element of the sequence represents | 
 | a component of the production as recognized in the input string: these | 
 | are always sequences which have the same form as the parent.  An | 
 | important aspect of this structure which should be noted is that | 
 | keywords used to identify the parent node type, such as the keyword | 
 | \code{if} in an \code{if_stmt}, are included in the node tree without | 
 | any special treatment.  For example, the \code{if} keyword is | 
 | represented by the tuple \code{(1, 'if')}, where \code{1} is the | 
 | numeric value associated with all \code{NAME} tokens, including | 
 | variable and function names defined by the user.  In an alternate form | 
 | returned when line number information is requested, the same token | 
 | might be represented as \code{(1, 'if', 12)}, where the \code{12} | 
 | represents the line number at which the terminal symbol was found. | 
 |  | 
 | Terminal elements are represented in much the same way, but without | 
 | any child elements and the addition of the source text which was | 
 | identified.  The example of the \code{if} keyword above is | 
 | representative.  The various types of terminal symbols are defined in | 
 | the C header file \file{Include/token.h} and the Python module | 
 | \file{Lib/token.py}. | 
 |  | 
 | The AST objects are not required to support the functionality of this | 
 | module, but are provided for three purposes: to allow an application | 
 | to amortize the cost of processing complex parse trees, to provide a | 
 | parse tree representation which conserves memory space when compared | 
 | to the Python list or tuple representation, and to ease the creation | 
 | of additional modules in C which manipulate parse trees.  A simple | 
 | ``wrapper'' class may be created in Python to hide the use of AST | 
 | objects; the \code{AST} library module provides a variety of such | 
 | classes. | 
 |  | 
 | The \code{parser} module defines functions for a few distinct | 
 | purposes.  The most important purposes are to create AST objects and | 
 | to convert AST objects to other representations such as parse trees | 
 | and compiled code objects, but there are also functions which serve to | 
 | query the type of parse tree represented by an AST object. | 
 |  | 
 | \renewcommand{\indexsubitem}{(in module parser)} | 
 |  | 
 |  | 
 | \subsection{Creating AST Objects} | 
 |  | 
 | AST objects may be created from source code or from a parse tree. | 
 | When creating an AST object from source, different functions are used | 
 | to create the \code{'eval'} and \code{'exec'} forms. | 
 |  | 
 | \begin{funcdesc}{expr}{string} | 
 | The \code{expr()} function parses the parameter \code{\var{string}} | 
 | as if it were an input to \code{compile(\var{string}, 'eval')}.  If | 
 | the parse succeeds, an AST object is created to hold the internal | 
 | parse tree representation, otherwise an appropriate exception is | 
 | thrown. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{suite}{string} | 
 | The \code{suite()} function parses the parameter \code{\var{string}} | 
 | as if it were an input to \code{compile(\var{string}, 'exec')}.  If | 
 | the parse succeeds, an AST object is created to hold the internal | 
 | parse tree representation, otherwise an appropriate exception is | 
 | thrown. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{sequence2ast}{sequence} | 
 | This function accepts a parse tree represented as a sequence and | 
 | builds an internal representation if possible.  If it can validate | 
 | that the tree conforms to the Python grammar and all nodes are valid | 
 | node types in the host version of Python, an AST object is created | 
 | from the internal representation and returned to the called.  If there | 
 | is a problem creating the internal representation, or if the tree | 
 | cannot be validated, a \code{ParserError} exception is thrown.  An AST | 
 | object created this way should not be assumed to compile correctly; | 
 | normal exceptions thrown by compilation may still be initiated when | 
 | the AST object is passed to \code{compileast()}.  This may indicate | 
 | problems not related to syntax (such as a \code{MemoryError} | 
 | exception), but may also be due to constructs such as the result of | 
 | parsing \code{del f(0)}, which escapes the Python parser but is | 
 | checked by the bytecode compiler. | 
 |  | 
 | Sequences representing terminal tokens may be represented as either | 
 | two-element lists of the form \code{(1, 'name')} or as three-element | 
 | lists of the form \code{(1, 'name', 56)}.  If the third element is | 
 | present, it is assumed to be a valid line number.  The line number | 
 | may be specified for any subset of the terminal symbols in the input | 
 | tree. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{tuple2ast}{sequence} | 
 | This is the same function as \code{sequence2ast()}.  This entry point | 
 | is maintained for backward compatibility. | 
 | \end{funcdesc} | 
 |  | 
 |  | 
 | \subsection{Converting AST Objects} | 
 |  | 
 | AST objects, regardless of the input used to create them, may be | 
 | converted to parse trees represented as list- or tuple- trees, or may | 
 | be compiled into executable code objects.  Parse trees may be | 
 | extracted with or without line numbering information. | 
 |  | 
 | \begin{funcdesc}{ast2list}{ast\optional{\, line_info\code{ = 0}}} | 
 | This function accepts an AST object from the caller in | 
 | \code{\var{ast}} and returns a Python list representing the | 
 | equivelent parse tree.  The resulting list representation can be used | 
 | for inspection or the creation of a new parse tree in list form.  This | 
 | function does not fail so long as memory is available to build the | 
 | list representation.  If the parse tree will only be used for | 
 | inspection, \code{ast2tuple()} should be used instead to reduce memory | 
 | consumption and fragmentation.  When the list representation is | 
 | required, this function is significantly faster than retrieving a | 
 | tuple representation and converting that to nested lists. | 
 |  | 
 | If \code{\var{line_info}} is true, line number information will be | 
 | included for all terminal tokens as a third element of the list | 
 | representing the token.  This information is omitted if the flag is | 
 | false or omitted. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{ast2tuple}{ast\optional{\, line_info\code{ = 0}}} | 
 | This function accepts an AST object from the caller in | 
 | \code{\var{ast}} and returns a Python tuple representing the | 
 | equivelent parse tree.  Other than returning a tuple instead of a | 
 | list, this function is identical to \code{ast2list()}. | 
 |  | 
 | If \code{\var{line_info}} is true, line number information will be | 
 | included for all terminal tokens as a third element of the list | 
 | representing the token.  This information is omitted if the flag is | 
 | false or omitted. | 
 | \end{funcdesc} | 
 |  | 
 | \begin{funcdesc}{compileast}{ast\optional{\, filename\code{ = '<ast>'}}} | 
 | The Python byte compiler can be invoked on an AST object to produce | 
 | code objects which can be used as part of an \code{exec} statement or | 
 | a call to the built-in \code{eval()} function.  This function provides | 
 | the interface to the compiler, passing the internal parse tree from | 
 | \code{\var{ast}} to the parser, using the source file name specified | 
 | by the \code{\var{filename}} parameter.  The default value supplied | 
 | for \code{\var{filename}} indicates that the source was an AST object. | 
 |  | 
 | Compiling an AST object may result in exceptions related to | 
 | compilation; an example would be a \code{SyntaxError} caused by the | 
 | parse tree for \code{del f(0)}: this statement is considered legal | 
 | within the formal grammar for Python but is not a legal language | 
 | construct.  The \code{SyntaxError} raised for this condition is | 
 | actually generated by the Python byte-compiler normally, which is why | 
 | it can be raised at this point by the \code{parser} module.  Most | 
 | causes of compilation failure can be diagnosed programmatically by | 
 | inspection of the parse tree. | 
 | \end{funcdesc} | 
 |  | 
 |  | 
 | \subsection{Queries on AST Objects} | 
 |  | 
 | Two functions are provided which allow an application to determine if | 
 | an AST was create as an expression or a suite.  Neither of these | 
 | functions can be used to determine if an AST was created from source | 
 | code via \code{expr()} or \code{suite()} or from a parse tree via | 
 | \code{sequence2ast()}. | 
 |  | 
 | \begin{funcdesc}{isexpr}{ast} | 
 | When \code{\var{ast}} represents an \code{'eval'} form, this function | 
 | returns a true value (\code{1}), otherwise it returns false | 
 | (\code{0}).  This is useful, since code objects normally cannot be | 
 | queried for this information using existing built-in functions.  Note | 
 | that the code objects created by \code{compileast()} cannot be queried | 
 | like this either, and are identical to those created by the built-in | 
 | \code{compile()} function. | 
 | \end{funcdesc} | 
 |  | 
 |  | 
 | \begin{funcdesc}{issuite}{ast} | 
 | This function mirrors \code{isexpr()} in that it reports whether an | 
 | AST object represents an \code{'exec'} form, commonly known as a | 
 | ``suite.''  It is not safe to assume that this function is equivelent | 
 | to \code{not isexpr(\var{ast})}, as additional syntactic fragments may | 
 | be supported in the future. | 
 | \end{funcdesc} | 
 |  | 
 |  | 
 | \subsection{Exceptions and Error Handling} | 
 |  | 
 | The parser module defines a single exception, but may also pass other | 
 | built-in exceptions from other portions of the Python runtime | 
 | environment.  See each function for information about the exceptions | 
 | it can raise. | 
 |  | 
 | \begin{excdesc}{ParserError} | 
 | Exception raised when a failure occurs within the parser module.  This | 
 | is generally produced for validation failures rather than the built in | 
 | \code{SyntaxError} thrown during normal parsing. | 
 | The exception argument is either a string describing the reason of the | 
 | failure or a tuple containing a sequence causing the failure from a parse | 
 | tree passed to \code{sequence2ast()} and an explanatory string.  Calls to | 
 | \code{sequence2ast()} need to be able to handle either type of exception, | 
 | while calls to other functions in the module will only need to be | 
 | aware of the simple string values. | 
 | \end{excdesc} | 
 |  | 
 | Note that the functions \code{compileast()}, \code{expr()}, and | 
 | \code{suite()} may throw exceptions which are normally thrown by the | 
 | parsing and compilation process.  These include the built in | 
 | exceptions \code{MemoryError}, \code{OverflowError}, | 
 | \code{SyntaxError}, and \code{SystemError}.  In these cases, these | 
 | exceptions carry all the meaning normally associated with them.  Refer | 
 | to the descriptions of each function for detailed information. | 
 |  | 
 |  | 
 | \subsection{AST Objects} | 
 |  | 
 | AST objects returned by \code{expr()}, \code{suite()}, and | 
 | \code{sequence2ast()} have no methods of their own. | 
 | Some of the functions defined which accept an AST object as their | 
 | first argument may change to object methods in the future.  The type | 
 | of these objects is available as \code{ASTType} in the module. | 
 |  | 
 | Ordered and equality comparisons are supported between AST objects. | 
 |  | 
 |  | 
 | \subsection{Examples} | 
 |  | 
 | The parser modules allows operations to be performed on the parse tree | 
 | of Python source code before the bytecode is generated, and provides | 
 | for inspection of the parse tree for information gathering purposes. | 
 | Two examples are presented.  The simple example demonstrates emulation | 
 | of the \code{compile()} built-in function and the complex example | 
 | shows the use of a parse tree for information discovery. | 
 |  | 
 | \subsubsection{Emulation of \sectcode{compile()}} | 
 |  | 
 | While many useful operations may take place between parsing and | 
 | bytecode generation, the simplest operation is to do nothing.  For | 
 | this purpose, using the \code{parser} module to produce an | 
 | intermediate data structure is equivelent to the code | 
 |  | 
 | \begin{verbatim} | 
 | >>> code = compile('a + 5', 'eval') | 
 | >>> a = 5 | 
 | >>> eval(code) | 
 | 10 | 
 | \end{verbatim} | 
 |  | 
 | The equivelent operation using the \code{parser} module is somewhat | 
 | longer, and allows the intermediate internal parse tree to be retained | 
 | as an AST object: | 
 |  | 
 | \begin{verbatim} | 
 | >>> import parser | 
 | >>> ast = parser.expr('a + 5') | 
 | >>> code = parser.compileast(ast) | 
 | >>> a = 5 | 
 | >>> eval(code) | 
 | 10 | 
 | \end{verbatim} | 
 |  | 
 | An application which needs both AST and code objects can package this | 
 | code into readily available functions: | 
 |  | 
 | \begin{verbatim} | 
 | import parser | 
 |  | 
 | def load_suite(source_string): | 
 |     ast = parser.suite(source_string) | 
 |     code = parser.compileast(ast) | 
 |     return ast, code | 
 |  | 
 | def load_expression(source_string): | 
 |     ast = parser.expr(source_string) | 
 |     code = parser.compileast(ast) | 
 |     return ast, code | 
 | \end{verbatim} | 
 |  | 
 | \subsubsection{Information Discovery} | 
 |  | 
 | Some applications benefit from direct access to the parse tree.  The | 
 | remainder of this section demonstrates how the parse tree provides | 
 | access to module documentation defined in docstrings without requiring | 
 | that the code being examined be loaded into a running interpreter via | 
 | \code{import}.  This can be very useful for performing analyses of | 
 | untrusted code. | 
 |  | 
 | Generally, the example will demonstrate how the parse tree may be | 
 | traversed to distill interesting information.  Two functions and a set | 
 | of classes are developed which provide programmatic access to high | 
 | level function and class definitions provided by a module.  The | 
 | classes extract information from the parse tree and provide access to | 
 | the information at a useful semantic level, one function provides a | 
 | simple low-level pattern matching capability, and the other function | 
 | defines a high-level interface to the classes by handling file | 
 | operations on behalf of the caller.  All source files mentioned here | 
 | which are not part of the Python installation are located in the | 
 | \file{Demo/parser/} directory of the distribution. | 
 |  | 
 | The dynamic nature of Python allows the programmer a great deal of | 
 | flexibility, but most modules need only a limited measure of this when | 
 | defining classes, functions, and methods.  In this example, the only | 
 | definitions that will be considered are those which are defined in the | 
 | top level of their context, e.g., a function defined by a \code{def} | 
 | statement at column zero of a module, but not a function defined | 
 | within a branch of an \code{if} ... \code{else} construct, though | 
 | there are some good reasons for doing so in some situations.  Nesting | 
 | of definitions will be handled by the code developed in the example. | 
 |  | 
 | To construct the upper-level extraction methods, we need to know what | 
 | the parse tree structure looks like and how much of it we actually | 
 | need to be concerned about.  Python uses a moderately deep parse tree | 
 | so there are a large number of intermediate nodes.  It is important to | 
 | read and understand the formal grammar used by Python.  This is | 
 | specified in the file \file{Grammar/Grammar} in the distribution. | 
 | Consider the simplest case of interest when searching for docstrings: | 
 | a module consisting of a docstring and nothing else.  (See file | 
 | \file{docstring.py}.) | 
 |  | 
 | \begin{verbatim} | 
 | """Some documentation. | 
 | """ | 
 | \end{verbatim} | 
 |  | 
 | Using the interpreter to take a look at the parse tree, we find a | 
 | bewildering mass of numbers and parentheses, with the documentation | 
 | buried deep in nested tuples. | 
 |  | 
 | \begin{verbatim} | 
 | >>> import parser | 
 | >>> import pprint | 
 | >>> ast = parser.suite(open('docstring.py').read()) | 
 | >>> tup = parser.ast2tuple(ast) | 
 | >>> pprint.pprint(tup) | 
 | (257, | 
 |  (264, | 
 |   (265, | 
 |    (266, | 
 |     (267, | 
 |      (307, | 
 |       (287, | 
 |        (288, | 
 |         (289, | 
 |          (290, | 
 |           (292, | 
 |            (293, | 
 |             (294, | 
 |              (295, | 
 |               (296, | 
 |                (297, | 
 |                 (298, | 
 |                  (299, | 
 |                   (300, (3, '"""Some documentation.\012"""'))))))))))))))))), | 
 |    (4, ''))), | 
 |  (4, ''), | 
 |  (0, '')) | 
 | \end{verbatim} | 
 |  | 
 | The numbers at the first element of each node in the tree are the node | 
 | types; they map directly to terminal and non-terminal symbols in the | 
 | grammar.  Unfortunately, they are represented as integers in the | 
 | internal representation, and the Python structures generated do not | 
 | change that.  However, the \code{symbol} and \code{token} modules | 
 | provide symbolic names for the node types and dictionaries which map | 
 | from the integers to the symbolic names for the node types. | 
 |  | 
 | In the output presented above, the outermost tuple contains four | 
 | elements: the integer \code{257} and three additional tuples.  Node | 
 | type \code{257} has the symbolic name \code{file_input}.  Each of | 
 | these inner tuples contains an integer as the first element; these | 
 | integers, \code{264}, \code{4}, and \code{0}, represent the node types | 
 | \code{stmt}, \code{NEWLINE}, and \code{ENDMARKER}, respectively. | 
 | Note that these values may change depending on the version of Python | 
 | you are using; consult \file{symbol.py} and \file{token.py} for | 
 | details of the mapping.  It should be fairly clear that the outermost | 
 | node is related primarily to the input source rather than the contents | 
 | of the file, and may be disregarded for the moment.  The \code{stmt} | 
 | node is much more interesting.  In particular, all docstrings are | 
 | found in subtrees which are formed exactly as this node is formed, | 
 | with the only difference being the string itself.  The association | 
 | between the docstring in a similar tree and the defined entity (class, | 
 | function, or module) which it describes is given by the position of | 
 | the docstring subtree within the tree defining the described | 
 | structure. | 
 |  | 
 | By replacing the actual docstring with something to signify a variable | 
 | component of the tree, we allow a simple pattern matching approach to | 
 | check any given subtree for equivelence to the general pattern for | 
 | docstrings.  Since the example demonstrates information extraction, we | 
 | can safely require that the tree be in tuple form rather than list | 
 | form, allowing a simple variable representation to be | 
 | \code{['variable_name']}.  A simple recursive function can implement | 
 | the pattern matching, returning a boolean and a dictionary of variable | 
 | name to value mappings.  (See file \file{example.py}.) | 
 |  | 
 | \begin{verbatim} | 
 | from types import ListType, TupleType | 
 |  | 
 | def match(pattern, data, vars=None): | 
 |     if vars is None: | 
 |         vars = {} | 
 |     if type(pattern) is ListType: | 
 |         vars[pattern[0]] = data | 
 |         return 1, vars | 
 |     if type(pattern) is not TupleType: | 
 |         return (pattern == data), vars | 
 |     if len(data) != len(pattern): | 
 |         return 0, vars | 
 |     for pattern, data in map(None, pattern, data): | 
 |         same, vars = match(pattern, data, vars) | 
 |         if not same: | 
 |             break | 
 |     return same, vars | 
 | \end{verbatim} | 
 |  | 
 | Using this simple representation for syntactic variables and the symbolic | 
 | node types, the pattern for the candidate docstring subtrees becomes | 
 | fairly readable.  (See file \file{example.py}.) | 
 |  | 
 | \begin{verbatim} | 
 | import symbol | 
 | import token | 
 |  | 
 | DOCSTRING_STMT_PATTERN = ( | 
 |     symbol.stmt, | 
 |     (symbol.simple_stmt, | 
 |      (symbol.small_stmt, | 
 |       (symbol.expr_stmt, | 
 |        (symbol.testlist, | 
 |         (symbol.test, | 
 |          (symbol.and_test, | 
 |           (symbol.not_test, | 
 |            (symbol.comparison, | 
 |             (symbol.expr, | 
 |              (symbol.xor_expr, | 
 |               (symbol.and_expr, | 
 |                (symbol.shift_expr, | 
 |                 (symbol.arith_expr, | 
 |                  (symbol.term, | 
 |                   (symbol.factor, | 
 |                    (symbol.power, | 
 |                     (symbol.atom, | 
 |                      (token.STRING, ['docstring']) | 
 |                      )))))))))))))))), | 
 |      (token.NEWLINE, '') | 
 |      )) | 
 | \end{verbatim} | 
 |  | 
 | Using the \code{match()} function with this pattern, extracting the | 
 | module docstring from the parse tree created previously is easy: | 
 |  | 
 | \begin{verbatim} | 
 | >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1]) | 
 | >>> found | 
 | 1 | 
 | >>> vars | 
 | {'docstring': '"""Some documentation.\012"""'} | 
 | \end{verbatim} | 
 |  | 
 | Once specific data can be extracted from a location where it is | 
 | expected, the question of where information can be expected | 
 | needs to be answered.  When dealing with docstrings, the answer is | 
 | fairly simple: the docstring is the first \code{stmt} node in a code | 
 | block (\code{file_input} or \code{suite} node types).  A module | 
 | consists of a single \code{file_input} node, and class and function | 
 | definitions each contain exactly one \code{suite} node.  Classes and | 
 | functions are readily identified as subtrees of code block nodes which | 
 | start with \code{(stmt, (compound_stmt, (classdef, ...} or | 
 | \code{(stmt, (compound_stmt, (funcdef, ...}.  Note that these subtrees | 
 | cannot be matched by \code{match()} since it does not support multiple | 
 | sibling nodes to match without regard to number.  A more elaborate | 
 | matching function could be used to overcome this limitation, but this | 
 | is sufficient for the example. | 
 |  | 
 | Given the ability to determine whether a statement might be a | 
 | docstring and extract the actual string from the statement, some work | 
 | needs to be performed to walk the parse tree for an entire module and | 
 | extract information about the names defined in each context of the | 
 | module and associate any docstrings with the names.  The code to | 
 | perform this work is not complicated, but bears some explanation. | 
 |  | 
 | The public interface to the classes is straightforward and should | 
 | probably be somewhat more flexible.  Each ``major'' block of the | 
 | module is described by an object providing several methods for inquiry | 
 | and a constructor which accepts at least the subtree of the complete | 
 | parse tree which it represents.  The \code{ModuleInfo} constructor | 
 | accepts an optional \code{\var{name}} parameter since it cannot | 
 | otherwise determine the name of the module. | 
 |  | 
 | The public classes include \code{ClassInfo}, \code{FunctionInfo}, | 
 | and \code{ModuleInfo}.  All objects provide the | 
 | methods \code{get_name()}, \code{get_docstring()}, | 
 | \code{get_class_names()}, and \code{get_class_info()}.  The | 
 | \code{ClassInfo} objects support \code{get_method_names()} and | 
 | \code{get_method_info()} while the other classes provide | 
 | \code{get_function_names()} and \code{get_function_info()}. | 
 |  | 
 | Within each of the forms of code block that the public classes | 
 | represent, most of the required information is in the same form and is | 
 | accessed in the same way, with classes having the distinction that | 
 | functions defined at the top level are referred to as ``methods.'' | 
 | Since the difference in nomenclature reflects a real semantic | 
 | distinction from functions defined outside of a class, the | 
 | implementation needs to maintain the distinction. | 
 | Hence, most of the functionality of the public classes can be | 
 | implemented in a common base class, \code{SuiteInfoBase}, with the | 
 | accessors for function and method information provided elsewhere. | 
 | Note that there is only one class which represents function and method | 
 | information; this paralels the use of the \code{def} statement to | 
 | define both types of elements. | 
 |  | 
 | Most of the accessor functions are declared in \code{SuiteInfoBase} | 
 | and do not need to be overriden by subclasses.  More importantly, the | 
 | extraction of most information from a parse tree is handled through a | 
 | method called by the \code{SuiteInfoBase} constructor.  The example | 
 | code for most of the classes is clear when read alongside the formal | 
 | grammar, but the method which recursively creates new information | 
 | objects requires further examination.  Here is the relevant part of | 
 | the \code{SuiteInfoBase} definition from \file{example.py}: | 
 |  | 
 | \begin{verbatim} | 
 | class SuiteInfoBase: | 
 |     _docstring = '' | 
 |     _name = '' | 
 |  | 
 |     def __init__(self, tree = None): | 
 |         self._class_info = {} | 
 |         self._function_info = {} | 
 |         if tree: | 
 |             self._extract_info(tree) | 
 |  | 
 |     def _extract_info(self, tree): | 
 |         # extract docstring | 
 |         if len(tree) == 2: | 
 |             found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1]) | 
 |         else: | 
 |             found, vars = match(DOCSTRING_STMT_PATTERN, tree[3]) | 
 |         if found: | 
 |             self._docstring = eval(vars['docstring']) | 
 |         # discover inner definitions | 
 |         for node in tree[1:]: | 
 |             found, vars = match(COMPOUND_STMT_PATTERN, node) | 
 |             if found: | 
 |                 cstmt = vars['compound'] | 
 |                 if cstmt[0] == symbol.funcdef: | 
 |                     name = cstmt[2][1] | 
 |                     self._function_info[name] = FunctionInfo(cstmt) | 
 |                 elif cstmt[0] == symbol.classdef: | 
 |                     name = cstmt[2][1] | 
 |                     self._class_info[name] = ClassInfo(cstmt) | 
 | \end{verbatim} | 
 |  | 
 | After initializing some internal state, the constructor calls the | 
 | \code{_extract_info()} method.  This method performs the bulk of the | 
 | information extraction which takes place in the entire example.  The | 
 | extraction has two distinct phases: the location of the docstring for | 
 | the parse tree passed in, and the discovery of additional definitions | 
 | within the code block represented by the parse tree. | 
 |  | 
 | The initial \code{if} test determines whether the nested suite is of | 
 | the ``short form'' or the ``long form.''  The short form is used when | 
 | the code block is on the same line as the definition of the code | 
 | block, as in | 
 |  | 
 | \begin{verbatim} | 
 | def square(x): "Square an argument."; return x ** 2 | 
 | \end{verbatim} | 
 |  | 
 | while the long form uses an indented block and allows nested | 
 | definitions: | 
 |  | 
 | \begin{verbatim} | 
 | def make_power(exp): | 
 |     "Make a function that raises an argument to the exponent `exp'." | 
 |     def raiser(x, y=exp): | 
 |         return x ** y | 
 |     return raiser | 
 | \end{verbatim} | 
 |  | 
 | When the short form is used, the code block may contain a docstring as | 
 | the first, and possibly only, \code{small_stmt} element.  The | 
 | extraction of such a docstring is slightly different and requires only | 
 | a portion of the complete pattern used in the more common case.  As | 
 | implemented, the docstring will only be found if there is only | 
 | one \code{small_stmt} node in the \code{simple_stmt} node.  Since most | 
 | functions and methods which use the short form do not provide a | 
 | docstring, this may be considered sufficient.  The extraction of the | 
 | docstring proceeds using the \code{match()} function as described | 
 | above, and the value of the docstring is stored as an attribute of the | 
 | \code{SuiteInfoBase} object. | 
 |  | 
 | After docstring extraction, a simple definition discovery | 
 | algorithm operates on the \code{stmt} nodes of the \code{suite} node.  The | 
 | special case of the short form is not tested; since there are no | 
 | \code{stmt} nodes in the short form, the algorithm will silently skip | 
 | the single \code{simple_stmt} node and correctly not discover any | 
 | nested definitions. | 
 |  | 
 | Each statement in the code block is categorized as | 
 | a class definition, function or method definition, or | 
 | something else.  For the definition statements, the name of the | 
 | element defined is extracted and a representation object | 
 | appropriate to the definition is created with the defining subtree | 
 | passed as an argument to the constructor.  The repesentation objects | 
 | are stored in instance variables and may be retrieved by name using | 
 | the appropriate accessor methods. | 
 |  | 
 | The public classes provide any accessors required which are more | 
 | specific than those provided by the \code{SuiteInfoBase} class, but | 
 | the real extraction algorithm remains common to all forms of code | 
 | blocks.  A high-level function can be used to extract the complete set | 
 | of information from a source file.  (See file \file{example.py}.) | 
 |  | 
 | \begin{verbatim} | 
 | def get_docs(fileName): | 
 |     source = open(fileName).read() | 
 |     import os | 
 |     basename = os.path.basename(os.path.splitext(fileName)[0]) | 
 |     import parser | 
 |     ast = parser.suite(source) | 
 |     tup = parser.ast2tuple(ast) | 
 |     return ModuleInfo(tup, basename) | 
 | \end{verbatim} | 
 |  | 
 | This provides an easy-to-use interface to the documentation of a | 
 | module.  If information is required which is not extracted by the code | 
 | of this example, the code may be extended at clearly defined points to | 
 | provide additional capabilities. | 
 |  | 
 |  | 
 | %% | 
 | %%  end of file |