% libparser.tex
%
% Introductory documentation for the new parser built-in module.
%
% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr.  This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution.  No fee may be charged for this document
% in any representation, either on paper or electronically.  This
% restriction does not affect other elements in a distributed package
% in any way.
%

\section{Built-in Module \sectcode{parser}}
\bimodindex{parser}

The \code{parser} module provides an interface to Python's internal
parser and byte-code compiler.  The primary purpose for this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from this.  This can be better than trying
to parse and modify an arbitrary Python code fragment as a string,
because parsing is performed in a manner identical to the code
forming the application.  It is also faster.

There are a few things to note about this module which are important
to making use of the data structures it creates.  This is not a
tutorial on editing the parse trees for Python code.

Most importantly, a good understanding of the Python grammar processed
by the internal parser is required.  For full information on the
language syntax, refer to the Language Reference.  The parser itself
is created from a grammar specification defined in the file
\code{Grammar/Grammar} in the standard Python distribution.  The parse
trees stored in the ``AST objects'' created by this module are the
actual output from the internal parser when created by the
\code{expr()} or \code{suite()} functions, described below.  The AST
objects created by \code{sequence2ast()} faithfully simulate those
structures.  Be aware that the values of the sequences which are
considered ``correct'' will vary from one version of Python to another
as the formal grammar for the language is revised.  However,
transporting code from one Python version to another as source text
will always allow correct parse trees to be created in the target
version, with the only restriction being that migrating to an older
version of the interpreter will not support more recent language
constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been
forward-compatible.

Each element of the sequences returned by \code{ast2list()} or
\code{ast2tuple()} has a simple form.  Sequences representing
non-terminal elements in the grammar always have a length greater than
one.  The first element is an integer which identifies a production in
the grammar.  These integers are given symbolic names in the C header
file \code{Include/graminit.h} and the Python module
\code{Lib/symbol.py}.  Each additional element of the sequence represents
a component of the production as recognized in the input string: these
are always sequences which have the same form as the parent.  An
important aspect of this structure which should be noted is that
keywords used to identify the parent node type, such as the keyword
\code{if} in an \emph{if\_stmt}, are included in the node tree without
any special treatment.  For example, the \code{if} keyword is
represented by the tuple \code{(1, 'if')}, where \code{1} is the
numeric value associated with all \code{NAME} tokens, including
variable and function names defined by the user.  In an alternate form
returned when line number information is requested, the same token
might be represented as \code{(1, 'if', 12)}, where the \code{12}
represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without
any child elements; in their place is the source text which was
identified.  The example of the \code{if} keyword above is
representative.  The various types of terminal symbols are defined in
the C header file \code{Include/token.h} and the Python module
\code{Lib/token.py}.
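
The following sketch shows one way to walk a tree in this sequence
form and separate terminals from non-terminals.  It is written for a
modern Python 3 interpreter, which no longer ships the \code{parser}
module (it was removed in Python 3.10), so the tree is built by hand.
The helper names \code{is\_terminal()} and \code{terminals()} are
hypothetical, \code{1} is the \code{NAME} type from the text, and
\code{999} is a placeholder for a non-terminal type; none of this is
part of the \code{parser} module's interface.

\begin{verbatim}
# Hand-built tree: a placeholder non-terminal (999) with two NAME
# tokens, only the first of which carries a line number.
if_tree = (999, (1, 'if', 12), (1, 'x'))

def is_terminal(node):
    # Terminals are (type, text) or (type, text, lineno): the second
    # element is a string, not a nested sequence.
    return isinstance(node[1], str)

def terminals(node):
    """Return (type, text, lineno) for every terminal, left to right.

    lineno is None for tokens recorded without line information."""
    if is_terminal(node):
        lineno = node[2] if len(node) > 2 else None
        return [(node[0], node[1], lineno)]
    result = []
    for child in node[1:]:      # node[0] is the production number
        result.extend(terminals(child))
    return result

print(terminals(if_tree))  # [(1, 'if', 12), (1, 'x', None)]
\end{verbatim}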

The AST objects are not actually required to support the functionality
of this module, but are provided for three purposes: to allow an
application to amortize the cost of processing complex parse trees, to
provide a parse tree representation which conserves memory space when
compared to the Python list or tuple representation, and to ease the
creation of additional modules in C which manipulate parse trees.  A
simple ``wrapper'' module may be created in Python to hide the use of
AST objects.


The \code{parser} module defines the following functions:

\renewcommand{\indexsubitem}{(in module parser)}

\begin{funcdesc}{ast2list}{ast\optional{\, line\_info\code{ = 0}}}
This function accepts an AST object from the caller in
\code{\var{ast}} and returns a Python list representing the
equivalent parse tree.  The resulting list representation can be used
for inspection or the creation of a new parse tree in list form.
This function does not fail so long as memory is available to build
the list representation.  If a parse tree will only be used for
inspection, \code{ast2tuple()} should be used instead to reduce memory
consumption and fragmentation.  When modifications are to be made to
the parse tree, this function is significantly faster than retrieving
a tuple representation and converting that to nested lists.

If the \code{line\_info} flag is given a true value, line number
information will be included for all terminal tokens as a third
element of the list representing the token.  This information is
omitted if the flag is false or omitted.
\end{funcdesc}
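
The slower alternative mentioned above, retrieving a tuple and
converting it to nested lists, can be sketched in Python 3 (where the
\code{parser} module is no longer available).  The helpers
\code{to\_lists()} and \code{strip\_line\_info()} are hypothetical
illustrations, not part of the module; the second one turns a tree
recorded with line numbers into the form produced when
\code{line\_info} is false.

\begin{verbatim}
def to_lists(node):
    # Turn every nested tuple into a list so the tree can be edited
    # in place; strings and line numbers are kept as they are.
    return [to_lists(x) if isinstance(x, tuple) else x for x in node]

def strip_line_info(node):
    # Drop the optional third element (the line number) from each
    # terminal, preserving tuple or list form at every level.
    if isinstance(node[1], str):
        return type(node)(node[:2])
    return type(node)([node[0]] + [strip_line_info(c) for c in node[1:]])

tree = (999, (1, 'if', 12), (1, 'x', 12))  # 999: placeholder type
print(to_lists(tree))         # [999, [1, 'if', 12], [1, 'x', 12]]
print(strip_line_info(tree))  # (999, (1, 'if'), (1, 'x'))
\end{verbatim}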

\begin{funcdesc}{ast2tuple}{ast\optional{\, line\_info\code{ = 0}}}
This function accepts an AST object from the caller in
\code{\var{ast}} and returns a Python tuple representing the
equivalent parse tree.  Other than returning a tuple instead of a
list, this function is identical to \code{ast2list()}.

If the \code{line\_info} flag is given a true value, line number
information will be included for all terminal tokens as a third
element of the tuple representing the token.  This information is
omitted if the flag is false or omitted.
\end{funcdesc}

\begin{funcdesc}{compileast}{ast\optional{\, filename\code{ = '<ast>'}}}
The Python byte compiler can be invoked on an AST object to produce
code objects which can be used as part of an \code{exec} statement or
a call to the built-in \code{eval()} function.  This function provides
the interface to the compiler: it passes the internal parse tree from
\code{\var{ast}} to the byte compiler, using the source file name
specified by the \code{\var{filename}} parameter.  The default value
supplied for \code{\var{filename}} indicates that the source was an
AST object.

Compiling an AST object may result in exceptions related to
compilation; an example would be a \code{SyntaxError} caused by the
parse tree for \code{del f(0)}: this statement is considered legal
within the formal grammar for Python but is not a legal language
construct.  The \code{SyntaxError} raised for this condition is
normally generated by the Python byte-compiler, which is why it can be
raised at this point by the \code{parser} module.  Most causes of
compilation failure can be diagnosed programmatically by inspection of
the parse tree.
\end{funcdesc}


\begin{funcdesc}{expr}{string}
The \code{expr()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'file.py', 'eval')}.
If the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}


\begin{funcdesc}{isexpr}{ast}
When \code{\var{ast}} represents an \code{'eval'} form, this function
returns a true value (\code{1}); otherwise it returns false
(\code{0}).  This is useful, since code objects normally cannot be
queried for this information using existing built-in functions.  Note
that the code objects created by \code{compileast()} cannot be queried
like this either, and are identical to those created by the built-in
\code{compile()} function.
\end{funcdesc}


\begin{funcdesc}{issuite}{ast}
This function mirrors \code{isexpr()} in that it reports whether an
AST object represents a suite of statements.  It is not safe to assume
that this function is equivalent to \code{not isexpr(\var{ast})}, as
additional syntactic fragments may be supported in the future.
\end{funcdesc}


\begin{funcdesc}{suite}{string}
The \code{suite()} function parses the parameter \code{\var{string}}
as if it were an input to \code{compile(\var{string}, 'file.py', 'exec')}.
If the parse succeeds, an AST object is created to hold the internal
parse tree representation; otherwise an appropriate exception is
raised.
\end{funcdesc}
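
The two modes named above can be tried directly with the built-in
\code{compile()} function.  The sketch below is written for Python 3
and does not use the \code{parser} module (which recent Python 3
releases no longer provide); it only illustrates the difference
between \code{'eval'} input, as accepted by \code{expr()}, and
\code{'exec'} input, as accepted by \code{suite()}.  The file name
\code{'file.py'} is an arbitrary label.

\begin{verbatim}
# 'eval' mode: a single expression, which produces a value.
expr_code = compile('a + 5', 'file.py', 'eval')

# 'exec' mode: a suite of statements, which produces no value.
suite_code = compile('b = a * 2\nc = b + 1', 'file.py', 'exec')

namespace = {'a': 5}
value = eval(expr_code, namespace)  # 10
exec(suite_code, namespace)         # binds b = 10 and c = 11

# A statement is not valid 'eval' input, so compiling it this way
# raises SyntaxError.
try:
    compile('x = 1', 'file.py', 'eval')
    rejected = False
except SyntaxError:
    rejected = True
\end{verbatim}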


\begin{funcdesc}{sequence2ast}{sequence}
This function accepts a parse tree represented as a sequence and
builds an internal representation if possible.  If it can validate
that the tree conforms to the Python grammar and all nodes are valid
node types in the host version of Python, an AST object is created
from the internal representation and returned to the caller.  If there
is a problem creating the internal representation, or if the tree
cannot be validated, a \code{ParserError} exception is raised.  An AST
object created this way should not be assumed to compile correctly;
the normal exceptions of compilation may still be raised when the AST
object is passed to \code{compileast()}.  This will normally indicate
problems not related to syntax (such as a \code{MemoryError}
exception), but may also be due to constructs such as the result of
parsing \code{del f(0)}, which escapes the Python parser but is
checked by the bytecode compiler.

Sequences representing terminal tokens may be represented as either
two-element sequences of the form \code{(1, 'name')} or as
three-element sequences of the form \code{(1, 'name', 56)}.  If the
third element is present, it is assumed to be a valid line number.
The line number may be specified for any subset of the terminal
symbols in the input tree.
\end{funcdesc}
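
A purely structural check of these shapes can be sketched in Python 3
without the \code{parser} module.  This is \emph{not} the grammar
validation that \code{sequence2ast()} performs; the hypothetical
\code{check\_shape()} only verifies that non-terminals are an integer
followed by at least one child, and that terminals have one of the two
forms just described.  It returns the first offending node, or
\code{None} when the whole tree is well formed.

\begin{verbatim}
def check_shape(node):
    """Return None if node is well formed, else the offending node."""
    if not isinstance(node, (tuple, list)) or len(node) < 2:
        return node
    if not isinstance(node[0], int):
        return node
    if isinstance(node[1], str):
        # Terminal: (type, text) or (type, text, lineno).
        if len(node) == 2:
            return None
        if len(node) == 3 and isinstance(node[2], int):
            return None
        return node
    # Non-terminal: every child must itself be well formed.
    for child in node[1:]:
        bad = check_shape(child)
        if bad is not None:
            return bad
    return None

# Line numbers may appear on any subset of the terminals.
print(check_shape((999, (1, 'if', 12), [1, 'x'])))  # None
print(check_shape((999, (1, 'if', 'twelve'))))      # (1, 'if', 'twelve')
\end{verbatim}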

\begin{funcdesc}{tuple2ast}{sequence}
This is the same function as \code{sequence2ast()}.  This entry point
is maintained for backward compatibility.
\end{funcdesc}


\subsection{Exceptions and Error Handling}

The parser module defines a single exception, but may also pass other
built-in exceptions from other portions of the Python runtime
environment.  See each function for information about the exceptions
it can raise.

\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the parser module.  This
is generally produced for validation failures rather than the built-in
\code{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for the
failure or a tuple containing a sequence causing the failure from a parse
tree passed to \code{sequence2ast()} and an explanatory string.  Calls to
\code{sequence2ast()} need to be able to handle either form of the
argument, while calls to other functions in the module will only need
to be aware of the simple string values.
\end{excdesc}
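
Code calling \code{sequence2ast()} might normalize the two forms of
the exception argument with a small helper such as the one below.  The
function \code{describe\_parser\_error()} is hypothetical and is
written for Python 3; it takes the exception argument itself rather
than the exception, so it can be exercised without the \code{parser}
module.  The sample messages are made up for illustration.

\begin{verbatim}
def describe_parser_error(arg):
    """Return (message, offending_sequence_or_None)."""
    if (isinstance(arg, tuple) and len(arg) == 2
            and isinstance(arg[1], str)):
        # Form used by sequence2ast(): (bad sequence, explanation).
        bad_sequence, message = arg
        return message, bad_sequence
    # Form used by the other functions: a plain string.
    return str(arg), None

print(describe_parser_error('illegal node'))
# ('illegal node', None)
print(describe_parser_error(((1, 'if', 'x'), 'bad line number')))
# ('bad line number', (1, 'if', 'x'))
\end{verbatim}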

Note that the functions \code{compileast()}, \code{expr()}, and
\code{suite()} may raise exceptions which are normally raised by the
parsing and compilation process.  These include the built-in
exceptions \code{MemoryError}, \code{OverflowError},
\code{SyntaxError}, and \code{SystemError}.  In these cases, these
exceptions carry all the meaning normally associated with them.  Refer
to the descriptions of each function for detailed information.


\subsection{AST Objects}

AST objects (returned by \code{expr()}, \code{suite()}, and
\code{sequence2ast()}, described above) have no methods of their own.
Some of the functions defined which accept an AST object as their
first argument may change to object methods in the future.

Ordered and equality comparisons are supported between AST objects.


\subsection{Examples}

The \code{parser} module allows operations to be performed on the parse
tree of Python source code before the bytecode is generated, and
provides for inspection of the parse tree for information gathering
purposes as well.  Two examples are presented.  The simple example
demonstrates emulation of the \code{compile()} built-in function and
the complex example shows the use of a parse tree for information
discovery.

\subsubsection{Emulation of {\tt compile()}}

While many useful operations may take place between parsing and
bytecode generation, the simplest operation is to do nothing.  For
this purpose, using the \code{parser} module to produce an
intermediate data structure is equivalent to the code

\begin{verbatim}
>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10
\end{verbatim}

The equivalent operation using the \code{parser} module is somewhat
longer, and allows the intermediate internal parse tree to be retained
as an AST object:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = parser.compileast(ast)
>>> a = 5
>>> eval(code)
10
\end{verbatim}

An application which needs both AST and code objects can package this
code into readily available functions:

\begin{verbatim}
import parser

def load_suite(source_string):
    # Parse a suite of statements and compile the resulting tree.
    ast = parser.suite(source_string)
    code = parser.compileast(ast)
    return ast, code

def load_expression(source_string):
    # Parse a single expression and compile the resulting tree.
    ast = parser.expr(source_string)
    code = parser.compileast(ast)
    return ast, code
\end{verbatim}

\subsubsection{Information Discovery}

Some applications can benefit from access to the parse tree itself, and
can take advantage of the intermediate data structure provided by the
\code{parser} module.  The remainder of this section of examples will
demonstrate how the intermediate data structure can provide access to
module documentation defined in docstrings without requiring that the
code being examined be imported into a running interpreter.  This can
be very useful for performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be
traversed to distill interesting information.  Two functions and a set
of classes are developed which provide programmatic access to high
level function and class definitions provided by a module.  The
classes extract information from the parse tree and provide access to
the information at a useful semantic level, one function provides a
simple low-level pattern matching capability, and the other function
defines a high-level interface to the classes by handling file
operations on behalf of the caller.  All source files mentioned here
which are not part of the Python installation are located in the
\file{Demo/parser} directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of
flexibility, but most modules need only a limited measure of this when
defining classes, functions, and methods.  In this example, the only
definitions that will be considered are those which are defined in the
top level of their context, e.g., a function defined by a \code{def}
statement at column zero of a module, but not a function defined
within a branch of an \code{if} ... \code{else} construct, though
there are good reasons for defining functions that way in some
situations.  Nesting of definitions will be handled by the code
developed in the example.

To construct the upper-level extraction methods, we need to know what
the parse tree structure looks like and how much of it we actually
need to be concerned about.  Python uses a moderately deep parse tree,
so there are a large number of intermediate nodes.  It is important to
read and understand the formal grammar used by Python.  This is
specified in the file \file{Grammar/Grammar} in the distribution.
Consider the simplest case of interest when searching for docstrings:
a module consisting of a docstring and nothing else.  (See file
\file{docstring.py}.)

\begin{verbatim}
"""Some documentation.
"""
\end{verbatim}

Using the interpreter to take a look at the parse tree, we find a
bewildering mass of numbers and parentheses, with the documentation
buried deep in the nested tuples:

\begin{verbatim}
>>> import parser
>>> import pprint
>>> ast = parser.suite(open('docstring.py').read())
>>> tup = parser.ast2tuple(ast)
>>> pprint.pprint(tup)
(257,
 (264,
  (265,
   (266,
    (267,
     (307,
      (287,
       (288,
        (289,
         (290,
          (292,
           (293,
            (294,
             (295,
              (296,
               (297,
                (298,
                 (299,
                  (300, (3, '"""Some documentation.\012"""'))))))))))))))))),
   (4, ''))),
 (4, ''),
 (0, ''))
\end{verbatim}

The numbers at the first element of each node in the tree are the node
types; they map directly to terminal and non-terminal symbols in the
grammar.  Unfortunately, they are represented as integers in the
internal representation, and the Python structures generated do not
change that.  However, the \code{symbol} and \code{token} modules
provide symbolic names for the node types and dictionaries which map
from the integers to the symbolic names for the node types.

In the output presented above, the outermost tuple contains four
elements: the integer \code{257} and three additional tuples.  Node
type \code{257} has the symbolic name \code{file_input}.  Each of
these inner tuples contains an integer as the first element; these
integers, \code{264}, \code{4}, and \code{0}, represent the node types
\code{stmt}, \code{NEWLINE}, and \code{ENDMARKER}, respectively.
Note that these values may change depending on the version of Python
you are using; consult \file{symbol.py} and \file{token.py} for
details of the mapping.  It should be fairly clear that the outermost
node is related primarily to the input source rather than the contents
of the file, and may be disregarded for the moment.  The \code{stmt}
node is much more interesting.  In particular, all docstrings are
found in subtrees which are formed exactly as this node is formed,
with the only difference being the string itself.  The association
between the docstring in a similar tree and the defined entity (class,
function, or module) which it describes is given by the position of
the docstring subtree within the tree defining the described
structure.
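
The name lookup mentioned above can be sketched for Python 3, whose
\code{token} module still provides \code{tok\_name} and
\code{ISTERMINAL()}.  The \code{symbol} module was removed in
Python 3.10, so the non-terminal names here come from a small
hand-written table holding only the two numbers identified in the
text (\code{257} and \code{264}), which are specific to the version of
Python described here.  The \code{label()} helper is hypothetical.

\begin{verbatim}
import token

# Hand-written table: only the non-terminals named in the text.
NONTERMINAL_NAMES = {257: 'file_input', 264: 'stmt'}

def label(node):
    """Replace each node's leading integer with a symbolic name.

    Numbers with no known name are left as integers."""
    kind = node[0]
    if token.ISTERMINAL(kind):
        name = token.tok_name.get(kind, kind)
    else:
        name = NONTERMINAL_NAMES.get(kind, kind)
    if isinstance(node[1], str):        # terminal: keep its text
        return (name,) + tuple(node[1:])
    return (name,) + tuple(label(child) for child in node[1:])

# A shortened, hand-built tree shaped like the output above; 999
# stands in for the chain of intermediate nodes omitted here.
tree = (257, (264, (999, (3, '"""Doc."""'))), (4, ''), (0, ''))
print(label(tree))
\end{verbatim}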

By replacing the actual docstring with something to signify a variable
component of the tree, we allow a simple pattern matching approach to
be used to check any given subtree for equivalence to the general
pattern for docstrings.  Since the example demonstrates information
extraction, we can safely require that the tree be in tuple form
rather than list form, allowing a simple variable representation to be
\code{['variable\_name']}.  A simple recursive function can implement
the pattern matching, returning a boolean and a dictionary of variable
name to value mappings.  (See file \file{example.py}.)

\begin{verbatim}
from types import ListType, TupleType

def match(pattern, data, vars=None):
    if vars is None:
        vars = {}
    # A list in the pattern is a variable: bind it to this subtree.
    if type(pattern) is ListType:
        vars[pattern[0]] = data
        return 1, vars
    # Anything other than a tuple must simply compare equal.
    if type(pattern) is not TupleType:
        return (pattern == data), vars
    # Tuples must have the same length ...
    if len(data) != len(pattern):
        return 0, vars
    # ... and match element by element.
    for pattern, data in map(None, pattern, data):
        same, vars = match(pattern, data, vars)
        if not same:
            break
    return same, vars
\end{verbatim}
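
The \code{match()} function above relies on \code{types.ListType},
\code{types.TupleType}, and \code{map(None, ...)}, none of which exist
in Python 3.  A port to Python 3 that is intended to behave the same
way (apart from returning \code{True} or \code{False} rather than
\code{1} or \code{0}) might look like this; it is a sketch, not the
code shipped in \file{example.py}.

\begin{verbatim}
def match(pattern, data, vars=None):
    if vars is None:
        vars = {}
    if isinstance(pattern, list):
        # ['name'] is a variable: capture the subtree found here.
        vars[pattern[0]] = data
        return True, vars
    if not isinstance(pattern, tuple):
        # Node types and token text must compare equal.
        return pattern == data, vars
    if len(data) != len(pattern):
        return False, vars
    same = True
    for sub_pattern, sub_data in zip(pattern, data):
        same, vars = match(sub_pattern, sub_data, vars)
        if not same:
            break
    return same, vars

found, vars = match((3, ['docstring']), (3, '"""Doc."""'))
print(found, vars)  # True {'docstring': '"""Doc."""'}
\end{verbatim}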

Using this simple recursive pattern matching function and the symbolic
node types, the pattern for the candidate docstring subtrees becomes
fairly readable.  (See file \file{example.py}.)

\begin{verbatim}
import symbol
import token

DOCSTRING_STMT_PATTERN = (
    symbol.stmt,
    (symbol.simple_stmt,
     (symbol.small_stmt,
      (symbol.expr_stmt,
       (symbol.testlist,
        (symbol.test,
         (symbol.and_test,
          (symbol.not_test,
           (symbol.comparison,
            (symbol.expr,
             (symbol.xor_expr,
              (symbol.and_expr,
               (symbol.shift_expr,
                (symbol.arith_expr,
                 (symbol.term,
                  (symbol.factor,
                   (symbol.power,
                    (symbol.atom,
                     (token.STRING, ['docstring'])
                     )))))))))))))))),
     (token.NEWLINE, '')
     ))
\end{verbatim}

Using the \code{match()} function with this pattern, extracting the
module docstring from the parse tree created previously is easy:

\begin{verbatim}
>>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
>>> found
1
>>> vars
{'docstring': '"""Some documentation.\012"""'}
\end{verbatim}

Once specific data can be extracted from a location where it is
expected, the question of where information can be expected
needs to be answered.  When dealing with docstrings, the answer is
fairly simple: the docstring is the first \code{stmt} node in a code
block (\code{file_input} or \code{suite} node types).  A module
consists of a single \code{file_input} node, and class and function
definitions each contain exactly one \code{suite} node.  Classes and
functions are readily identified as subtrees of code block nodes which
start with \code{(stmt, (compound_stmt, (classdef, ...} or
\code{(stmt, (compound_stmt, (funcdef, ...}.  Note that these subtrees
cannot be matched by \code{match()} since it does not support matching
multiple sibling nodes without regard to their number.  A more
elaborate matching function could be used to overcome this limitation,
but this is sufficient for the example.
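
Both rules can be sketched directly on hand-built trees in Python 3,
without the \code{parser} or \code{symbol} modules.  The value
\code{264} for \code{stmt} is taken from the output shown earlier; the
other node-type numbers and the helpers \code{first\_stmt()},
\code{definition\_kind()}, and \code{definition\_name()} are
hypothetical placeholders chosen for illustration.

\begin{verbatim}
STMT = 264                                        # from the output above
COMPOUND_STMT, CLASSDEF, FUNCDEF = 500, 501, 502  # placeholders

def first_stmt(block):
    """Return the first stmt child of a code block, or None."""
    for child in block[1:]:
        if child[0] == STMT:
            return child
    return None

def definition_kind(stmt):
    """Classify a stmt node as 'class', 'function', or None."""
    if len(stmt) == 2 and stmt[1][0] == COMPOUND_STMT:
        kind = stmt[1][1][0]
        if kind == CLASSDEF:
            return 'class'
        if kind == FUNCDEF:
            return 'function'
    return None

def definition_name(stmt):
    # In classdef and funcdef the keyword token comes first and the
    # NAME token second, so the name is the text of child 2.
    return stmt[1][1][2][1]

f_stmt = (STMT, (COMPOUND_STMT, (FUNCDEF, (1, 'def'), (1, 'f'))))
print(definition_kind(f_stmt), definition_name(f_stmt))  # function f
\end{verbatim}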

Given the ability to determine whether a statement might be a
docstring and extract the actual string from the statement, some work
needs to be performed to walk the parse tree for an entire module,
extract information about the names defined in each context of the
module, and associate any docstrings with those names.  The code to
perform this work is not complicated, but bears some explanation.

The public interface to the classes is straightforward and should
probably be somewhat more flexible.  Each ``major'' block of the
module is described by an object providing several methods for inquiry
and a constructor which accepts at least the subtree of the complete
parse tree which it represents.  The \code{ModuleInfo} constructor
accepts an optional \code{\var{name}} parameter since it cannot
otherwise determine the name of the module.

The public classes include \code{ClassInfo}, \code{FunctionInfo},
and \code{ModuleInfo}.  All objects provide the
methods \code{get_name()}, \code{get_docstring()},
\code{get_class_names()}, and \code{get_class_info()}.  The
\code{ClassInfo} objects support \code{get_method_names()} and
\code{get_method_info()} while the other classes provide
\code{get_function_names()} and \code{get_function_info()}.

Within each of the forms of code block that the public classes
represent, most of the required information is in the same form and is
accessed in the same way, with classes having the distinction that
functions defined at the top level are referred to as ``methods.''
Since the difference in nomenclature reflects a real semantic
distinction from functions defined outside of a class, our
implementation needs to maintain the same measure of distinction.
Hence, most of the functionality of the public classes can be
implemented in a common base class, \code{SuiteInfoBase}, with the
accessors for function and method information provided elsewhere.
Note that there is only one class which represents function and method
information; this mirrors the use of the \code{def} statement to
define both types of functions.
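
The division of responsibilities just described can be pictured with a
skeleton like the following, written for Python 3.  It is a sketch
under stated assumptions rather than the code in \file{example.py}:
the constructors take already-extracted names, docstrings, and
dictionaries instead of a parse tree, and the mix-in class
\code{SuiteFuncInfo} that supplies the function accessors is a
hypothetical name.

\begin{verbatim}
class SuiteInfoBase:
    # Accessors shared by modules, classes, and functions.
    def __init__(self, name='', docstring='', classes=None,
                 functions=None):
        self._name = name
        self._docstring = docstring
        self._class_info = dict(classes or {})
        self._function_info = dict(functions or {})

    def get_name(self):
        return self._name

    def get_docstring(self):
        return self._docstring

    def get_class_names(self):
        return sorted(self._class_info)

    def get_class_info(self, name):
        return self._class_info[name]


class SuiteFuncInfo:
    # Hypothetical mix-in: top-level defs are "functions" here ...
    def get_function_names(self):
        return sorted(self._function_info)

    def get_function_info(self, name):
        return self._function_info[name]


class FunctionInfo(SuiteInfoBase, SuiteFuncInfo):
    pass


class ModuleInfo(SuiteInfoBase, SuiteFuncInfo):
    pass


class ClassInfo(SuiteInfoBase):
    # ... but "methods" inside a class body.
    def get_method_names(self):
        return sorted(self._function_info)

    def get_method_info(self, name):
        return self._function_info[name]


square = FunctionInfo('square', 'Square an argument.')
cls = ClassInfo('C', functions={'square': square})
mod = ModuleInfo('mod', classes={'C': cls},
                 functions={'square': square})
print(mod.get_function_names(), cls.get_method_names())
# ['square'] ['square']
\end{verbatim}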

Most of the accessor functions are declared in \code{SuiteInfoBase}
and do not need to be overridden by subclasses.  More importantly, the
extraction of most information from a parse tree is handled through a
method called by the \code{SuiteInfoBase} constructor.  The example
code for most of the classes is clear when read alongside the formal
grammar, but the method which recursively creates new information
objects requires further examination.  Here is the relevant part of
the \code{SuiteInfoBase} definition from \file{example.py}:

\begin{verbatim}
class SuiteInfoBase:
    _docstring = ''
    _name = ''

    def __init__(self, tree = None):
        self._class_info = {}
        self._function_info = {}
        if tree:
            self._extract_info(tree)

    def _extract_info(self, tree):
        # extract docstring
        if len(tree) == 2:
            found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
        else:
            found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
        if found:
            # the STRING token holds a literal; eval() yields its value
            self._docstring = eval(vars['docstring'])
        # discover inner definitions; COMPOUND_STMT_PATTERN (not shown)
        # binds vars['compound'] to the child of a compound_stmt node
        for node in tree[1:]:
            found, vars = match(COMPOUND_STMT_PATTERN, node)
            if found:
                cstmt = vars['compound']
                if cstmt[0] == symbol.funcdef:
                    name = cstmt[2][1]
                    self._function_info[name] = FunctionInfo(cstmt)
                elif cstmt[0] == symbol.classdef:
                    name = cstmt[2][1]
                    self._class_info[name] = ClassInfo(cstmt)
\end{verbatim}

After initializing some internal state, the constructor calls the
\code{_extract_info()} method.  This method performs the bulk of the
information extraction which takes place in the entire example.  The
extraction has two distinct phases: the location of the docstring for
the parse tree passed in, and the discovery of additional definitions
within the code block represented by the parse tree.

The initial \code{if} test determines whether the nested suite is of
the ``short form'' or the ``long form.''  The short form is used when
the code block is on the same line as the definition of the code
block, as in

\begin{verbatim}
def square(x): "Square an argument."; return x ** 2
\end{verbatim}

while the long form uses an indented block and allows nested
definitions:

\begin{verbatim}
def make_power(exp):
    "Make a function that raises an argument to the exponent `exp'."
    def raiser(x, y=exp):
        return x ** y
    return raiser
\end{verbatim}

When the short form is used, the code block may contain a docstring as
the first, and possibly only, \code{small_stmt} element.  The
extraction of such a docstring is slightly different and requires only
a portion of the complete pattern used in the more common case.  As
given in the code, the docstring will only be found if there is only
one \code{small_stmt} node in the \code{simple_stmt} node; in the
\code{square()} example above, the docstring is followed by a second
\code{small_stmt}, so it would not be found.  Since most functions and
methods which use the short form do not provide a docstring, this may
be considered sufficient.  The extraction of the docstring proceeds
using the \code{match()} function as described above, and the value of
the docstring is stored as an attribute of the \code{SuiteInfoBase}
object.
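
The short/long distinction made by the first \code{if} test in
\code{\_extract\_info()} can be sketched on greatly simplified,
hand-built trees in Python 3.  A short-form \code{suite} has a single
child (a \code{simple\_stmt}), while a long-form \code{suite} begins
with \code{NEWLINE} and \code{INDENT}, so its first \code{stmt} sits
at index 3.  The node-type numbers are placeholders that need not
match any Python version, and \code{docstring\_candidate()} is a
hypothetical helper.

\begin{verbatim}
# Placeholder node-type numbers.
SUITE, SIMPLE_STMT, STMT = 600, 601, 602
STRING, NEWLINE, INDENT, DEDENT = 3, 4, 5, 6

def docstring_candidate(suite):
    """Return (form, subtree that may hold the docstring)."""
    if len(suite) == 2:
        # Short form: suite -> simple_stmt
        return 'short', suite[1]
    # Long form: suite -> NEWLINE INDENT stmt+ DEDENT
    return 'long', suite[3]

doc_line = (SIMPLE_STMT, (STRING, '"doc"'), (NEWLINE, ''))
short = (SUITE, doc_line)
long_ = (SUITE, (NEWLINE, ''), (INDENT, ''), (STMT, doc_line),
         (DEDENT, ''))
print(docstring_candidate(short))
print(docstring_candidate(long_))
\end{verbatim}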

After docstring extraction, \code{_extract_info()} runs a simple
definition discovery algorithm on the \code{stmt} nodes of the
\code{suite} node.  The special case of the short form is not tested;
since there are no \code{stmt} nodes in the short form, the algorithm
will silently skip the single \code{simple_stmt} node and correctly not
discover any nested definitions.

Each statement in the code block being examined is categorized as being
a class definition, function definition (including methods), or
something else.  For the definition statements, the name of the
element being defined is extracted and a representation object
appropriate to the definition is created with the defining subtree
passed as an argument to the constructor.  The representation objects
are stored in instance variables and may be retrieved by name using
the appropriate accessor methods.

The public classes provide any accessors required which are more
specific than those provided by the \code{SuiteInfoBase} class, but
the real extraction algorithm remains common to all forms of code
blocks.  A high-level function can be used to extract the complete set
of information from a source file:

\begin{verbatim}
import os
import parser

def get_docs(fileName):
    # Parse the file as a suite and wrap the tuple form of the tree
    # in a ModuleInfo object named after the file.
    source = open(fileName).read()
    basename = os.path.basename(os.path.splitext(fileName)[0])
    ast = parser.suite(source)
    tup = parser.ast2tuple(ast)
    return ModuleInfo(tup, basename)
\end{verbatim}

This provides an easy-to-use interface to the documentation of a
module.  If information is required which is not extracted by the code
of this example, the code may be extended at clearly defined points to
provide additional capabilities.


%%
%% end of file