:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. % Copyright 1995 Virginia Polytechnic Institute and State University
.. % and Fred L. Drake, Jr. This copyright notice must be distributed on
.. % all copies, but this document otherwise may be distributed as part
.. % of the Python distribution. No fee may be charged for this document
.. % in any representation, either on paper or electronically. This
.. % restriction does not affect other elements in a distributed package
.. % in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from it. This is better than trying to parse and modify an arbitrary Python
code fragment as a string, because parsing is performed in a manner identical to
that used for the code forming the application. It is also faster.

There are a few things to note about this module which are important for making
use of the data structures it creates. This is not a tutorial on editing the
parse trees for Python code, but some examples of using the :mod:`parser` module
are presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution. The parse trees
stored in the AST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below. The AST objects created by :func:`sequence2ast` faithfully
simulate those structures. Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`ast2list` or :func:`ast2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified. The
example of the :keyword:`if` keyword above is representative. The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

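The node layout described above can be explored without a parser. The following
is only a sketch on a hand-built tree: the helper names ``is_terminal`` and
``iter_terminals`` are hypothetical, and ``900`` and ``901`` are made-up
non-terminal numbers rather than values from :mod:`symbol`. It relies on the
observation that a terminal carries its source text as a string in the second
position.

```python
import token


def is_terminal(node):
    """A terminal node carries its source text (a string) as second element."""
    return len(node) >= 2 and isinstance(node[1], str)


def iter_terminals(tree):
    """Yield (token_name, text) pairs for every terminal, left to right."""
    if is_terminal(tree):
        # Fall back to the raw integer if the token number is unknown.
        yield token.tok_name.get(tree[0], tree[0]), tree[1]
        return
    for child in tree[1:]:
        yield from iter_terminals(child)


# Hand-built tree; 900 and 901 are illustrative non-terminal numbers.
tree = (900,
        (1, 'if'),
        (901, (1, 'x')),
        (4, ''))

print(list(iter_terminals(tree)))
# [('NAME', 'if'), ('NAME', 'x'), ('NEWLINE', '')]
```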
The AST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of AST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create AST objects and to convert AST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
AST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-asts:

Creating AST Objects
--------------------

AST objects may be created from source code or from a parse tree. When creating
an AST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an AST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an AST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: sequence2ast(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an AST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   AST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still occur when the AST object is
   passed to :func:`compileast`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be either two-element
   sequences of the form ``(1, 'name')`` or three-element sequences of the form
   ``(1, 'name', 56)``. If the third element is present, it is assumed to be a
   valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.

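Because line numbers may be present on any subset of the terminals, code that
compares trees may want a uniform shape first. A minimal sketch, assuming a
hand-built tree (``strip_line_numbers`` is a hypothetical helper, not part of
the module, and ``900``/``901`` are made-up non-terminal numbers):

```python
def strip_line_numbers(tree):
    """Return a copy of *tree* whose terminals all use the two-element form."""
    if len(tree) >= 2 and isinstance(tree[1], str):
        # Terminal: keep the token type and text, drop any line number.
        return tuple(tree[:2])
    # Non-terminal: keep the node type and recurse into each child.
    return (tree[0],) + tuple(strip_line_numbers(child) for child in tree[1:])


# Only some terminals carry a line number, which the text above permits.
mixed = (900, (1, 'name', 56), (901, (1, 'other')), (4, '', 56))

print(strip_line_numbers(mixed))
# (900, (1, 'name'), (901, (1, 'other')), (4, ''))
```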

.. function:: tuple2ast(sequence)

   This is the same function as :func:`sequence2ast`. This entry point is
   maintained for backward compatibility.


.. _converting-asts:

Converting AST Objects
----------------------

AST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple-trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: ast2list(ast[, line_info])

   This function accepts an AST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`ast2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

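For comparison, the slower route mentioned above (taking a tuple tree and
converting it to nested lists in Python) might look like the sketch below.
``to_lists`` is a hypothetical helper and the tree is hand-built with a made-up
non-terminal number; the point is only that the list form can be edited in
place while the tuple form cannot.

```python
def to_lists(tree):
    """Recursively convert a tuple parse tree into nested lists."""
    if isinstance(tree, tuple):
        return [to_lists(item) for item in tree]
    return tree  # integers and strings are copied as they are


# Hand-built tree; 900 is an illustrative non-terminal number.
tup = (900, (1, 'a'), (14, '+'), (2, '5'))

lst = to_lists(tup)
lst[1][1] = 'b'   # editing is possible in the list form

print(lst)
# [900, [1, 'b'], [14, '+'], [2, '5']]
```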

.. function:: ast2tuple(ast[, line_info])

   This function accepts an AST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`ast2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compileast(ast[, filename='<ast>'])

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an AST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *ast* to the compiler, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an AST object.

   Compiling an AST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte-compiler, which is
   why it can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.

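The ``del f(0)`` case can be observed with the built-in :func:`compile` alone.
The sketch below uses :func:`compile` as a stand-in for :func:`compileast`
(an assumption made so the snippet does not depend on this module), and
``compiles`` is a hypothetical helper. Which internal stage rejects the
statement may differ between interpreter versions; the snippet only relies on
a :exc:`SyntaxError` being raised.

```python
def compiles(source, mode='exec'):
    """Return (True, None) if *source* compiles, else (False, the error)."""
    try:
        compile(source, '<sketch>', mode)
    except SyntaxError as exc:
        return False, exc
    return True, None


ok, err = compiles('del f(0)')     # rejected: a call cannot be deleted
ok_item, _ = compiles('del f[0]')  # accepted: deleting an item is legal

print(ok, type(err).__name__, ok_item)
# False SyntaxError True
```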

.. _querying-asts:

Queries on AST Objects
----------------------

Two functions are provided which allow an application to determine if an AST was
created as an expression or a suite. Neither of these functions can be used to
determine if an AST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2ast`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true; otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compileast` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an AST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _ast-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2ast`
   and an explanatory string. Calls to :func:`sequence2ast` need to be able to
   handle either form of argument, while calls to other functions in the module
   will only need to be aware of the simple string values.

Note that the functions :func:`compileast`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _ast-objects:

AST Objects
-----------

Ordered and equality comparisons are supported between AST objects. Pickling of
AST objects (using the :mod:`pickle` module) is also supported.


.. data:: ASTType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2ast`.

AST objects have the following methods:


.. method:: AST.compile([filename])

   Same as ``compileast(ast, filename)``.


.. method:: AST.isexpr()

   Same as ``isexpr(ast)``.


.. method:: AST.issuite()

   Same as ``issuite(ast)``.


.. method:: AST.tolist([line_info])

   Same as ``ast2list(ast, line_info)``.


.. method:: AST.totuple([line_info])

   Same as ``ast2tuple(ast, line_info)``.


.. _ast-examples:

Examples
--------

.. index:: builtin: compile

The :mod:`parser` module allows operations to be performed on the parse tree of
Python source code before the bytecode is generated, and provides for inspection
of the parse tree for information gathering purposes. Two examples are
presented. The simple example demonstrates emulation of the :func:`compile`
built-in function and the complex example shows the use of a parse tree for
information discovery.


Emulation of :func:`compile`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an AST object::

   >>> import parser
   >>> ast = parser.expr('a + 5')
   >>> code = ast.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both AST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       # Parse as an 'exec' form; return the AST and its code object.
       ast = parser.suite(source_string)
       return ast, ast.compile()

   def load_expression(source_string):
       # Parse as an 'eval' form; return the AST and its code object.
       ast = parser.expr(source_string)
       return ast, ast.compile()


Information Discovery
^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: string; documentation
   single: docstrings

Some applications benefit from direct access to the parse tree. The remainder
of this section demonstrates how the parse tree provides access to module
documentation defined in docstrings without requiring that the code being
examined be loaded into a running interpreter via :keyword:`import`. This can
be very useful for performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be traversed to
distill interesting information. Two functions and a set of classes are
developed which provide programmatic access to high level function and class
definitions provided by a module. The classes extract information from the
parse tree and provide access to the information at a useful semantic level, one
function provides a simple low-level pattern matching capability, and the other
function defines a high-level interface to the classes by handling file
operations on behalf of the caller. All source files mentioned here which are
not part of the Python installation are located in the :file:`Demo/parser/`
directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of flexibility,
but most modules need only a limited measure of this when defining classes,
functions, and methods. In this example, the only definitions that will be
considered are those which are defined in the top level of their context, e.g.,
a function defined by a :keyword:`def` statement at column zero of a module, but
not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
construct, though there are some good reasons for doing so in some situations.
Nesting of definitions will be handled by the code developed in the example.

To construct the upper-level extraction methods, we need to know what the parse
tree structure looks like and how much of it we actually need to be concerned
about. Python uses a moderately deep parse tree so there are a large number of
intermediate nodes. It is important to read and understand the formal grammar
used by Python. This is specified in the file :file:`Grammar/Grammar` in the
distribution. Consider the simplest case of interest when searching for
docstrings: a module consisting of a docstring and nothing else. (See file
:file:`docstring.py`.) ::

   """Some documentation.
   """

Using the interpreter to take a look at the parse tree, we find a bewildering
mass of numbers and parentheses, with the documentation buried deep in nested
tuples. ::

   >>> import parser
   >>> import pprint
   >>> ast = parser.suite(open('docstring.py').read())
   >>> tup = ast.totuple()
   >>> pprint.pprint(tup)
   (257,
    (264,
     (265,
      (266,
       (267,
        (307,
         (287,
          (288,
           (289,
            (290,
             (292,
              (293,
               (294,
                (295,
                 (296,
                  (297,
                   (298,
                    (299,
                     (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
      (4, ''))),
    (4, ''),
    (0, ''))

The numbers at the first element of each node in the tree are the node types;
they map directly to terminal and non-terminal symbols in the grammar.
Unfortunately, they are represented as integers in the internal representation,
and the Python structures generated do not change that. However, the
:mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
and dictionaries which map from the integers to the symbolic names for the node
types.

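One way to make such output readable is to substitute names for the integers.
The sketch below is an illustration only: ``with_names`` is a hypothetical
helper, terminal names come from ``token.tok_name``, and names for
non-terminals are passed in explicitly (here ``{257: 'file_input'}``, taken
from the discussion that follows) so the snippet does not depend on
:mod:`symbol` being importable.

```python
import token


def with_names(tree, extra_names=None):
    """Return a copy of *tree* with node-type integers replaced by names."""
    names = dict(token.tok_name)
    names.update(extra_names or {})

    def convert(node):
        if isinstance(node, tuple):
            # Unknown node types are left as integers.
            head = names.get(node[0], node[0])
            return (head,) + tuple(convert(child) for child in node[1:])
        return node  # source text or a line number

    return convert(tree)


# A reduced version of the tree above: the outermost node and two terminals.
small = (257, (4, ''), (0, ''))

print(with_names(small, {257: 'file_input'}))
# ('file_input', ('NEWLINE', ''), ('ENDMARKER', ''))
```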
In the output presented above, the outermost tuple contains four elements: the
integer ``257`` and three additional tuples. Node type ``257`` has the symbolic
name :const:`file_input`. Each of these inner tuples contains an integer as the
first element; these integers, ``264``, ``4``, and ``0``, represent the node
types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
Note that these values may change depending on the version of Python you are
using; consult :file:`symbol.py` and :file:`token.py` for details of the
mapping. It should be fairly clear that the outermost node is related primarily
to the input source rather than the contents of the file, and may be disregarded
for the moment. The :const:`stmt` node is much more interesting. In
particular, all docstrings are found in subtrees which are formed exactly as
this node is formed, with the only difference being the string itself. The
association between the docstring in a similar tree and the defined entity
(class, function, or module) which it describes is given by the position of the
docstring subtree within the tree defining the described structure.

By replacing the actual docstring with something to signify a variable component
of the tree, we allow a simple pattern matching approach to check any given
subtree for equivalence to the general pattern for docstrings. Since the
example demonstrates information extraction, we can safely require that the tree
be in tuple form rather than list form, allowing a simple variable
representation to be ``['variable_name']``. A simple recursive function can
implement the pattern matching, returning a Boolean and a dictionary of variable
name to value mappings. (See file :file:`example.py`.) ::

   def match(pattern, data, vars=None):
       if vars is None:
           vars = {}
       # A list in the pattern is a variable: bind its name to the data.
       if isinstance(pattern, list):
           vars[pattern[0]] = data
           return True, vars
       # Anything else that is not a tuple must compare equal.
       if not isinstance(pattern, tuple):
           return (pattern == data), vars
       # Tuples must have the same length and match element by element.
       if len(data) != len(pattern):
           return False, vars
       for pattern, data in zip(pattern, data):
           same, vars = match(pattern, data, vars)
           if not same:
               break
       return same, vars

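The matcher can be exercised on small hand-built trees before applying it to
real parser output. In the sketch below the function is repeated (in condensed
form) only so that the snippet runs on its own; ``900`` is a made-up
non-terminal number, and the sample patterns are illustrative rather than
taken from :file:`example.py`.

```python
def match(pattern, data, vars=None):
    if vars is None:
        vars = {}
    if isinstance(pattern, list):        # variable: bind name to data
        vars[pattern[0]] = data
        return True, vars
    if not isinstance(pattern, tuple):   # literal: must compare equal
        return (pattern == data), vars
    if len(data) != len(pattern):
        return False, vars
    for pattern, data in zip(pattern, data):
        same, vars = match(pattern, data, vars)
        if not same:
            break
    return same, vars


# Pattern: a node of type 900 holding one NAME token and a NEWLINE token.
pattern = (900, (1, ['name']), (4, ''))

print(match(pattern, (900, (1, 'spam'), (4, ''))))
# (True, {'name': 'spam'})
print(match(pattern, (900, (2, '42'), (4, ''))))
# (False, {})
```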
Using this simple representation for syntactic variables and the symbolic node
types, the pattern for the candidate docstring subtrees becomes fairly readable.
(See file :file:`example.py`.) ::

   import symbol
   import token

   DOCSTRING_STMT_PATTERN = (
       symbol.stmt,
       (symbol.simple_stmt,
        (symbol.small_stmt,
         (symbol.expr_stmt,
          (symbol.testlist,
           (symbol.test,
            (symbol.and_test,
             (symbol.not_test,
              (symbol.comparison,
               (symbol.expr,
                (symbol.xor_expr,
                 (symbol.and_expr,
                  (symbol.shift_expr,
                   (symbol.arith_expr,
                    (symbol.term,
                     (symbol.factor,
                      (symbol.power,
                       (symbol.atom,
                        (token.STRING, ['docstring'])
                        )))))))))))))))),
        (token.NEWLINE, '')
        ))

Using the :func:`match` function with this pattern, extracting the module
docstring from the parse tree created previously is easy::

   >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
   >>> found
   True
   >>> vars
   {'docstring': '"""Some documentation.\n"""'}

Once specific data can be extracted from a location where it is expected, the
question of where information can be expected needs to be answered. When
dealing with docstrings, the answer is fairly simple: the docstring is the first
:const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
types). A module consists of a single :const:`file_input` node, and class and
function definitions each contain exactly one :const:`suite` node. Classes and
functions are readily identified as subtrees of code block nodes which start
with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
(funcdef, ...``. Note that these subtrees cannot be matched by :func:`match`
since it cannot match a variable number of sibling nodes. A more elaborate
matching function could be used to overcome this
limitation, but this is sufficient for the example.

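As one possible way around that limitation, a check can follow only the first
child at each level and ignore any remaining siblings. This is a sketch under
stated assumptions, not the approach taken in :file:`example.py`:
``starts_with`` is a hypothetical helper, and the node-type constants are
made-up numbers standing in for the values in :mod:`symbol`.

```python
# Illustrative stand-ins for symbol.stmt, symbol.compound_stmt, and so on.
STMT, COMPOUND_STMT, FUNCDEF, CLASSDEF = 900, 901, 902, 903


def starts_with(node, *head_types):
    """True if descending through first children visits *head_types* in order."""
    for node_type in head_types:
        if not isinstance(node, tuple) or node[0] != node_type:
            return False
        # Step to the first child; any further siblings are ignored.
        node = node[1] if len(node) > 1 else None
    return True


# A function definition with several siblings after the 'def' keyword.
tree = (STMT,
        (COMPOUND_STMT,
         (FUNCDEF, (1, 'def'), (1, 'f'), (11, ':'))))

print(starts_with(tree, STMT, COMPOUND_STMT, FUNCDEF))
# True
print(starts_with(tree, STMT, COMPOUND_STMT, CLASSDEF))
# False
```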
Given the ability to determine whether a statement might be a docstring and
extract the actual string from the statement, some work needs to be performed to
walk the parse tree for an entire module and extract information about the names
defined in each context of the module and associate any docstrings with the
names. The code to perform this work is not complicated, but bears some
explanation.

The public interface to the classes is straightforward and should probably be
somewhat more flexible. Each "major" block of the module is described by an
object providing several methods for inquiry and a constructor which accepts at
least the subtree of the complete parse tree which it represents. The
:class:`ModuleInfo` constructor accepts an optional *name* parameter since it
cannot otherwise determine the name of the module.

The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
:class:`ModuleInfo`. All objects provide the methods :meth:`get_name`,
:meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`. The
:class:`ClassInfo` objects support :meth:`get_method_names` and
:meth:`get_method_info` while the other classes provide
:meth:`get_function_names` and :meth:`get_function_info`.

Within each of the forms of code block that the public classes represent, most
of the required information is in the same form and is accessed in the same way,
with classes having the distinction that functions defined at the top level are
referred to as "methods." Since the difference in nomenclature reflects a real
semantic distinction from functions defined outside of a class, the
implementation needs to maintain the distinction. Hence, most of the
functionality of the public classes can be implemented in a common base class,
:class:`SuiteInfoBase`, with the accessors for function and method information
provided elsewhere. Note that there is only one class which represents function
and method information; this parallels the use of the :keyword:`def` statement
to define both types of elements.

Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
need to be overridden by subclasses. More importantly, the extraction of most
information from a parse tree is handled through a method called by the
:class:`SuiteInfoBase` constructor. The example code for most of the classes is
clear when read alongside the formal grammar, but the method which recursively
creates new information objects requires further examination. Here is the
relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::

   class SuiteInfoBase:
       _docstring = ''
       _name = ''

       def __init__(self, tree=None):
           self._class_info = {}
           self._function_info = {}
           if tree:
               self._extract_info(tree)

       def _extract_info(self, tree):
           # extract docstring
           if len(tree) == 2:
               # short form: the suite is a single simple_stmt
               found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
           else:
               # long form: the first stmt follows NEWLINE and INDENT
               found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
           if found:
               # the token text is a string literal; evaluate it to a str
               self._docstring = eval(vars['docstring'])
           # discover inner definitions
           # (COMPOUND_STMT_PATTERN is not part of this excerpt)
           for node in tree[1:]:
               found, vars = match(COMPOUND_STMT_PATTERN, node)
               if found:
                   cstmt = vars['compound']
                   if cstmt[0] == symbol.funcdef:
                       name = cstmt[2][1]
                       self._function_info[name] = FunctionInfo(cstmt)
                   elif cstmt[0] == symbol.classdef:
                       name = cstmt[2][1]
                       self._class_info[name] = ClassInfo(cstmt)

After initializing some internal state, the constructor calls the
:meth:`_extract_info` method. This method performs the bulk of the information
extraction which takes place in the entire example. The extraction has two
distinct phases: the location of the docstring for the parse tree passed in, and
the discovery of additional definitions within the code block represented by the
parse tree.

The initial :keyword:`if` test determines whether the nested suite is of the
"short form" or the "long form." The short form is used when the code block is
on the same line as the definition of the code block, as in ::

   def square(x): "Square an argument."; return x ** 2

while the long form uses an indented block and allows nested definitions::

   def make_power(exp):
       "Make a function that raises an argument to the exponent `exp'."
       def raiser(x, y=exp):
           return x ** y
       return raiser

When the short form is used, the code block may contain a docstring as the
first, and possibly only, :const:`small_stmt` element. The extraction of such a
docstring is slightly different and requires only a portion of the complete
pattern used in the more common case. As implemented, the docstring will only
be found if there is only one :const:`small_stmt` node in the
:const:`simple_stmt` node. Since most functions and methods which use the short
form do not provide a docstring, this may be considered sufficient. The
extraction of the docstring proceeds using the :func:`match` function as
described above, and the value of the docstring is stored as an attribute of the
:class:`SuiteInfoBase` object.

After docstring extraction, a simple definition discovery algorithm operates on
the :const:`stmt` nodes of the :const:`suite` node. The special case of the
short form is not tested; since there are no :const:`stmt` nodes in the short
form, the algorithm will silently skip the single :const:`simple_stmt` node and
correctly not discover any nested definitions.

Each statement in the code block is categorized as a class definition, function
or method definition, or something else. For the definition statements, the
name of the element defined is extracted and a representation object appropriate
to the definition is created with the defining subtree passed as an argument to
the constructor. The representation objects are stored in instance variables
and may be retrieved by name using the appropriate accessor methods.

The public classes provide any accessors required which are more specific than
those provided by the :class:`SuiteInfoBase` class, but the real extraction
algorithm remains common to all forms of code blocks. A high-level function can
be used to extract the complete set of information from a source file. (See
file :file:`example.py`.) ::

   def get_docs(fileName):
       import os
       import parser

       # Read the whole source file, closing it promptly.
       with open(fileName) as source_file:
           source = source_file.read()
       # The module name is the file name without directory or extension.
       basename = os.path.basename(os.path.splitext(fileName)[0])
       ast = parser.suite(source)
       return ModuleInfo(ast.totuple(), basename)

This provides an easy-to-use interface to the documentation of a module. If
information is required which is not extracted by the code of this example, the
code may be extended at clearly defined points to provide additional
capabilities.

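The same goal, reading docstrings without importing the code, can also be
approached with the standard :mod:`ast` module. This is a different technique
from the one developed in this section, shown only as a rough point of
comparison; ``get_docstrings`` is a hypothetical helper, and it looks only at
top-level definitions rather than nested ones.

```python
import ast


def get_docstrings(source):
    """Map '<module>' and top-level names to docstrings, without importing."""
    module = ast.parse(source)
    docs = {'<module>': ast.get_docstring(module)}
    for node in module.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs


# Combines the docstring-only module and the short-form function shown earlier.
source = (
    '"""Some documentation.\n"""\n'
    '\n'
    'def square(x): "Square an argument."; return x ** 2\n'
)

docs = get_docstrings(source)
print(docs['square'])
# Square an argument.
```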