:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.

There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution. The parse trees
stored in the AST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below. The AST objects created by :func:`sequence2ast` faithfully
simulate those structures. Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`ast2list` or :func:`ast2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the :keyword:`if` keyword above is representative. The various types of
terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

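
As an illustration of this layout, the following sketch walks a tuple-form tree
and collects its terminal elements. The helper name ``iter_terminals`` and the
sample tree are hypothetical (``257`` and ``300`` merely stand in for production
numbers); the only assumption made about the data is that a terminal carries its
source text as a string in the second position::

   def iter_terminals(node):
       """Yield (token_type, text) pairs for the leaves of a tree, in order."""
       if isinstance(node[1], str):
           # Terminal: (type, text) or (type, text, line_number).
           yield node[0], node[1]
       else:
           # Non-terminal: (production, child, child, ...).
           for child in node[1:]:
               for leaf in iter_terminals(child):
                   yield leaf

   # Hand-written sample in the shape described above, not parser output.
   sample = (257,
             (300, (1, 'if', 12), (1, 'x', 12)),
             (4, '', 12),
             (0, '', 13))
   terminals = list(iter_terminals(sample))

Here ``terminals`` holds one ``(type, text)`` pair per token in source order;
line numbers, where present, are ignored.
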
The AST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of AST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create AST objects and to convert AST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
AST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-asts:

Creating AST Objects
--------------------

AST objects may be created from source code or from a parse tree. When creating
an AST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an AST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an AST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: sequence2ast(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an AST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   AST object created this way should not be assumed to compile correctly; the
   normal exceptions raised by compilation may still occur when the AST object is
   passed to :func:`compileast`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``. If the third element is present, it is assumed to be a
   valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.

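
   Because line numbers are optional on every terminal, a tree can be brought
   into a uniform two-element form before it is compared or stored. The following
   is a minimal sketch of such a normalization; ``strip_line_numbers`` is a
   hypothetical helper, not part of this module, and it assumes a tuple-form tree
   in which terminals hold their text as a string::

      def strip_line_numbers(node):
          """Return a copy of a tuple-form tree without terminal line numbers."""
          if isinstance(node[1], str):
              # Terminal: keep only the token type and its text.
              return (node[0], node[1])
          # Non-terminal: keep the production number, rebuild the children.
          return (node[0],) + tuple(strip_line_numbers(c) for c in node[1:])

      # Hand-written sample; 257 stands in for a production number.
      mixed = (257, (1, 'name', 56), (4, '', 56), (0, ''))
      plain = strip_line_numbers(mixed)

   After this call ``plain`` has the same shape as ``mixed``, with every terminal
   reduced to its two-element form.
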

.. function:: tuple2ast(sequence)

   This is the same function as :func:`sequence2ast`. This entry point is
   maintained for backward compatibility.


.. _converting-asts:

Converting AST Objects
----------------------

AST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: ast2list(ast[, line_info])

   This function accepts an AST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`ast2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

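
   To make the cost of the slower route concrete, this is roughly what converting
   a tuple representation to nested lists involves: every node is visited and
   copied in Python code. The function ``to_lists`` is a hypothetical helper
   written for illustration only::

      def to_lists(node):
          """Recursively copy a tuple-form tree into nested lists."""
          if isinstance(node, (tuple, list)):
              return [to_lists(item) for item in node]
          # Integers and strings are kept as they are.
          return node

      # Hand-written sample; 257 stands in for a production number.
      as_lists = to_lists((257, (1, 'a'), (4, ''), (0, '')))

   The result is an independent, mutable copy of the tree, which is what
   :func:`ast2list` is documented to produce directly.
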

.. function:: ast2tuple(ast[, line_info])

   This function accepts an AST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`ast2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compileast(ast[, filename='<ast>'])

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an AST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides the interface to the
   compiler, passing the internal parse tree from *ast* to the compiler, using the
   source file name specified by the *filename* parameter. The default value
   supplied for *filename* indicates that the source was an AST object.

   Compiling an AST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte compiler, which is
   why it can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.


.. _querying-asts:

Queries on AST Objects
----------------------

Two functions are provided which allow an application to determine if an AST was
created as an expression or a suite. Neither of these functions can be used to
determine if an AST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2ast`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true; otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compileast` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an AST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _ast-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2ast`
   and an explanatory string. Calls to :func:`sequence2ast` need to be able to
   handle either form of argument, while calls to other functions in the module
   will only need to be aware of the simple string values.

Note that the functions :func:`compileast`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _ast-objects:

AST Objects
-----------

Ordered and equality comparisons are supported between AST objects. Pickling of
AST objects (using the :mod:`pickle` module) is also supported.


.. data:: ASTType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2ast`.

AST objects have the following methods:


.. method:: AST.compile([filename])

   Same as ``compileast(ast, filename)``.


.. method:: AST.isexpr()

   Same as ``isexpr(ast)``.


.. method:: AST.issuite()

   Same as ``issuite(ast)``.


.. method:: AST.tolist([line_info])

   Same as ``ast2list(ast, line_info)``.


.. method:: AST.totuple([line_info])

   Same as ``ast2tuple(ast, line_info)``.


.. _ast-examples:

Examples
--------

.. index:: builtin: compile

The parser module allows operations to be performed on the parse tree of Python
source code before the :term:`bytecode` is generated, and provides for
inspection of the parse tree for information gathering purposes. Two examples
are presented. The simple example demonstrates emulation of the :func:`compile`
built-in function and the complex example shows the use of a parse tree for
information discovery.


Emulation of :func:`compile`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an AST object::

   >>> import parser
   >>> ast = parser.expr('a + 5')
   >>> code = ast.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both AST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       ast = parser.suite(source_string)
       return ast, ast.compile()

   def load_expression(source_string):
       ast = parser.expr(source_string)
       return ast, ast.compile()


Information Discovery
^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: string; documentation
   single: docstrings

Some applications benefit from direct access to the parse tree. The remainder
of this section demonstrates how the parse tree provides access to module
documentation defined in docstrings without requiring that the code being
examined be loaded into a running interpreter via :keyword:`import`. This can
be very useful for performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be traversed to
distill interesting information. Two functions and a set of classes are
developed which provide programmatic access to high level function and class
definitions provided by a module. The classes extract information from the
parse tree and provide access to the information at a useful semantic level, one
function provides a simple low-level pattern matching capability, and the other
function defines a high-level interface to the classes by handling file
operations on behalf of the caller. All source files mentioned here which are
not part of the Python installation are located in the :file:`Demo/parser/`
directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of flexibility,
but most modules need only a limited measure of this when defining classes,
functions, and methods. In this example, the only definitions that will be
considered are those which are defined in the top level of their context, e.g.,
a function defined by a :keyword:`def` statement at column zero of a module, but
not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
construct, though there are some good reasons for doing so in some situations.
Nesting of definitions will be handled by the code developed in the example.

To construct the upper-level extraction methods, we need to know what the parse
tree structure looks like and how much of it we actually need to be concerned
about. Python uses a moderately deep parse tree so there are a large number of
intermediate nodes. It is important to read and understand the formal grammar
used by Python. This is specified in the file :file:`Grammar/Grammar` in the
distribution. Consider the simplest case of interest when searching for
docstrings: a module consisting of a docstring and nothing else. (See file
:file:`docstring.py`.) ::

   """Some documentation.
   """

Using the interpreter to take a look at the parse tree, we find a bewildering
mass of numbers and parentheses, with the documentation buried deep in nested
tuples. ::

   >>> import parser
   >>> import pprint
   >>> ast = parser.suite(open('docstring.py').read())
   >>> tup = ast.totuple()
   >>> pprint.pprint(tup)
   (257,
    (264,
     (265,
      (266,
       (267,
        (307,
         (287,
          (288,
           (289,
            (290,
             (292,
              (293,
               (294,
                (295,
                 (296,
                  (297,
                   (298,
                    (299,
                     (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
      (4, ''))),
    (4, ''),
    (0, ''))

The numbers at the first element of each node in the tree are the node types;
they map directly to terminal and non-terminal symbols in the grammar.
Unfortunately, they are represented as integers in the internal representation,
and the Python structures generated do not change that. However, the
:mod:`symbol` and :mod:`token` modules provide symbolic names for the node types
and dictionaries which map from the integers to the symbolic names for the node
types.

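
A small lookup helper makes such trees easier to read. The sketch below merges
the two dictionaries, ``symbol.sym_name`` and ``token.tok_name``, into one
table. The names ``NODE_NAMES`` and ``node_name`` are hypothetical, and the
import of :mod:`symbol` is guarded only so that the sketch also runs on
interpreters which do not ship that module::

   import token

   try:
       import symbol                      # not available everywhere
       NODE_NAMES = dict(symbol.sym_name)
   except ImportError:
       NODE_NAMES = {}
   NODE_NAMES.update(token.tok_name)

   def node_name(node_type):
       """Return the symbolic name of a node type, or the number as text."""
       return NODE_NAMES.get(node_type, str(node_type))

   # The three terminal types that appear in the output above.
   names = [node_name(n) for n in (3, 4, 0)]

For the terminal types ``3``, ``4`` and ``0`` this yields ``'STRING'``,
``'NEWLINE'`` and ``'ENDMARKER'``; the names of non-terminal types depend on
the grammar of the running interpreter.
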
In the output presented above, the outermost tuple contains four elements: the
integer ``257`` and three additional tuples. Node type ``257`` has the symbolic
name :const:`file_input`. Each of these inner tuples contains an integer as the
first element; these integers, ``264``, ``4``, and ``0``, represent the node
types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
Note that these values may change depending on the version of Python you are
using; consult :file:`symbol.py` and :file:`token.py` for details of the
mapping. It should be fairly clear that the outermost node is related primarily
to the input source rather than the contents of the file, and may be disregarded
for the moment. The :const:`stmt` node is much more interesting. In
particular, all docstrings are found in subtrees which are formed exactly as
this node is formed, with the only difference being the string itself. The
association between the docstring in a similar tree and the defined entity
(class, function, or module) which it describes is given by the position of the
docstring subtree within the tree defining the described structure.

By replacing the actual docstring with something to signify a variable component
of the tree, we allow a simple pattern matching approach to check any given
subtree for equivalence to the general pattern for docstrings. Since the
example demonstrates information extraction, we can safely require that the tree
be in tuple form rather than list form, allowing a simple variable
representation to be ``['variable_name']``. A simple recursive function can
implement the pattern matching, returning a Boolean and a dictionary of variable
name to value mappings. (See file :file:`example.py`.) ::

   from types import ListType, TupleType

   def match(pattern, data, vars=None):
       if vars is None:
           vars = {}
       if type(pattern) is ListType:
           # A list such as ['name'] is a variable: bind it to the data.
           vars[pattern[0]] = data
           return 1, vars
       if type(pattern) is not TupleType:
           # Leaf values (node types and token text) must compare equal.
           return (pattern == data), vars
       if len(data) != len(pattern):
           return 0, vars
       # Match the children pairwise, stopping at the first mismatch.
       for pattern, data in map(None, pattern, data):
           same, vars = match(pattern, data, vars)
           if not same:
               break
       return same, vars

Using this simple representation for syntactic variables and the symbolic node
types, the pattern for the candidate docstring subtrees becomes fairly readable.
(See file :file:`example.py`.) ::

   import symbol
   import token

   DOCSTRING_STMT_PATTERN = (
       symbol.stmt,
       (symbol.simple_stmt,
        (symbol.small_stmt,
         (symbol.expr_stmt,
          (symbol.testlist,
           (symbol.test,
            (symbol.and_test,
             (symbol.not_test,
              (symbol.comparison,
               (symbol.expr,
                (symbol.xor_expr,
                 (symbol.and_expr,
                  (symbol.shift_expr,
                   (symbol.arith_expr,
                    (symbol.term,
                     (symbol.factor,
                      (symbol.power,
                       (symbol.atom,
                        (token.STRING, ['docstring'])
                        )))))))))))))))),
        (token.NEWLINE, '')
        ))

Using the :func:`match` function with this pattern, extracting the module
docstring from the parse tree created previously is easy::

   >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
   >>> found
   1
   >>> vars
   {'docstring': '"""Some documentation.\n"""'}

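
The :func:`match` function from :file:`example.py` relies on
``types.ListType`` and ``map(None, ...)``, which are only available in Python 2.
The following is a sketch of the same algorithm written with :func:`isinstance`
and :func:`zip`; it is not the code from :file:`example.py`. It returns
``True`` or ``False`` instead of ``1`` or ``0``, and it additionally rejects
data that is not a tuple where the pattern expects one. The pattern and data
shown are shortened, hand-written stand-ins (``300`` is an arbitrary
production number)::

   def match(pattern, data, variables=None):
       """Match a tuple-form tree against a pattern with ['name'] variables."""
       if variables is None:
           variables = {}
       if isinstance(pattern, list):
           # Variable: bind the name to whatever is found at this position.
           variables[pattern[0]] = data
           return True, variables
       if not isinstance(pattern, tuple):
           # Node type or token text: must be equal.
           return pattern == data, variables
       if not isinstance(data, tuple) or len(data) != len(pattern):
           return False, variables
       for sub_pattern, sub_data in zip(pattern, data):
           same, variables = match(sub_pattern, sub_data, variables)
           if not same:
               return False, variables
       return True, variables

   STRING = 3  # value of token.STRING
   short_pattern = (300, (STRING, ['docstring']))
   found, bound = match(short_pattern, (300, (STRING, '"""Some text."""')))
   missed, _ = match(short_pattern, (300, (1, 'name')))

With these inputs ``found`` is true and ``bound`` maps ``'docstring'`` to the
quoted source text, while ``missed`` is false because the token type differs.
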
Once specific data can be extracted from a location where it is expected, the
question of where information can be expected needs to be answered. When
dealing with docstrings, the answer is fairly simple: the docstring is the first
:const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
types). A module consists of a single :const:`file_input` node, and class and
function definitions each contain exactly one :const:`suite` node. Classes and
functions are readily identified as subtrees of code block nodes which start
with ``(stmt, (compound_stmt, (classdef, ...`` or ``(stmt, (compound_stmt,
(funcdef, ...``. Note that these subtrees cannot be matched by :func:`match`
since it does not support multiple sibling nodes to match without regard to
number. A more elaborate matching function could be used to overcome this
limitation, but this is sufficient for the example.

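
Because only the first few levels of such a subtree are fixed, definitions can
also be recognized by indexing into the tree directly. The sketch below is one
hypothetical way to do so. The node-type numbers are placeholders chosen for
illustration; real code would use ``symbol.stmt``, ``symbol.compound_stmt``,
``symbol.funcdef`` and ``symbol.classdef``. It assumes that the second child of
a definition node is the :const:`NAME` terminal holding the defined name::

   # Placeholder node types for illustration only, not real grammar values.
   STMT, COMPOUND_STMT, FUNCDEF, CLASSDEF = 1001, 1002, 1003, 1004
   NAME = 1  # value of token.NAME

   def find_definitions(block):
       """List (kind, name) for definitions directly inside a code block."""
       found = []
       for node in block[1:]:
           if node[0] != STMT or node[1][0] != COMPOUND_STMT:
               continue
           definition = node[1][1]
           if definition[0] in (FUNCDEF, CLASSDEF):
               kind = 'function' if definition[0] == FUNCDEF else 'class'
               # definition[1] is the keyword, definition[2] the NAME terminal.
               found.append((kind, definition[2][1]))
       return found

   # Hand-written block; 2000 stands in for file_input or suite.
   block = (2000,
            (STMT, (COMPOUND_STMT,
                    (FUNCDEF, (NAME, 'def'), (NAME, 'square')))),
            (STMT, (COMPOUND_STMT,
                    (CLASSDEF, (NAME, 'class'), (NAME, 'Shape')))),
            (4, ''),
            (0, ''))
   definitions = find_definitions(block)

The sample definitions are truncated after the name, which is all the helper
inspects; terminal siblings such as ``(4, '')`` are skipped by the first test.
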
Given the ability to determine whether a statement might be a docstring and
extract the actual string from the statement, some work needs to be performed to
walk the parse tree for an entire module and extract information about the names
defined in each context of the module and associate any docstrings with the
names. The code to perform this work is not complicated, but bears some
explanation.

The public interface to the classes is straightforward and should probably be
somewhat more flexible. Each "major" block of the module is described by an
object providing several methods for inquiry and a constructor which accepts at
least the subtree of the complete parse tree which it represents. The
:class:`ModuleInfo` constructor accepts an optional *name* parameter since it
cannot otherwise determine the name of the module.

The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
:class:`ModuleInfo`. All objects provide the methods :meth:`get_name`,
:meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`. The
:class:`ClassInfo` objects support :meth:`get_method_names` and
:meth:`get_method_info` while the other classes provide
:meth:`get_function_names` and :meth:`get_function_info`.

Within each of the forms of code block that the public classes represent, most
of the required information is in the same form and is accessed in the same way,
with classes having the distinction that functions defined at the top level are
referred to as "methods." Since the difference in nomenclature reflects a real
semantic distinction from functions defined outside of a class, the
implementation needs to maintain the distinction. Hence, most of the
functionality of the public classes can be implemented in a common base class,
:class:`SuiteInfoBase`, with the accessors for function and method information
provided elsewhere. Note that there is only one class which represents function
and method information; this parallels the use of the :keyword:`def` statement
to define both types of elements.

Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
need to be overridden by subclasses. More importantly, the extraction of most
information from a parse tree is handled through a method called by the
:class:`SuiteInfoBase` constructor. The example code for most of the classes is
clear when read alongside the formal grammar, but the method which recursively
creates new information objects requires further examination. Here is the
relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::

   class SuiteInfoBase:
       _docstring = ''
       _name = ''

       def __init__(self, tree=None):
           self._class_info = {}
           self._function_info = {}
           if tree:
               self._extract_info(tree)

       def _extract_info(self, tree):
           # extract docstring
           if len(tree) == 2:
               found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
           else:
               found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
           if found:
               self._docstring = eval(vars['docstring'])
           # discover inner definitions
           for node in tree[1:]:
               found, vars = match(COMPOUND_STMT_PATTERN, node)
               if found:
                   cstmt = vars['compound']
                   if cstmt[0] == symbol.funcdef:
                       name = cstmt[2][1]
                       self._function_info[name] = FunctionInfo(cstmt)
                   elif cstmt[0] == symbol.classdef:
                       name = cstmt[2][1]
                       self._class_info[name] = ClassInfo(cstmt)

After initializing some internal state, the constructor calls the
:meth:`_extract_info` method. This method performs the bulk of the information
extraction which takes place in the entire example. The extraction has two
distinct phases: the location of the docstring for the parse tree passed in, and
the discovery of additional definitions within the code block represented by the
parse tree.

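
In the first phase, the docstring arrives as source text, quotes included, and
the example turns it into a string value with :func:`eval`. Where evaluating
text taken from a file under analysis is a concern, one possible alternative
(not what :file:`example.py` does) is ``ast.literal_eval`` from the standard
:mod:`ast` module, which is unrelated to the AST objects described here and
only accepts literals. A minimal sketch, using the token text from the earlier
output::

   import ast

   # Source text of the STRING token, exactly as stored in the parse tree.
   token_text = '"""Some documentation.\n"""'

   # literal_eval accepts only literal syntax, so arbitrary code is rejected.
   docstring = ast.literal_eval(token_text)

The resulting ``docstring`` is the text between the triple quotes, including
the trailing newline.
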
The initial :keyword:`if` test determines whether the nested suite is of the
"short form" or the "long form." The short form is used when the code block is
on the same line as the definition of the code block, as in ::

   def square(x): "Square an argument."; return x ** 2

while the long form uses an indented block and allows nested definitions::

   def make_power(exp):
       "Make a function that raises an argument to the exponent `exp'."
       def raiser(x, y=exp):
           return x ** y
       return raiser

When the short form is used, the code block may contain a docstring as the
first, and possibly only, :const:`small_stmt` element. The extraction of such a
docstring is slightly different and requires only a portion of the complete
pattern used in the more common case. As implemented, the docstring will only
be found if there is only one :const:`small_stmt` node in the
:const:`simple_stmt` node. Since most functions and methods which use the short
form do not provide a docstring, this may be considered sufficient. The
extraction of the docstring proceeds using the :func:`match` function as
described above, and the value of the docstring is stored as an attribute of the
:class:`SuiteInfoBase` object.

After docstring extraction, a simple definition discovery algorithm operates on
the :const:`stmt` nodes of the :const:`suite` node. The special case of the
short form is not tested; since there are no :const:`stmt` nodes in the short
form, the algorithm will silently skip the single :const:`simple_stmt` node and
correctly not discover any nested definitions.

Each statement in the code block is categorized as a class definition, function
or method definition, or something else. For the definition statements, the
name of the element defined is extracted and a representation object appropriate
to the definition is created with the defining subtree passed as an argument to
the constructor. The representation objects are stored in instance variables
and may be retrieved by name using the appropriate accessor methods.

The public classes provide any accessors required which are more specific than
those provided by the :class:`SuiteInfoBase` class, but the real extraction
algorithm remains common to all forms of code blocks. A high-level function can
be used to extract the complete set of information from a source file. (See
file :file:`example.py`.) ::

   def get_docs(fileName):
       import os
       import parser

       source = open(fileName).read()
       basename = os.path.basename(os.path.splitext(fileName)[0])
       ast = parser.suite(source)
       return ModuleInfo(ast.totuple(), basename)

This provides an easy-to-use interface to the documentation of a module. If
information is required which is not extracted by the code of this example, the
code may be extended at clearly defined points to provide additional
capabilities.
