:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr.  This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution.  No fee may be charged for this document in any representation,
   either on paper or electronically.  This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler.  The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this.  This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application.  It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.
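
As a rough illustration of the route the note recommends, the sketch below uses
the :mod:`ast` module to parse an expression, rewrite one node, and compile the
result.  The ``DoubleConstants`` class and the sample expression are made up for
illustration, and ``ast.Constant`` assumes Python 3.8 or later.

```python
import ast


class DoubleConstants(ast.NodeTransformer):
    """Hypothetical transformer: double every integer literal."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            new_node = ast.Constant(value=node.value * 2)
            return ast.copy_location(new_node, node)
        return node


# Parse in 'eval' mode, as compile(source, filename, 'eval') would.
tree = ast.parse('a + 5', mode='eval')
tree = ast.fix_missing_locations(DoubleConstants().visit(tree))

# The edited tree is compiled directly; no source text is regenerated.
code = compile(tree, '<ast>', 'eval')
result = eval(code, {'a': 5})   # evaluates the rewritten expression a + 10
```

The edit happens on tree nodes rather than on source text, which is the same
idea this module offers at the lower, concrete-syntax level.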

There are a few things to note about this module which are important to making
use of the data structures created.  This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required.  For full information on the language syntax, refer
to :ref:`reference-index`.  The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution.  The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below.  The ST objects created by :func:`sequence2st` faithfully
simulate those structures.  Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs.  The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form.  Sequences representing non-terminal elements in the grammar
always have a length greater than one.  The first element is an integer which
identifies a production in the grammar.  These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`.  Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent.  An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment.  For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user.  In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified.  The
example of the :keyword:`if` keyword above is representative.  The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.
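
The sketch below shows one way to tell the two kinds of node apart in such a
nested sequence, using only the :mod:`token` module.  The sample tree is
hand-written for illustration: the non-terminal number ``300`` is arbitrary and
is not meant to name a real production, and the helper name ``iter_terminals``
is hypothetical.

```python
import token


def iter_terminals(node):
    """Yield (token_name, text, lineno_or_None) for every leaf under node."""
    if token.ISTERMINAL(node[0]):
        # Terminal: (type, text) or (type, text, lineno).
        lineno = node[2] if len(node) > 2 else None
        yield token.tok_name[node[0]], node[1], lineno
    else:
        # Non-terminal: every element after the first is a child node.
        for child in node[1:]:
            yield from iter_terminals(child)


# Hand-built tree; 300 stands in for some non-terminal production number.
sample = (300,
          (token.NAME, 'if', 12),
          (300, (token.NAME, 'x', 12)),
          (token.COLON, ':', 12))

leaves = list(iter_terminals(sample))
```

Note that the keyword ``if`` and the identifier ``x`` both come back as
``NAME`` tokens, which matches the description above.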

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees.  A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes.  The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``.  If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible.  If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller.  If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised.  An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still occur when the ST object is
   passed to :func:`compilest`.  This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   lists of the form ``(1, 'name')`` or as three-element lists of the form ``(1,
   'name', 56)``.  If the third element is present, it is assumed to be a valid
   line number.  The line number may be specified for any subset of the terminal
   symbols in the input tree.
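
Because both terminal forms are accepted, a tree that carries line numbers can
be reduced to the two-element form before, say, comparing it with another tree.
The following is a minimal sketch that works on plain nested sequences and does
not call this module; the helper name ``strip_positions`` and the sample tree
(including the non-terminal number ``300``) are hypothetical.

```python
import token


def strip_positions(node):
    """Return a tuple copy of node with terminals reduced to (type, text)."""
    if token.ISTERMINAL(node[0]):
        # Drop any trailing line number from the terminal.
        return tuple(node[:2])
    # Rebuild the non-terminal with each child normalized in turn.
    return (node[0],) + tuple(strip_positions(child) for child in node[1:])


# Hand-built tree mixing both terminal forms; 300 is an arbitrary non-terminal.
with_lines = (300, (token.NAME, 'name', 56), (token.NEWLINE, ''))
plain = strip_positions(with_lines)
```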


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`.  This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects.  Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree.  The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form.  This function does not fail so long as memory is available to build
   the list representation.  If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation.  When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token.  Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.


.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree.  Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token.  This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the parser, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct.  The :exc:`SyntaxError` raised for this
   condition is actually generated by the Python byte-compiler normally, which is
   why it can be raised at this point by the :mod:`parser` module.  Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.


.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite.  Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns true, otherwise
   it returns false.  This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions.  Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite."  It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment.  See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module.  This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string.  Calls to :func:`sequence2st` need to be able to
   handle either form of the exception argument, while calls to other functions
   in the module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process.  These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`.  In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


.. _st-examples:

Examples
--------

.. index:: builtin: compile

The :mod:`parser` module allows operations to be performed on the parse tree of
Python source code before the :term:`bytecode` is generated, and provides for
inspection of the parse tree for information gathering purposes. Two examples
are presented.  The simple example demonstrates emulation of the :func:`compile`
built-in function and the complex example shows the use of a parse tree for
information discovery.


Emulation of :func:`compile`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing.  For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()


Information Discovery
^^^^^^^^^^^^^^^^^^^^^

.. index::
   single: string; documentation
   single: docstrings

Some applications benefit from direct access to the parse tree.  The remainder
of this section demonstrates how the parse tree provides access to module
documentation defined in docstrings without requiring that the code being
examined be loaded into a running interpreter via :keyword:`import`.  This can
be very useful for performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be traversed to
distill interesting information.  Two functions and a set of classes are
developed which provide programmatic access to high level function and class
definitions provided by a module.  The classes extract information from the
parse tree and provide access to the information at a useful semantic level, one
function provides a simple low-level pattern matching capability, and the other
function defines a high-level interface to the classes by handling file
operations on behalf of the caller.  All source files mentioned here which are
not part of the Python installation are located in the :file:`Demo/parser/`
directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of flexibility,
but most modules need only a limited measure of this when defining classes,
functions, and methods.  In this example, the only definitions that will be
considered are those which are defined in the top level of their context, e.g.,
a function defined by a :keyword:`def` statement at column zero of a module, but
not a function defined within a branch of an :keyword:`if` ... :keyword:`else`
construct, though there are some good reasons for doing so in some situations.
Nesting of definitions will be handled by the code developed in the example.

To construct the upper-level extraction methods, we need to know what the parse
tree structure looks like and how much of it we actually need to be concerned
about.  Python uses a moderately deep parse tree so there are a large number of
intermediate nodes.  It is important to read and understand the formal grammar
used by Python.  This is specified in the file :file:`Grammar/Grammar` in the
distribution. Consider the simplest case of interest when searching for
docstrings: a module consisting of a docstring and nothing else.  (See file
:file:`docstring.py`.) ::

   """Some documentation.
   """
 | 415 |  | 
 | 416 | Using the interpreter to take a look at the parse tree, we find a bewildering | 
 | 417 | mass of numbers and parentheses, with the documentation buried deep in nested | 
 | 418 | tuples. :: | 
 | 419 |  | 
 | 420 |    >>> import parser | 
 | 421 |    >>> import pprint | 
| Georg Brandl | 0c77a82 | 2008-06-10 16:37:50 +0000 | [diff] [blame] | 422 |    >>> st = parser.suite(open('docstring.py').read()) | 
 | 423 |    >>> tup = st.totuple() | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 424 |    >>> pprint.pprint(tup) | 
 | 425 |    (257, | 
 | 426 |     (264, | 
 | 427 |      (265, | 
 | 428 |       (266, | 
 | 429 |        (267, | 
 | 430 |         (307, | 
 | 431 |          (287, | 
 | 432 |           (288, | 
 | 433 |            (289, | 
 | 434 |             (290, | 
 | 435 |              (292, | 
 | 436 |               (293, | 
 | 437 |                (294, | 
 | 438 |                 (295, | 
 | 439 |                  (296, | 
 | 440 |                   (297, | 
 | 441 |                    (298, | 
 | 442 |                     (299, | 
 | 443 |                      (300, (3, '"""Some documentation.\n"""'))))))))))))))))), | 
 | 444 |       (4, ''))), | 
 | 445 |     (4, ''), | 
 | 446 |     (0, '')) | 
 | 447 |  | 
The number in the first element of each node in the tree is the node type; node
types map directly to terminal and non-terminal symbols in the grammar.
Unfortunately, they are represented as integers in the internal representation,
and the Python structures generated do not change that.  However, the
:mod:`symbol` and :mod:`token` modules provide symbolic names for the node
types, along with dictionaries which map from the integers back to those names.
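
As a small illustration of that mapping, the sketch below looks up the terminal
node types that appear in the output above.  It uses only the :mod:`token`
module; the helper name ``node_name`` is made up for this illustration, and the
same lookup should work with ``symbol.sym_name`` for non-terminals on
interpreters that still ship the :mod:`symbol` module. ::

   import token

   def node_name(node_type, names=token.tok_name):
       """Return the symbolic name for node_type, or the number as text."""
       return names.get(node_type, str(node_type))

   print(node_name(0))    # ENDMARKER
   print(node_name(3))    # STRING
   print(node_name(4))    # NEWLINE
   print(node_name(257))  # not a token number, so the fallback '257' is printed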
 | 455 |  | 
In the output presented above, the outermost tuple contains four elements: the
integer ``257`` and three additional tuples.  Node type ``257`` has the symbolic
name :const:`file_input`.  Each of these inner tuples contains an integer as the
first element; these integers, ``264``, ``4``, and ``0``, represent the node
types :const:`stmt`, :const:`NEWLINE`, and :const:`ENDMARKER`, respectively.
Note that these values may change depending on the version of Python you are
using; consult :file:`symbol.py` and :file:`token.py` for details of the
mapping.  It should be fairly clear that the outermost node is related primarily
to the input source rather than the contents of the file, and may be disregarded
for the moment.  The :const:`stmt` node is much more interesting.  In
particular, all docstrings are found in subtrees which are formed exactly as
this node is formed, with the only difference being the string itself.  The
association between the docstring in a similar tree and the defined entity
(class, function, or module) which it describes is given by the position of the
docstring subtree within the tree defining the described structure.
 | 471 |  | 
By replacing the actual docstring with something to signify a variable component
of the tree, we allow a simple pattern matching approach to check any given
subtree for equivalence to the general pattern for docstrings.  Since the
example demonstrates information extraction, we can safely require that the tree
be in tuple form rather than list form, which frees lists to represent
variables: a variable is written as a one-element list, ``['variable_name']``.
A simple recursive function can implement the pattern matching, returning a
Boolean and a dictionary of variable name to value mappings.  (See file
:file:`example.py`.) ::

   def match(pattern, data, vars=None):
       if vars is None:
           vars = {}
       # A list marks a variable: bind its name to whatever is found here.
       if isinstance(pattern, list):
           vars[pattern[0]] = data
           return True, vars
       # Anything that is not a tuple is a leaf and is compared directly.
       if not isinstance(pattern, tuple):
           return (pattern == data), vars
       if len(data) != len(pattern):
           return False, vars
       for pattern, data in zip(pattern, data):
           same, vars = match(pattern, data, vars)
           if not same:
               break
       return same, vars
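
To see the function at work without a real parse tree, the following sketch
repeats :func:`match` and applies it to a hand-written miniature tree.  The node
numbers ``300`` and ``3`` are taken from the output shown earlier and merely
stand in for whatever values your interpreter uses; the second tree is invented
to show a failed match. ::

   def match(pattern, data, vars=None):
       if vars is None:
           vars = {}
       if isinstance(pattern, list):          # a variable: bind it
           vars[pattern[0]] = data
           return True, vars
       if not isinstance(pattern, tuple):     # a leaf: compare directly
           return (pattern == data), vars
       if len(data) != len(pattern):
           return False, vars
       for pattern, data in zip(pattern, data):
           same, vars = match(pattern, data, vars)
           if not same:
               break
       return same, vars

   pattern = (300, (3, ['docstring']))
   tree = (300, (3, '"""Some documentation.\n"""'))

   found, vars = match(pattern, tree)
   print(found)                 # True
   print(vars['docstring'])     # the raw string literal, quotes included

   found, vars = match(pattern, (300, (1, 'spam')))
   print(found)                 # False: token type 1 does not equal 3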
 | 496 |  | 
Using this simple representation for syntactic variables and the symbolic node
types, the pattern for the candidate docstring subtrees becomes fairly readable.
(See file :file:`example.py`.) ::

   import symbol
   import token

   DOCSTRING_STMT_PATTERN = (
       symbol.stmt,
       (symbol.simple_stmt,
        (symbol.small_stmt,
         (symbol.expr_stmt,
          (symbol.testlist,
           (symbol.test,
            (symbol.and_test,
             (symbol.not_test,
              (symbol.comparison,
               (symbol.expr,
                (symbol.xor_expr,
                 (symbol.and_expr,
                  (symbol.shift_expr,
                   (symbol.arith_expr,
                    (symbol.term,
                     (symbol.factor,
                      (symbol.power,
                       (symbol.atom,
                        (token.STRING, ['docstring'])
                        )))))))))))))))),
        (token.NEWLINE, '')
        ))

Using the :func:`match` function with this pattern, extracting the module
docstring from the parse tree created previously is easy::

   >>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
   >>> found
   True
   >>> vars
   {'docstring': '"""Some documentation.\n"""'}
 | 536 |  | 
Once specific data can be extracted from a location where it is expected, the
question of where information can be expected needs to be answered.  When
dealing with docstrings, the answer is fairly simple: the docstring is the first
:const:`stmt` node in a code block (:const:`file_input` or :const:`suite` node
types).  A module consists of a single :const:`file_input` node, and class and
function definitions each contain exactly one :const:`suite` node.  Classes and
functions are readily identified as subtrees of code block nodes which start
with ``(stmt, (compound_stmt, (classdef, ...`` or
``(stmt, (compound_stmt, (funcdef, ...``.  Note that these subtrees cannot be
matched by :func:`match`, since it requires the pattern and the data to have the
same number of sibling nodes at every level and so cannot match a variable
number of them.  A more elaborate matching function could be used to overcome
this limitation, but this is sufficient for the example.
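
One possible shape for such a more elaborate function is sketched below.  It is
not part of :file:`example.py`: the name ``match_prefix`` and the ``REST``
sentinel are inventions for this illustration, and the node numbers are
arbitrary stand-ins for the :mod:`symbol` constants.  A pattern tuple whose last
element is ``REST`` matches any tuple that begins with the preceding elements. ::

   REST = object()   # hypothetical sentinel: "any number of further siblings"

   # Arbitrary stand-ins for symbol.stmt, symbol.compound_stmt, and so on.
   STMT, COMPOUND_STMT, FUNCDEF, CLASSDEF = 1001, 1002, 1003, 1004

   def match_prefix(pattern, data, vars=None):
       if vars is None:
           vars = {}
       if isinstance(pattern, list):
           vars[pattern[0]] = data
           return True, vars
       if not isinstance(pattern, tuple):
           return (pattern == data), vars
       if not isinstance(data, tuple):
           return False, vars
       if pattern and pattern[-1] is REST:
           # Open-ended pattern: only the leading siblings must match.
           pattern = pattern[:-1]
           if len(data) < len(pattern):
               return False, vars
       elif len(data) != len(pattern):
           return False, vars
       for sub_pattern, sub_data in zip(pattern, data):
           same, vars = match_prefix(sub_pattern, sub_data, vars)
           if not same:
               return False, vars
       return True, vars

   FUNCDEF_PATTERN = (STMT, (COMPOUND_STMT, (FUNCDEF, REST)))

   func_tree = (STMT, (COMPOUND_STMT, (FUNCDEF, (1, 'def'), (1, 'square'))))
   class_tree = (STMT, (COMPOUND_STMT, (CLASSDEF, (1, 'class'), (1, 'Stack'))))

   print(match_prefix(FUNCDEF_PATTERN, func_tree)[0])    # True
   print(match_prefix(FUNCDEF_PATTERN, class_tree)[0])   # False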
 | 549 |  | 
Given the ability to determine whether a statement might be a docstring and
extract the actual string from the statement, some work needs to be performed to
walk the parse tree for an entire module and extract information about the names
defined in each context of the module and associate any docstrings with the
names.  The code to perform this work is not complicated, but bears some
explanation.

The public interface to the classes is straightforward and should probably be
somewhat more flexible.  Each "major" block of the module is described by an
object providing several methods for inquiry and a constructor which accepts at
least the subtree of the complete parse tree which it represents.  The
:class:`ModuleInfo` constructor accepts an optional *name* parameter since it
cannot otherwise determine the name of the module.

The public classes include :class:`ClassInfo`, :class:`FunctionInfo`, and
:class:`ModuleInfo`.  All objects provide the methods :meth:`get_name`,
:meth:`get_docstring`, :meth:`get_class_names`, and :meth:`get_class_info`.  The
:class:`ClassInfo` objects support :meth:`get_method_names` and
:meth:`get_method_info` while the other classes provide
:meth:`get_function_names` and :meth:`get_function_info`.

Within each of the forms of code block that the public classes represent, most
of the required information is in the same form and is accessed in the same way,
with classes having the distinction that functions defined at the top level are
referred to as "methods." Since the difference in nomenclature reflects a real
semantic distinction from functions defined outside of a class, the
implementation needs to maintain the distinction. Hence, most of the
functionality of the public classes can be implemented in a common base class,
:class:`SuiteInfoBase`, with the accessors for function and method information
provided elsewhere. Note that there is only one class which represents function
and method information; this parallels the use of the :keyword:`def` statement
to define both types of elements.
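
The division of accessors described above might be laid out roughly as follows.
This is a sketch under assumptions rather than the code in :file:`example.py`:
every class name ending in ``Sketch`` is hypothetical, the constructors take
plain strings instead of parse trees, and only the accessor side is shown. ::

   class InfoBaseSketch:
       """Hypothetical stand-in for the shared base class."""

       def __init__(self, name='', docstring=''):
           self._name = name
           self._docstring = docstring
           self._class_info = {}
           self._function_info = {}

       def get_name(self):
           return self._name

       def get_docstring(self):
           return self._docstring

       def get_class_names(self):
           return list(self._class_info)

       def get_class_info(self, name):
           return self._class_info[name]


   class FunctionAccessSketch:
       """Hypothetical mixin used by the module and function classes."""

       def get_function_names(self):
           return list(self._function_info)

       def get_function_info(self, name):
           return self._function_info[name]


   class FunctionInfoSketch(InfoBaseSketch, FunctionAccessSketch):
       pass


   class ClassInfoSketch(InfoBaseSketch):
       # Same storage as above, exposed under the "method" vocabulary.
       def get_method_names(self):
           return list(self._function_info)

       def get_method_info(self, name):
           return self._function_info[name]


   stack = ClassInfoSketch('Stack', 'A simple stack.')
   stack._function_info['push'] = FunctionInfoSketch('push', 'Add an item.')
   print(stack.get_name())                                # Stack
   print(stack.get_method_names())                        # ['push']
   print(stack.get_method_info('push').get_docstring())   # Add an item.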
 | 582 |  | 
Most of the accessor functions are declared in :class:`SuiteInfoBase` and do not
need to be overridden by subclasses.  More importantly, the extraction of most
information from a parse tree is handled through a method called by the
:class:`SuiteInfoBase` constructor.  The example code for most of the classes is
clear when read alongside the formal grammar, but the method which recursively
creates new information objects requires further examination.  Here is the
relevant part of the :class:`SuiteInfoBase` definition from :file:`example.py`::

   class SuiteInfoBase:
       _docstring = ''
       _name = ''

       def __init__(self, tree=None):
           self._class_info = {}
           self._function_info = {}
           if tree:
               self._extract_info(tree)

       def _extract_info(self, tree):
           # extract docstring
           if len(tree) == 2:
               found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
           else:
               found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
           if found:
               self._docstring = eval(vars['docstring'])
           # discover inner definitions
           for node in tree[1:]:
               found, vars = match(COMPOUND_STMT_PATTERN, node)
               if found:
                   cstmt = vars['compound']
                   if cstmt[0] == symbol.funcdef:
                       name = cstmt[2][1]
                       self._function_info[name] = FunctionInfo(cstmt)
                   elif cstmt[0] == symbol.classdef:
                       name = cstmt[2][1]
                       self._class_info[name] = ClassInfo(cstmt)
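
The listing above relies on ``COMPOUND_STMT_PATTERN``, which is not shown in
this section.  The sketch below gives the shape it is assumed to have, namely a
pattern that captures the whole compound statement subtree under the name
``'compound'``, and shows how the indexing expressions then pick the node type
and the defined name out of that subtree.  The three node numbers are arbitrary
stand-ins for the :mod:`symbol` constants, and the subtree is abbreviated by
hand. ::

   import token

   # Arbitrary stand-ins; real code would use symbol.stmt,
   # symbol.compound_stmt and symbol.funcdef.
   STMT, COMPOUND_STMT, FUNCDEF = 1001, 1002, 1003

   # Assumed shape of the pattern: a variable in place of the subtree.
   COMPOUND_STMT_PATTERN = (STMT, (COMPOUND_STMT, ['compound']))

   # Abbreviated subtree for "def square(x): ..."; the last element is a
   # placeholder for the parameter list and the body.
   cstmt = (FUNCDEF,
            (token.NAME, 'def'),
            (token.NAME, 'square'),
            ('parameters and body omitted',))

   kind = cstmt[0]       # node type of the captured subtree
   name = cstmt[2][1]    # text of the second child, the defined name
   print(kind == FUNCDEF, name)    # True square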
 | 620 |  | 
After initializing some internal state, the constructor calls the
:meth:`_extract_info` method.  This method performs the bulk of the information
extraction which takes place in the entire example.  The extraction has two
distinct phases: the location of the docstring for the parse tree passed in, and
the discovery of additional definitions within the code block represented by the
parse tree.

The initial :keyword:`if` test determines whether the nested suite is of the
"short form" or the "long form."  The short form is used when the code block is
on the same line as the definition of the code block, as in ::

   def square(x): "Square an argument."; return x ** 2

while the long form uses an indented block and allows nested definitions::

   def make_power(exp):
       "Make a function that raises an argument to the exponent `exp`."
       def raiser(x, y=exp):
           return x ** y
       return raiser

When the short form is used, the code block may contain a docstring as the
first, and possibly only, :const:`small_stmt` element.  The extraction of such a
docstring is slightly different and requires only a portion of the complete
pattern used in the more common case.  As implemented, the docstring will only
be found if there is only one :const:`small_stmt` node in the
:const:`simple_stmt` node. Since most functions and methods which use the short
form do not provide a docstring, this may be considered sufficient.  The
extraction of the docstring proceeds using the :func:`match` function as
described above, and the value of the docstring is stored as an attribute of the
:class:`SuiteInfoBase` object.
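
The string captured by :func:`match` is the source text of the literal, quotes
included, which is why the listing passes it through :func:`eval`.  When the
input may be untrusted, one might prefer :func:`ast.literal_eval`, which only
accepts literals.  This is a suggested alternative, not what
:file:`example.py` does. ::

   import ast

   # Token text as captured under the 'docstring' variable.
   raw = '"""Some documentation.\n"""'

   docstring = ast.literal_eval(raw)
   print(repr(docstring))    # 'Some documentation.\n'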
 | 652 |  | 
After docstring extraction, a simple definition discovery algorithm operates on
the :const:`stmt` nodes of the :const:`suite` node.  The special case of the
short form is not tested; since there are no :const:`stmt` nodes in the short
form, the algorithm will silently skip the single :const:`simple_stmt` node and
correctly not discover any nested definitions.

Each statement in the code block is categorized as a class definition, function
or method definition, or something else.  For the definition statements, the
name of the element defined is extracted and a representation object appropriate
to the definition is created with the defining subtree passed as an argument to
the constructor.  The representation objects are stored in instance variables
and may be retrieved by name using the appropriate accessor methods.

The public classes provide any required accessors that are more specific than
those provided by the :class:`SuiteInfoBase` class, but the real extraction
algorithm remains common to all forms of code blocks.  A high-level function can
be used to extract the complete set of information from a source file.  (See
file :file:`example.py`.) ::

   def get_docs(fileName):
       import os
       import parser

       # Read the whole file, closing it promptly.
       with open(fileName) as source_file:
           source = source_file.read()
       basename = os.path.basename(os.path.splitext(fileName)[0])
       st = parser.suite(source)
       return ModuleInfo(st.totuple(), basename)

This provides an easy-to-use interface to the documentation of a module.  If
information is required which is not extracted by the code of this example, the
code may be extended at clearly defined points to provide additional
capabilities.
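
For comparison, a similar top-level survey can be sketched with the :mod:`ast`
module, which the note at the start of this chapter recommends as the more
convenient entry point.  This is a different technique from the one developed
above; the function name ``get_docs_ast`` and the ``'<module>'`` key are
invented for this illustration, and only top-level definitions are examined. ::

   import ast

   def get_docs_ast(source):
       """Map the module and each top-level definition to its docstring."""
       module = ast.parse(source)
       docs = {'<module>': ast.get_docstring(module)}
       for node in module.body:
           if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
               docs[node.name] = ast.get_docstring(node)
       return docs

   source = (
       '"""Module doc."""\n'
       'def square(x):\n'
       '    "Square an argument."\n'
       '    return x ** 2\n'
   )
   print(get_docs_ast(source))
   # {'<module>': 'Module doc.', 'square': 'Square an argument.'}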
 | 685 |  |