:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.
.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.

.. note::

   From Python 2.5 onward, it's much more convenient to cut in at the Abstract
   Syntax Tree (AST) generation and compilation stage, using the :mod:`ast`
   module.

   The :mod:`parser` module exports the names documented here also with "st"
   replaced by "ast"; this is a legacy from the time when there was no other
   AST and has nothing to do with the AST found in Python 2.5. This is also the
   reason for the functions' keyword arguments being called *ast*, not *st*.
   The "ast" functions have been removed in Python 3.

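The route recommended in the note above can be sketched as follows. This is a
minimal illustration that relies only on :func:`ast.parse` and the built-in
:func:`compile`; the file name ``'file.py'`` and the variable ``a`` are
placeholders chosen for the example::

   import ast

   # Parse the expression into an abstract syntax tree ('eval' mode).
   tree = ast.parse('a + 5', mode='eval')

   # The tree could be inspected or modified here before compilation.
   code = compile(tree, 'file.py', 'eval')

   # Evaluate with an explicit namespace so the example is self-contained.
   result = eval(code, {'a': 5})
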

There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser
itself is created from a grammar specification defined in the file
:file:`Grammar/Grammar` in the standard Python distribution. The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the :func:`expr` or :func:`suite` functions,
described below. The ST objects created by :func:`sequence2st` faithfully
simulate those structures. Be aware that the values of the sequences which are
considered "correct" will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified. The
example of the :keyword:`if` keyword above is representative. The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.

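As a rough illustration of the sequence form described above, the following
sketch walks a nested tuple and collects its terminal tokens. The helper name
``iter_terminals`` and the tree itself are made up for the example: the tree
is written by hand, its production numbers ``300`` and ``301`` are
placeholders rather than values from :mod:`symbol`, and it is not expected to
pass validation by :func:`sequence2st`::

   import token

   def iter_terminals(node):
       """Yield (token name, source text) for each terminal in *node*."""
       if token.ISTERMINAL(node[0]):
           # Terminal: (type, text) or (type, text, line number).
           yield token.tok_name[node[0]], node[1]
       else:
           # Non-terminal: the first element is the production number and
           # the remaining elements are child sequences of the same form.
           for child in node[1:]:
               for item in iter_terminals(child):
                   yield item

   # Hand-written tree with placeholder production numbers (see above).
   tree = (300,
           (301, (token.NAME, 'if', 12)),
           (301, (token.NAME, 'spam', 12)))
   terminals = list(iter_terminals(tree))

Only :func:`token.ISTERMINAL` is used to tell the two kinds of node apart, so
the same walk applies to tuples with or without line numbers.
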

.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation; otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be represented as either two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the form
   ``(1, 'name', 56)``. If the third element is present, it is assumed to be a
   valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.


.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`. This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list or tuple trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

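For comparison, the slower route mentioned under :func:`st2list` (retrieving a
tuple tree and converting it to nested lists) could be written in pure Python
roughly as follows. The helper name ``tuple_tree_to_lists`` is hypothetical,
and the tree is written by hand with ``300`` as a placeholder production
number rather than taken from real parser output::

   def tuple_tree_to_lists(node):
       """Recursively copy a tuple-form parse tree into nested lists."""
       if isinstance(node, tuple):
           return [tuple_tree_to_lists(child) for child in node]
       # Integers (node types, line numbers) and strings (token text)
       # are leaves and are returned unchanged.
       return node

   # Hand-written tuple tree with line numbers as third elements.
   tuple_tree = (300, (1, 'a', 1), (14, '+', 1), (2, '5', 1))
   list_tree = tuple_tree_to_lists(tuple_tree)

The result is a new structure; the original tuple tree is left untouched.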

.. function:: st2tuple(ast[, line_info])

   This function accepts an ST object from the caller in *ast* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compilest(ast, filename='<syntax-tree>')

   .. index:: builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of an :keyword:`exec` statement or a call to the
   built-in :func:`eval` function. This function provides the interface to the
   compiler, passing the internal parse tree from *ast* to the compiler, using the
   source file name specified by the *filename* parameter. The default value
   supplied for *filename* indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte-compiler, which is why it
   can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.

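The ``del f(0)`` case can be observed without building a parse tree. The
sketch below uses the built-in :func:`compile` rather than :func:`compilest`,
on the assumption that both report the same error; the helper name
``compile_error`` and the file name ``'file.py'`` are placeholders, and which
internal stage raises the exception is an implementation detail::

   def compile_error(source):
       """Return the SyntaxError raised when compiling *source*, or None."""
       try:
           compile(source, 'file.py', 'exec')
       except SyntaxError as exc:
           return exc
       return None

   # Deleting the result of a call is rejected as a SyntaxError.
   error = compile_error('del f(0)')

Deleting a plain name, as in ``compile_error('del x')``, compiles without
error and returns ``None``.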

.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(ast)

   .. index:: builtin: compile

   When *ast* represents an ``'eval'`` form, this function returns true; otherwise
   it returns false. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(ast)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(ast)``, as additional
   syntactic fragments may be supported in the future.


.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string. Calls to :func:`sequence2st` need to be able to
   handle either form of the exception argument, while calls to other functions
   in the module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile([filename])

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist([line_info])

   Same as ``st2list(st, line_info)``.


.. method:: ST.totuple([line_info])

   Same as ``st2tuple(st, line_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       st = parser.expr(source_string)
       return st, st.compile()