:mod:`parser` --- Access Python parse trees
===========================================

.. module:: parser
   :synopsis: Access parse trees for Python source code.

.. moduleauthor:: Fred L. Drake, Jr. <fdrake@acm.org>
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

.. Copyright 1995 Virginia Polytechnic Institute and State University and Fred
   L. Drake, Jr. This copyright notice must be distributed on all copies, but
   this document otherwise may be distributed as part of the Python
   distribution. No fee may be charged for this document in any representation,
   either on paper or electronically. This restriction does not affect other
   elements in a distributed package in any way.

.. index:: single: parsing; Python source code

--------------

The :mod:`parser` module provides an interface to Python's internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from it. This is better than trying to parse and modify an arbitrary Python
code fragment as a string, because the fragment is parsed in exactly the same
way as the code forming the application. It is also faster.

.. warning::

   The parser module is deprecated and will be removed in future versions of
   Python. For the majority of use cases you can leverage the Abstract Syntax
   Tree (AST) generation and compilation stage, using the :mod:`ast` module.

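As a rough sketch of the same edit-then-compile workflow with the :mod:`ast`
module recommended above (this does not use the :mod:`parser` API), the
following parses an expression, rewrites every addition into a multiplication,
and compiles the edited tree. The names ``AddToMul`` and ``edited_code`` are
hypothetical and chosen only for illustration::

   import ast

   class AddToMul(ast.NodeTransformer):
       """Hypothetical edit: rewrite every ``x + y`` as ``x * y``."""

       def visit_BinOp(self, node):
           self.generic_visit(node)        # edit nested operations first
           if isinstance(node.op, ast.Add):
               node.op = ast.Mult()
           return node

   def edited_code(source):
       """Parse *source* in 'eval' mode, edit the tree, then compile it."""
       tree = ast.parse(source, mode='eval')
       tree = ast.fix_missing_locations(AddToMul().visit(tree))
       return compile(tree, 'file.py', 'eval')

   print(eval(edited_code('a + 5'), {'a': 5}))   # prints 25
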

There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the :mod:`parser` module are
presented.

Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to :ref:`reference-index`. The parser itself is created from a grammar
specification defined in the file :file:`Grammar/Grammar` in the standard Python
distribution. The parse trees stored in the ST objects created by this module
are the actual output from the internal parser when created by the :func:`expr`
or :func:`suite` functions, described below. The ST objects created by
:func:`sequence2st` faithfully simulate those structures. Be aware that the
values of the sequences which are considered "correct" will vary from one
version of Python to another as the formal grammar for the language is revised.
However, transporting code from one Python version to another as source text
will always allow correct parse trees to be created in the target version, with
the only restriction being that migrating to an older version of the interpreter
will not support more recent language constructs. The parse trees are not
typically compatible from one version to another, though source code has usually
been forward-compatible within a major release series.

Each element of the sequences returned by :func:`st2list` or :func:`st2tuple`
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file :file:`Include/graminit.h` and the Python module
:mod:`symbol`. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword :keyword:`if` in an :const:`if_stmt`, are included in the
node tree without any special treatment. For example, the :keyword:`!if` keyword
is represented by the tuple ``(1, 'if')``, where ``1`` is the numeric value
associated with all :const:`NAME` tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as ``(1, 'if', 12)``, where
the ``12`` represents the line number at which the terminal symbol was found.

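The nesting rule described above is enough to walk a tree without knowing any
production numbers: a node whose first item is a terminal token type is a leaf,
and any other node is a production whose children follow the number. A minimal
sketch, assuming a tree in the tuple or list form returned by :func:`st2tuple`
or :func:`st2list`; ``terminals`` is a hypothetical helper, and the production
numbers ``300`` and ``301`` in the sample are invented, not real grammar
symbols::

   import token

   def terminals(node):
       """Yield (type, text) for every terminal token, left to right."""
       kind = node[0]
       if token.ISTERMINAL(kind):
           yield kind, node[1]             # leaf: (type, text[, lineno])
       else:
           for child in node[1:]:          # production: children follow
               yield from terminals(child)

   # Hypothetical tree: 300 and 301 are invented production numbers.
   sample = (300, (301, (token.NAME, 'if', 12), (token.NAME, 'x', 12)))
   print(list(terminals(sample)))   # prints [(1, 'if'), (1, 'x')]
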

Terminal elements are represented in much the same way, but without any child
elements and with the addition of the source text which was identified. The
example of the :keyword:`if` keyword above is representative. The various types
of terminal symbols are defined in the C header file :file:`Include/token.h` and
the Python module :mod:`token`.

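The :mod:`token` module also maps token numbers back to their names through
``token.tok_name``, which makes terminal nodes easier to read while debugging.
A small sketch; ``describe_terminal`` is a hypothetical helper that accepts
either the two- or three-element terminal form::

   import token

   def describe_terminal(node):
       """Render a terminal such as (1, 'if', 12) as readable text."""
       text = f"{token.tok_name[node[0]]} {node[1]!r}"   # e.g. NAME 'if'
       if len(node) > 2:                                 # optional line number
           text += f" (line {node[2]})"
       return text

   print(describe_terminal((token.NAME, 'if', 12)))   # prints NAME 'if' (line 12)
   print(describe_terminal((token.NAME, 'if')))       # prints NAME 'if'
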

The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple "wrapper" class may be created in Python to
hide the use of ST objects.

The :mod:`parser` module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.


.. seealso::

   Module :mod:`symbol`
      Useful constants representing internal nodes of the parse tree.

   Module :mod:`token`
      Useful constants representing leaf nodes of the parse tree and functions for
      testing node values.


.. _creating-sts:

Creating ST Objects
-------------------

ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the ``'eval'``
and ``'exec'`` forms.


.. function:: expr(source)

   The :func:`expr` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'eval')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: suite(source)

   The :func:`suite` function parses the parameter *source* as if it were an input
   to ``compile(source, 'file.py', 'exec')``. If the parse succeeds, an ST object
   is created to hold the internal parse tree representation, otherwise an
   appropriate exception is raised.


.. function:: sequence2st(sequence)

   This function accepts a parse tree represented as a sequence and builds an
   internal representation if possible. If it can validate that the tree conforms
   to the Python grammar and all nodes are valid node types in the host version of
   Python, an ST object is created from the internal representation and returned
   to the caller. If there is a problem creating the internal representation, or
   if the tree cannot be validated, a :exc:`ParserError` exception is raised. An
   ST object created this way should not be assumed to compile correctly; normal
   exceptions raised by compilation may still be raised when the ST object is
   passed to :func:`compilest`. This may indicate problems not related to syntax
   (such as a :exc:`MemoryError` exception), but may also be due to constructs such
   as the result of parsing ``del f(0)``, which escapes the Python parser but is
   checked by the bytecode compiler.

   Sequences representing terminal tokens may be given either as two-element
   sequences of the form ``(1, 'name')`` or as three-element sequences of the
   form ``(1, 'name', 56)``. If the third element is present, it is assumed to be
   a valid line number. The line number may be specified for any subset of the
   terminal symbols in the input tree.

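Because line numbers are optional on any subset of terminals, a tree obtained
with line numbers can be normalized to the plain two-element form before it is
edited or passed back to :func:`sequence2st`. A minimal sketch;
``strip_line_numbers`` is a hypothetical helper, the sample uses an invented
production number, and the result is always built from tuples::

   import token

   def strip_line_numbers(node):
       """Return a copy of *node* with every terminal cut to (type, text)."""
       if token.ISTERMINAL(node[0]):
           return tuple(node[:2])               # drop the optional line number
       return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])

   sample = (300, (token.NAME, 'a', 1), (token.NUMBER, '5', 1))
   print(strip_line_numbers(sample))   # prints (300, (1, 'a'), (2, '5'))
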

.. function:: tuple2st(sequence)

   This is the same function as :func:`sequence2st`. This entry point is
   maintained for backward compatibility.


.. _converting-sts:

Converting ST Objects
---------------------

ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list or tuple trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.


.. function:: st2list(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python list representing the equivalent parse tree. The resulting list
   representation can be used for inspection or the creation of a new parse tree in
   list form. This function does not fail so long as memory is available to build
   the list representation. If the parse tree will only be used for inspection,
   :func:`st2tuple` should be used instead to reduce memory consumption and
   fragmentation. When the list representation is required, this function is
   significantly faster than retrieving a tuple representation and converting that
   to nested lists.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the list representing the token. Note
   that the line number provided specifies the line on which the token *ends*.
   This information is omitted if the flag is false or omitted.

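Since each terminal in a *line_info* tree carries the line on which it ends, the
tree can be indexed by line. A sketch that groups identifier tokens per line,
assuming a list tree as returned by :func:`st2list` with *line_info* set;
``names_by_line`` is a hypothetical helper and the sample's production number
``300`` is invented::

   import token
   from collections import defaultdict

   def names_by_line(tree):
       """Map each line number to the NAME token texts ending on it."""
       result = defaultdict(list)
       stack = [tree]                  # explicit stack avoids the recursion limit
       while stack:
           node = stack.pop()
           if token.ISTERMINAL(node[0]):
               if node[0] == token.NAME and len(node) > 2:
                   result[node[2]].append(node[1])
           else:
               stack.extend(reversed(node[1:]))   # keep left-to-right order
       return dict(result)

   sample = [300, [token.NAME, 'x', 1], [token.EQUAL, '=', 1], [token.NAME, 'y', 2]]
   print(names_by_line(sample))   # prints {1: ['x'], 2: ['y']}
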

.. function:: st2tuple(st, line_info=False, col_info=False)

   This function accepts an ST object from the caller in *st* and returns a
   Python tuple representing the equivalent parse tree. Other than returning a
   tuple instead of a list, this function is identical to :func:`st2list`.

   If *line_info* is true, line number information will be included for all
   terminal tokens as a third element of the tuple representing the token. This
   information is omitted if the flag is false or omitted.


.. function:: compilest(st, filename='<syntax-tree>')

   .. index::
      builtin: exec
      builtin: eval

   The Python byte compiler can be invoked on an ST object to produce code objects
   which can be used as part of a call to the built-in :func:`exec` or :func:`eval`
   functions. This function provides the interface to the compiler, passing the
   internal parse tree from *st* to the compiler, using the source file name
   specified by the *filename* parameter. The default value supplied for *filename*
   indicates that the source was an ST object.

   Compiling an ST object may result in exceptions related to compilation; an
   example would be a :exc:`SyntaxError` caused by the parse tree for ``del f(0)``:
   this statement is considered legal within the formal grammar for Python but is
   not a legal language construct. The :exc:`SyntaxError` raised for this
   condition is normally generated by the Python byte-compiler, which is why it
   can be raised at this point by the :mod:`parser` module. Most causes of
   compilation failure can be diagnosed programmatically by inspection of the parse
   tree.

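The same compiler-side rejection can be observed with the built-in
:func:`compile`. A sketch using only the built-in; ``compile_error`` is a
hypothetical helper. On newer interpreters the error for ``del f(0)`` may be
reported while parsing rather than during code generation, and the message text
varies between versions, so only the exception type and line are relied on::

   def compile_error(source, filename='<syntax-tree>'):
       """Return the SyntaxError raised while compiling *source*, or None."""
       try:
           compile(source, filename, 'exec')
       except SyntaxError as exc:
           return exc
       return None

   err = compile_error('del f(0)')
   print(isinstance(err, SyntaxError), err.lineno)   # prints True 1
   print(compile_error('del x'))                     # prints None
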

.. _querying-sts:

Queries on ST Objects
---------------------

Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via :func:`expr` or
:func:`suite` or from a parse tree via :func:`sequence2st`.


.. function:: isexpr(st)

   .. index:: builtin: compile

   When *st* represents an ``'eval'`` form, this function returns ``True``, otherwise
   it returns ``False``. This is useful, since code objects normally cannot be queried
   for this information using existing built-in functions. Note that the code
   objects created by :func:`compilest` cannot be queried like this either, and
   are identical to those created by the built-in :func:`compile` function.


.. function:: issuite(st)

   This function mirrors :func:`isexpr` in that it reports whether an ST object
   represents an ``'exec'`` form, commonly known as a "suite." It is not safe to
   assume that this function is equivalent to ``not isexpr(st)``, as additional
   syntactic fragments may be supported in the future.

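The :mod:`ast` module recommended in the deprecation warning records the same
distinction in the type of the root node. A sketch of analogous checks (the
helper names are hypothetical); as with :func:`issuite`, "not an expression" is
not the same as "a suite", since ``mode='single'`` produces a third root type,
``ast.Interactive``::

   import ast

   def tree_is_expression(tree):
       """Analogue of isexpr(): true for trees parsed with mode='eval'."""
       return isinstance(tree, ast.Expression)

   def tree_is_suite(tree):
       """Analogue of issuite(): true for trees parsed with mode='exec'."""
       return isinstance(tree, ast.Module)

   single = ast.parse('x = 1', mode='single')   # interactive-statement form
   print(tree_is_expression(single), tree_is_suite(single))   # prints False False
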

.. _st-errors:

Exceptions and Error Handling
-----------------------------

The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.


.. exception:: ParserError

   Exception raised when a failure occurs within the parser module. This is
   generally produced for validation failures rather than the built-in
   :exc:`SyntaxError` raised during normal parsing. The exception argument is
   either a string describing the reason for the failure or a tuple containing a
   sequence causing the failure from a parse tree passed to :func:`sequence2st`
   and an explanatory string. Calls to :func:`sequence2st` need to be able to
   handle either form of the exception argument, while calls to other functions
   in the module will only need to be aware of the simple string values.

Note that the functions :func:`compilest`, :func:`expr`, and :func:`suite` may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions :exc:`MemoryError`,
:exc:`OverflowError`, :exc:`SyntaxError`, and :exc:`SystemError`. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.


.. _st-objects:

ST Objects
----------

Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the :mod:`pickle` module) is also supported.


.. data:: STType

   The type of the objects returned by :func:`expr`, :func:`suite` and
   :func:`sequence2st`.

ST objects have the following methods:


.. method:: ST.compile(filename='<syntax-tree>')

   Same as ``compilest(st, filename)``.


.. method:: ST.isexpr()

   Same as ``isexpr(st)``.


.. method:: ST.issuite()

   Same as ``issuite(st)``.


.. method:: ST.tolist(line_info=False, col_info=False)

   Same as ``st2list(st, line_info, col_info)``.


.. method:: ST.totuple(line_info=False, col_info=False)

   Same as ``st2tuple(st, line_info, col_info)``.


Example: Emulation of :func:`compile`
-------------------------------------

While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the :mod:`parser` module to produce an intermediate data structure is equivalent
to the code ::

   >>> code = compile('a + 5', 'file.py', 'eval')
   >>> a = 5
   >>> eval(code)
   10

The equivalent operation using the :mod:`parser` module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object::

   >>> import parser
   >>> st = parser.expr('a + 5')
   >>> code = st.compile('file.py')
   >>> a = 5
   >>> eval(code)
   10

An application which needs both ST and code objects can package this code into
readily available functions::

   import parser

   def load_suite(source_string):
       # ST object for the 'exec' form, plus its compiled code object
       st = parser.suite(source_string)
       return st, st.compile()

   def load_expression(source_string):
       # ST object for the 'eval' form, plus its compiled code object
       st = parser.expr(source_string)
       return st, st.compile()

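
With the :mod:`ast` module suggested in the deprecation warning, a similar pair
of helpers can be written without :mod:`parser`. This is a sketch rather than a
drop-in replacement: the returned trees are :mod:`ast` nodes, not ST objects,
and the ``_ast``-suffixed names are hypothetical::

   import ast

   def load_suite_ast(source_string):
       # 'exec' form: the tree root is an ast.Module
       tree = ast.parse(source_string, mode='exec')
       return tree, compile(tree, '<syntax-tree>', 'exec')

   def load_expression_ast(source_string):
       # 'eval' form: the tree root is an ast.Expression
       tree = ast.parse(source_string, mode='eval')
       return tree, compile(tree, '<syntax-tree>', 'eval')

   tree, code = load_expression_ast('a + 5')
   print(eval(code, {'a': 5}))   # prints 10
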