:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers," including
colorizers for on-screen displays.

The primary entry point is a generator:


.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object that provides the same interface as the
   :meth:`readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.

   The generator produces 5-tuples with these members: the token type; the
   token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
   column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
   ints specifying the row and column where the token ends in the source; and
   the line on which the token was found.  The line passed is the *logical*
   line; continuation lines are included.

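As a concrete illustration (a minimal sketch, not part of the original text), the 5-tuples for a short logical line can be inspected directly, with *readline* supplied by an :class:`io.StringIO` object:

```python
from io import StringIO
from tokenize import generate_tokens

# Tokenize one logical line; StringIO.readline plays the role of the
# *readline* argument described above.
source = "x = 1 + 2\n"
tokens = list(generate_tokens(StringIO(source).readline))

# Each element is (type, string, (srow, scol), (erow, ecol), line).
for toknum, tokval, start, end, line in tokens:
    print(toknum, repr(tokval), start, end, repr(line))
```

The first tuple here is the NAME token ``x`` starting at row 1, column 0; the stream ends with NEWLINE and ENDMARKER tokens.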
An older entry point is retained for backward compatibility:

.. function:: tokenize(readline[, tokeneater])

   The :func:`tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`tokenize`.

   The first parameter, *readline*, must be a callable object that provides
   the same interface as the :meth:`readline` method of built-in file objects
   (see section :ref:`bltin-file-objects`).  Each call to the function should
   return one line of input as a string.  Alternatively, *readline* may be a
   callable object that signals completion by raising :exc:`StopIteration`.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the
   tuples generated by :func:`generate_tokens`.

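The callback style can be sketched as follows (a hypothetical *tokeneater*, driven here by :func:`generate_tokens`, which yields exactly the five values that :func:`tokenize` would pass as separate arguments):

```python
from io import StringIO
from tokenize import generate_tokens, tok_name

seen = []

def tokeneater(toknum, tokval, start, end, line):
    # Record the symbolic name of each token type along with its string.
    seen.append((tok_name[toknum], tokval))

# Each 5-tuple from the generator becomes one call to the callback,
# mirroring how tokenize() invokes *tokeneater* once per token.
for tok in generate_tokens(StringIO("a + b\n").readline):
    tokeneater(*tok)

print(seen)
```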
All constants from the :mod:`token` module are also exported from
:mod:`tokenize`, as are two additional token type values that might be passed
to the *tokeneater* function by :func:`tokenize`:

.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.

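The distinction can be observed directly (a small sketch: the physical line break inside the parentheses yields NL, while the end of the logical line yields NEWLINE):

```python
from io import StringIO
from tokenize import generate_tokens, tok_name

# One logical line continued across two physical lines inside parentheses.
source = "total = (1 +\n         2)\n"
kinds = [tok_name[tok[0]]
         for tok in generate_tokens(StringIO(source).readline)]
print(kinds)
```

The NL token appears at the internal line break, followed later by a single NEWLINE closing the logical line.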
Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.


.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input, so the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string, as the spacing between tokens (column
   positions) may change.

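A round trip under that guarantee can be sketched like this (only the token type and string are fed back in, so the rebuilt source may be spaced differently, yet it tokenizes identically):

```python
from io import StringIO
from tokenize import generate_tokens, untokenize

source = "x=1  +  2\n"

# Keep only (type, string) pairs; column information is discarded.
pairs = [(toknum, tokval)
         for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline)]
rebuilt = untokenize(pairs)

# The spacing may differ from the original, but re-tokenizing the result
# yields exactly the same (type, string) sequence.
repairs = [(toknum, tokval)
           for toknum, tokval, _, _, _ in generate_tokens(StringIO(rebuilt).readline)]
print(rebuilt)
```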
Example of a script re-writer that transforms float literals into Decimal
objects::

   from io import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, NAME, OP, STRING

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print(+21.3e-5*-.1234/81.7)'
       >>> decistmt(s)
       "print(+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)