:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>


The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers", including
colorizers for on-screen displays.

The primary entry point is a :term:`generator`:

.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object that provides the same interface as the
   :meth:`readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.

   The generator produces 5-tuples with these members: the token type; the
   token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
   column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
   ints specifying the row and column where the token ends in the source; and
   the line on which the token was found.  The line passed is the *logical*
   line; continuation lines are included.

   .. versionadded:: 2.2

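As a brief sketch (not part of the original documentation), the 5-tuples can be collected from a string by wrapping it in ``StringIO`` to obtain a *readline* callable; the source string and variable names here are illustrative:

```python
# Sketch: feed generate_tokens() a readline callable built from a string.
try:
    from StringIO import StringIO        # Python 2
except ImportError:
    from io import StringIO              # StringIO moved to io in Python 3
from tokenize import generate_tokens, NAME, OP, NUMBER

source = "x = 3 + 4\n"
tokens = []
for toknum, tokval, (srow, scol), (erow, ecol), line in \
        generate_tokens(StringIO(source).readline):
    # Keep the token type, token string, and starting position.
    tokens.append((toknum, tokval, (srow, scol)))

# The first token is the name 'x', beginning at row 1, column 0.
```

Each yielded tuple also carries the full source line, which is convenient when re-emitting or highlighting the original text.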
An older entry point is retained for backward compatibility:

.. function:: tokenize(readline[, tokeneater])

   The :func:`tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`tokenize`.

   The first parameter, *readline*, must be a callable object that provides
   the same interface as the :meth:`readline` method of built-in file objects
   (see section :ref:`bltin-file-objects`).  Each call to the function should
   return one line of input as a string.  Alternatively, *readline* may be a
   callable object that signals completion by raising :exc:`StopIteration`.

   .. versionchanged:: 2.5
      Added :exc:`StopIteration` support.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the
   tuples generated by :func:`generate_tokens`.

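A minimal sketch of a *tokeneater* callback (an assumed example, not from the original docs): since the five arguments match the tuples yielded by :func:`generate_tokens`, the callback is driven here through that generator rather than through :func:`tokenize` itself:

```python
# Sketch: a tokeneater callback receiving the five per-token values.
# It is exercised here via generate_tokens(), whose tuples carry the same
# arguments that tokenize(readline, tokeneater) would pass.
try:
    from StringIO import StringIO        # Python 2
except ImportError:
    from io import StringIO              # StringIO moved to io in Python 3
from tokenize import generate_tokens

seen = []

def tokeneater(toknum, tokval, start, end, line):
    # Record just the token strings; a real callback might pretty-print
    # or colorize them instead.
    seen.append(tokval)

for tok in generate_tokens(StringIO("a = 1\n").readline):
    tokeneater(*tok)
```

The same callback could be handed directly to ``tokenize(readline, tokeneater)``.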
56
57All constants from the :mod:`token` module are also exported from
58:mod:`tokenize`, as are two additional token type values that might be passed to
59the *tokeneater* function by :func:`tokenize`:
60
61
.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.

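The distinction can be seen in a short sketch (an assumed example, not from the original docs): a trailing comment produces a COMMENT token, and a line continued inside parentheses produces an NL token rather than NEWLINE:

```python
# Sketch: COMMENT, NL and NEWLINE tokens in a two-statement snippet.
try:
    from StringIO import StringIO        # Python 2
except ImportError:
    from io import StringIO              # StringIO moved to io in Python 3
from tokenize import generate_tokens, COMMENT, NL, NEWLINE

# Two logical lines; the second spans two physical lines.
source = "x = 1  # a comment\ny = (2 +\n     3)\n"
kinds = [toknum for toknum, tokval, start, end, line
         in generate_tokens(StringIO(source).readline)]

# One COMMENT, one NL (the continuation inside the parentheses),
# and two NEWLINE tokens (one per logical line).
```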
Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.

.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input, so that the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string, as the spacing between tokens (column
   positions) may change.

   .. versionadded:: 2.5

Example of a script re-writer that transforms float literals into Decimal
objects::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, NAME, OP, STRING

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)