:mod:`tokenize` --- Tokenizer for Python source
===============================================

.. module:: tokenize
   :synopsis: Lexical scanner for Python source code.
.. moduleauthor:: Ka Ping Yee
.. sectionauthor:: Fred L. Drake, Jr. <fdrake@acm.org>

**Source code:** :source:`Lib/tokenize.py`

--------------

The :mod:`tokenize` module provides a lexical scanner for Python source code,
implemented in Python.  The scanner in this module returns comments as tokens
as well, making it useful for implementing "pretty-printers," including
colorizers for on-screen displays.

To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
tokens are returned using the generic :data:`token.OP` token type.  The exact
type can be determined by checking the token string (the second item of the
tuple returned by :func:`generate_tokens`) for the character sequence that
identifies a specific operator token.

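For instance, a minimal sketch (the one-line source string is arbitrary) that
reports which operator each :data:`token.OP` token represents::

   from StringIO import StringIO
   from tokenize import generate_tokens, OP

   # Every operator and delimiter is reported with the generic OP type;
   # the token string distinguishes '+' from '*', '(', and so on.
   source = '1 + 2 * 3\n'
   for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline):
       if toknum == OP:
           print 'OP token:', repr(tokval)
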
The primary entry point is a :term:`generator`:

.. function:: generate_tokens(readline)

   The :func:`generate_tokens` generator requires one argument, *readline*,
   which must be a callable object that provides the same interface as the
   :meth:`readline` method of built-in file objects (see section
   :ref:`bltin-file-objects`).  Each call to the function should return one
   line of input as a string.

   The generator produces 5-tuples with these members: the token type; the
   token string; a 2-tuple ``(srow, scol)`` of ints specifying the row and
   column where the token begins in the source; a 2-tuple ``(erow, ecol)`` of
   ints specifying the row and column where the token ends in the source; and
   the line on which the token was found.  The line passed (the last tuple
   item) is the *logical* line; continuation lines are included.

   .. versionadded:: 2.2

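As a usage sketch of :func:`generate_tokens` (the source string here is only
illustrative), the following prints each 5-tuple with its type spelled out::

   from StringIO import StringIO
   from tokenize import generate_tokens
   from token import tok_name

   source = 'x = 3.14\n'
   g = generate_tokens(StringIO(source).readline)
   for toknum, tokval, start, end, line in g:
       # tok_name maps the numeric token type to a readable name.
       print tok_name[toknum], repr(tokval), start, end
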
An older entry point is retained for backward compatibility:


.. function:: tokenize(readline[, tokeneater])

   The :func:`tokenize` function accepts two parameters: one representing the
   input stream, and one providing an output mechanism for :func:`tokenize`.

   The first parameter, *readline*, must be a callable object which provides
   the same interface as the :meth:`readline` method of built-in file objects
   (see section :ref:`bltin-file-objects`).  Each call to the function should
   return one line of input as a string.  Alternatively, *readline* may be a
   callable object that signals completion by raising :exc:`StopIteration`.

   .. versionchanged:: 2.5
      Added :exc:`StopIteration` support.

   The second parameter, *tokeneater*, must also be a callable object.  It is
   called once for each token, with five arguments, corresponding to the
   tuples generated by :func:`generate_tokens`.

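A sketch of this older interface (again over an arbitrary in-memory source),
with a *tokeneater* that echoes each token::

   from StringIO import StringIO
   import tokenize

   def print_token(toknum, tokval, start, end, line):
       # Receives the same five values that generate_tokens() would
       # yield as a single tuple.
       print tokenize.tok_name[toknum], repr(tokval)

   tokenize.tokenize(StringIO('if x:\n    pass\n').readline, print_token)
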
All constants from the :mod:`token` module are also exported from
:mod:`tokenize`, as are two additional token type values that might be passed
to the *tokeneater* function by :func:`tokenize`:


.. data:: COMMENT

   Token value used to indicate a comment.


.. data:: NL

   Token value used to indicate a non-terminating newline.  The NEWLINE token
   indicates the end of a logical line of Python code; NL tokens are generated
   when a logical line of code is continued over multiple physical lines.

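The difference can be seen by tokenizing a logical line that spans two
physical lines (the source string is illustrative)::

   from StringIO import StringIO
   from tokenize import generate_tokens, NEWLINE, NL

   source = 'x = (1 +\n     2)\n'
   for toknum, tokval, _, _, _ in generate_tokens(StringIO(source).readline):
       if toknum == NL:
           print 'NL      - physical line ends, logical line continues'
       elif toknum == NEWLINE:
           print 'NEWLINE - logical line ends'
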
Another function is provided to reverse the tokenization process.  This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.


.. function:: untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must return
   sequences with at least two elements, the token type and the token string.
   Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The result is
   guaranteed to tokenize back to match the input, so that the conversion is
   lossless and round-trips are assured.  The guarantee applies only to the
   token type and token string, as the spacing between tokens (column
   positions) may change.

   .. versionadded:: 2.5

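For instance, a minimal round-trip sketch over an arbitrary source string::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize

   source = 'if x == 1:\n    print x\n'
   tokens = [(toknum, tokval) for toknum, tokval, _, _, _
             in generate_tokens(StringIO(source).readline)]
   # The result tokenizes the same as the input, though the spacing
   # between tokens may differ from the original text.
   print untokenize(tokens)
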
Example of a script re-writer that transforms float literals into Decimal
objects::

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, STRING, NAME, OP

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _ in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)