Blame - Doc/ref/ref2.tex - platform/external/python/cpython2

Fred Drake	a1cce71	1998-07-24 22:12:32 +0000	[diff] [blame]	1	\chapter{Lexical analysis\label{lexical}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	2
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	3	A Python program is read by a \emph{parser}. Input to the parser is a
				4	stream of \emph{tokens}, generated by the \emph{lexical analyzer}. This
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	5	chapter describes how the lexical analyzer breaks a file into tokens.
				6	\index{lexical analysis}
				7	\index{parser}
				8	\index{token}
				9
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	10	Python uses the 7-bit \ASCII{} character set for program text and string
				11	literals. 8-bit characters may be used in string literals and comments
				12	but their interpretation is platform dependent; the proper way to
				13	insert 8-bit characters in string literals is by using octal or
				14	hexadecimal escape sequences.
				15
				16	The run-time character set depends on the I/O devices connected to the
				17	program but is generally a superset of \ASCII{}.
				18
				19	\strong{Future compatibility note:} It may be tempting to assume that the
				20	character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
				21	superset that covers most western languages that use the Latin
				22	alphabet), but it is possible that in the future Unicode text editors
				23	will become common. These generally use the UTF-8 encoding, which is
				24	also an \ASCII{} superset, but with very different use for the
				25	characters with ordinals 128-255. While there is no consensus on this
				26	subject yet, it is unwise to assume either Latin-1 or UTF-8, even
				27	though the current implementation appears to favor Latin-1. This
				28	applies both to the source character set and the run-time character
				29	set.
				30
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	31
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	32	\section{Line structure\label{line-structure}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	33
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	34	A Python program is divided into a number of \emph{logical lines}.
				35	\index{line structure}
				36
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	37
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	38	\subsection{Logical lines\label{logical}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	39
				40	The end of
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	41	a logical line is represented by the token NEWLINE. Statements cannot
				42	cross logical line boundaries except where NEWLINE is allowed by the
Guido van Rossum	7c0240f	1998-07-24 15:36:43 +0000	[diff] [blame]	43	syntax (e.g., between statements in compound statements).
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	44	A logical line is constructed from one or more \emph{physical lines}
				45	by following the explicit or implicit \emph{line joining} rules.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	46	\index{logical line}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	47	\index{physical line}
				48	\index{line joining}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	49	\index{NEWLINE token}
				50
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	51
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	52	\subsection{Physical lines\label{physical}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	53
				54	A physical line ends in whatever the current platform's convention is
				55	for terminating lines. On \UNIX{}, this is the \ASCII{} LF (linefeed)
				56	character. On DOS/Windows, it is the \ASCII{} sequence CR LF (return
				57	followed by linefeed). On Macintosh, it is the \ASCII{} CR (return)
				58	character.
				59
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	60
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	61	\subsection{Comments\label{comments}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	62
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	63	A comment starts with a hash character (\code{\#}) that is not part of
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	64	a string literal, and ends at the end of the physical line. A comment
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	65	signifies the end of the logical line unless the implicit line joining
				66	rules are invoked.
				67	Comments are ignored by the syntax; they are not tokens.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	68	\index{comment}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	69	\index{hash character}
				70
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	71
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	72	\subsection{Explicit line joining\label{explicit-joining}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	73
				74	Two or more physical lines may be joined into one logical line using
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	75	backslash characters (\code{\e}), as follows: when a physical line ends
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	76	in a backslash that is not part of a string literal or comment, it is
				77	joined with the following line to form a single logical line, deleting
				78	the backslash and the following end-of-line character. For example:
				79	\index{physical line}
				80	\index{line joining}
				81	\index{line continuation}
				82	\index{backslash character}
				83	%
				84	\begin{verbatim}
				85	if 1900 < year < 2100 and 1 <= month <= 12 \
				86	   and 1 <= day <= 31 and 0 <= hour < 24 \
				87	   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
				88	        return 1
				89	\end{verbatim}
				90
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	91	A line ending in a backslash cannot carry a comment. A backslash does
				92	not continue a comment. A backslash does not continue a token except
				93	for string literals (i.e., tokens other than string literals cannot be
				94	split across physical lines using a backslash). A backslash is
				95	illegal elsewhere on a line outside a string literal.
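
The joining rule can be observed with the standard \module{tokenize}
module. The sketch below assumes a current Python~3 interpreter; the
variable names are chosen for illustration only. It counts the NEWLINE
tokens produced for three physical lines joined by backslashes.

```python
import io
import tokenize

# Three physical lines; the first two end in a backslash.
source = (
    "total = 1 + \\\n"
    "        2 + \\\n"
    "        3\n"
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# A NEWLINE token marks the end of a logical line.
newline_count = sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)
print(newline_count)
```

Only one NEWLINE token is reported, since the three physical lines
form a single logical line.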
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	96
Fred Drake	c411fa6	1999-02-22 14:32:18 +0000	[diff] [blame]	97
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	98	\subsection{Implicit line joining\label{implicit-joining}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	99
				100	Expressions in parentheses, square brackets or curly braces can be
				101	split over more than one physical line without using backslashes.
				102	For example:
				103
				104	\begin{verbatim}
				105	month_names = ['Januari', 'Februari', 'Maart', # These are the
				106	'April', 'Mei', 'Juni', # Dutch names
				107	'Juli', 'Augustus', 'September', # for the months
				108	'Oktober', 'November', 'December'] # of the year
				109	\end{verbatim}
				110
				111	Implicitly continued lines can carry comments. The indentation of the
				112	continuation lines is not important. Blank continuation lines are
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	113	allowed. There is no NEWLINE token between implicit continuation
				114	lines. Implicitly continued lines can also occur within triple-quoted
				115	strings (see below); in that case they cannot carry comments.
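
As a hedged illustration (assuming a current Python~3 interpreter and
the standard \module{tokenize} module), the following sketch lists the
token kinds for a bracketed expression that spans several physical
lines. The \module{tokenize} module reports line breaks that do not end
a logical line as \code{NL} and also reports comments, which the
grammar itself never sees.

```python
import io
import tokenize

# A list display split over three physical lines, one of them blank.
source = (
    "names = ['a',  # first\n"
    "\n"
    "         'b']  # second\n"
)

kinds = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(kinds)
```

The list contains exactly one \code{NEWLINE} entry, at the end of the
logical line; the line breaks inside the brackets appear as \code{NL}.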
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	116
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	117
Fred Drake	c411fa6	1999-02-22 14:32:18 +0000	[diff] [blame]	118	\subsection{Blank lines \index{blank line}\label{blank-lines}}
				119
				120	A logical line that contains only spaces, tabs, formfeeds and possibly
				121	a comment, is ignored (i.e., no NEWLINE token is generated). During
				122	interactive input of statements, handling of a blank line may differ
				123	depending on the implementation of the read-eval-print loop. In the
				124	standard implementation, an entirely blank logical line (i.e.\ one
				125	containing not even whitespace or a comment) terminates a multi-line
				126	statement.
				127
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	128
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	129	\subsection{Indentation\label{indentation}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	130
				131	Leading whitespace (spaces and tabs) at the beginning of a logical
				132	line is used to compute the indentation level of the line, which in
				133	turn is used to determine the grouping of statements.
				134	\index{indentation}
				135	\index{whitespace}
				136	\index{leading whitespace}
				137	\index{space}
				138	\index{tab}
				139	\index{grouping}
				140	\index{statement grouping}
				141
				142	First, tabs are replaced (from left to right) by one to eight spaces
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	143	such that the total number of characters up to and including the
				144	replacement is a multiple of
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	145	eight (this is intended to be the same rule as used by \UNIX{}). The
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	146	total number of spaces preceding the first non-blank character then
				147	determines the line's indentation. Indentation cannot be split over
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	148	multiple physical lines using backslashes; the whitespace up to the
				149	first backslash determines the indentation.
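
The tab rule can be sketched as a small function. The name
\code{indentation_width} and its \code{tab_size} parameter are
hypothetical and are not part of any Python API; formfeed handling is
left out.

```python
def indentation_width(line, tab_size=8):
    """Return the column of the first non-blank character of a line."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab advances to the next multiple of tab_size,
            # which is equivalent to inserting one to eight spaces.
            column = (column // tab_size + 1) * tab_size
        else:
            break
    return column


print(indentation_width("\tx"))      # 8
print(indentation_width("  \tx"))    # 8
print(indentation_width("\t  x"))    # 10
```

Note that two spaces followed by a tab give the same indentation as a
single tab, which is one reason mixing the two is error-prone.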
				150
				151	\strong{Cross-platform compatibility note:} because of the nature of
				152	text editors on non-UNIX platforms, it is unwise to use a mixture of
				153	spaces and tabs for the indentation in a single source file.
				154
				155	A formfeed character may be present at the start of the line; it will
Fred Drake	e15956b	2000-04-03 04:51:13 +0000	[diff] [blame]	156	be ignored for the indentation calculations above. Formfeed
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	157	characters occurring elsewhere in the leading whitespace have an
				158	undefined effect (for instance, they may reset the space count to
				159	zero).
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	160
				161	The indentation levels of consecutive lines are used to generate
				162	INDENT and DEDENT tokens, using a stack, as follows.
				163	\index{INDENT token}
				164	\index{DEDENT token}
				165
				166	Before the first line of the file is read, a single zero is pushed on
				167	the stack; this will never be popped off again. The numbers pushed on
				168	the stack will always be strictly increasing from bottom to top. At
				169	the beginning of each logical line, the line's indentation level is
				170	compared to the top of the stack. If it is equal, nothing happens.
				171	If it is larger, it is pushed on the stack, and one INDENT token is
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	172	generated. If it is smaller, it \emph{must} be one of the numbers
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	173	occurring on the stack; all numbers on the stack that are larger are
				174	popped off, and for each number popped off a DEDENT token is
				175	generated. At the end of the file, a DEDENT token is generated for
				176	each number remaining on the stack that is larger than zero.
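
The stack algorithm above can be sketched as follows. This is an
illustration, not the tokenizer's actual code: the function name
\code{indent_tokens} is hypothetical, and its input is assumed to be
the already computed indentation level of each logical line.

```python
def indent_tokens(levels):
    """Return the INDENT/DEDENT tokens for a list of indentation levels."""
    stack = [0]          # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # A smaller level must already be on the stack.
            if level not in stack:
                raise IndentationError("inconsistent dedent to column %d" % level)
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
    # End of file: one DEDENT for each remaining level above zero.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


print(indent_tokens([0, 4, 8, 4, 0]))
# ['INDENT', 'INDENT', 'DEDENT', 'DEDENT']
```

A sequence such as \code{[0, 4, 8, 2]} raises an error in this sketch,
because 2 is smaller than the top of the stack but does not occur on
the stack.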
				177
				178	Here is an example of a correctly (though confusingly) indented piece
				179	of Python code:
				180
				181	\begin{verbatim}
				182	def perm(l):
				183	        # Compute the list of all permutations of l
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	184	    if len(l) <= 1:
				185	                  return [l]
				186	    r = []
				187	    for i in range(len(l)):
				188	             s = l[:i] + l[i+1:]
				189	             p = perm(s)
				190	             for x in p:
				191	              r.append(l[i:i+1] + x)
				192	    return r
				193	\end{verbatim}
				194
				195	The following example shows various indentation errors:
				196
				197	\begin{verbatim}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	198	 def perm(l):                       # error: first line indented
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	199	for i in range(len(l)):             # error: not indented
				200	    s = l[:i] + l[i+1:]
				201	        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
				202	        for x in p:
				203	                r.append(l[i:i+1] + x)
				204	            return r                # error: inconsistent dedent
				205	\end{verbatim}
				206
				207	(Actually, the first three errors are detected by the parser; only the
				208	last error is found by the lexical analyzer --- the indentation of
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	209	\code{return r} does not match a level popped off the stack.)
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	210
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	211
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	212	\subsection{Whitespace between tokens\label{whitespace}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	213
				214	Except at the beginning of a logical line or in string literals, the
				215	whitespace characters space, tab and formfeed can be used
				216	interchangeably to separate tokens. Whitespace is needed between two
				217	tokens only if their concatenation could otherwise be interpreted as a
				218	different token (e.g., ab is one token, but a b is two tokens).
				219
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	220
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	221	\section{Other tokens\label{other-tokens}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	222
				223	Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	224	exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
				225	\emph{operators}, and \emph{delimiters}.
				226	Whitespace characters (other than line terminators, discussed earlier)
				227	are not tokens, but serve to delimit tokens.
				228	Where
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	229	ambiguity exists, a token comprises the longest possible string that
				230	forms a legal token, when read from left to right.
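
The longest-match rule and the role of whitespace can be checked with
the standard \module{tokenize} module. This is a sketch assuming a
current Python~3 interpreter; the helper name \code{token\_strings} is
hypothetical.

```python
import io
import tokenize


def token_strings(source):
    """Return the text of each token, leaving out line-structure tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


print(token_strings("ab\n"))      # ['ab']
print(token_strings("a b\n"))     # ['a', 'b']
print(token_strings("a**b\n"))    # ['a', '**', 'b']
```

In the last case the two asterisks form the single operator
\code{**} rather than two \code{*} tokens, because the longest legal
token is preferred.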
				231
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	232
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	233	\section{Identifiers and keywords\label{identifiers}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	234
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	235	Identifiers (also referred to as \emph{names}) are described by the following
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	236	lexical definitions:
				237	\index{identifier}
				238	\index{name}
				239
				240	\begin{verbatim}
				241	identifier: (letter|"_") (letter|digit|"_")*
				242	letter: lowercase | uppercase
				243	lowercase: "a"..."z"
				244	uppercase: "A"..."Z"
				245	digit: "0"..."9"
				246	\end{verbatim}
				247
				248	Identifiers are unlimited in length. Case is significant.
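
The lexical definition above translates directly into a regular
expression. The sketch below uses the standard \module{re} module; the
names \code{IDENTIFIER} and \code{is\_identifier} are hypothetical, and
the check does not exclude keywords.

```python
import re

# (letter | "_") (letter | digit | "_")*
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if text matches the identifier grammar given above."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("spam_1"))   # True
print(is_identifier("1spam"))    # False
print(is_identifier("_"))        # True
```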
				249
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	250
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	251	\subsection{Keywords\label{keywords}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	252
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	253	The following identifiers are used as reserved words, or
				254	\emph{keywords} of the language, and cannot be used as ordinary
				255	identifiers. They must be spelled exactly as written here:%
				256	\index{keyword}%
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	257	\index{reserved word}
				258
				259	\begin{verbatim}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	260	and del for is raise
				261	assert elif from lambda return
				262	break else global not try
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	263	class except if or yield
				264	continue exec import pass while
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	265	def finally in print
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	266	\end{verbatim}
				267
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	268	% When adding keywords, use reswords.py for reformatting
				269
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	270
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	271	\subsection{Reserved classes of identifiers\label{id-classes}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	272
				273	Certain classes of identifiers (besides keywords) have special
				274	meanings. These are:
				275
Fred Drake	39fc1bc	1999-03-05 18:30:21 +0000	[diff] [blame]	276	\begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
				277	\lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
				278	\lineiii{__*__}{System-defined name}{}
				279	\lineiii{__*}{Class-private name mangling}{}
				280	\end{tableiii}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	281
				282	(XXX need section references here.)
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	283
Fred Drake	39fc1bc	1999-03-05 18:30:21 +0000	[diff] [blame]	284	Note:
				285
				286	\begin{description}
				287	\item[(1)] The special identifier \samp{_} is used in the interactive
				288	interpreter to store the result of the last evaluation; it is stored
				289	in the \module{__builtin__} module. When not in interactive mode,
				290	\samp{_} has no special meaning and is not defined.
				291	\end{description}
				292
				293
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	294	\section{Literals\label{literals}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	295
				296	Literals are notations for constant values of some built-in types.
				297	\index{literal}
				298	\index{constant}
				299
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	300
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	301	\subsection{String literals\label{strings}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	302
				303	String literals are described by the following lexical definitions:
				304	\index{string literal}
				305
				306	\begin{verbatim}
				307	stringliteral: shortstring | longstring
				308	shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
				309	longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
				310	shortstringitem: shortstringchar | escapeseq
				311	longstringitem: longstringchar | escapeseq
				312	shortstringchar: <any ASCII character except "\" or newline or the quote>
				313	longstringchar: <any ASCII character except "\">
				314	escapeseq: "\" <any ASCII character>
				315	\end{verbatim}
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	316	\index{ASCII@\ASCII{}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	317
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	318	\index{triple-quoted string}
				319	\index{Unicode Consortium}
				320	\index{string!Unicode}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	321	In plain English: String literals can be enclosed in matching single
				322	quotes (\code{'}) or double quotes (\code{"}). They can also be
				323	enclosed in matching groups of three single or double quotes (these
				324	are generally referred to as \emph{triple-quoted strings}). The
				325	backslash (\code{\e}) character is used to escape characters that
				326	otherwise have a special meaning, such as newline, backslash itself,
				327	or the quote character. String literals may optionally be prefixed
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	328	with a letter `r' or `R'; such strings are called
				329	\dfn{raw strings}\index{raw string} and use different rules for
				330	backslash escape sequences. A prefix of `u' or `U' makes the string
				331	a Unicode string. Unicode strings use the Unicode character set as
				332	defined by the Unicode Consortium and ISO~10646. Some additional
				333	escape sequences, described below, are available in Unicode strings.
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	334
				335	In triple-quoted strings,
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	336	unescaped newlines and quotes are allowed (and are retained), except
				337	that three unescaped quotes in a row terminate the string. (A
				338	``quote'' is the character used to open the string, i.e. either
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	339	\code{'} or \code{"}.)
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	340
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	341	Unless an `r' or `R' prefix is present, escape sequences in strings
				342	are interpreted according to rules similar
				343	to those used by Standard \C{}. The recognized escape sequences are:
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	344	\index{physical line}
				345	\index{escape sequence}
				346	\index{Standard C}
				347	\index{C}
				348
Fred Drake	a1cce71	1998-07-24 22:12:32 +0000	[diff] [blame]	349	\begin{tableii}{l|l}{code}{Escape Sequence}{Meaning}
				350	\lineii{\e\var{newline}} {Ignored}
				351	\lineii{\e\e} {Backslash (\code{\e})}
				352	\lineii{\e'} {Single quote (\code{'})}
				353	\lineii{\e"} {Double quote (\code{"})}
				354	\lineii{\e a} {\ASCII{} Bell (BEL)}
				355	\lineii{\e b} {\ASCII{} Backspace (BS)}
				356	\lineii{\e f} {\ASCII{} Formfeed (FF)}
				357	\lineii{\e n} {\ASCII{} Linefeed (LF)}
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	358	\lineii{\e N\{\var{name}\}}
				359	{Character named \var{name} in the Unicode database (Unicode only)}
Fred Drake	a1cce71	1998-07-24 22:12:32 +0000	[diff] [blame]	360	\lineii{\e r} {\ASCII{} Carriage Return (CR)}
				361	\lineii{\e t} {\ASCII{} Horizontal Tab (TAB)}
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	362	\lineii{\e u\var{xxxx}}
				363	{Character with 16-bit hex value \var{xxxx} (Unicode only)}
				364	\lineii{\e U\var{xxxxxxxx}}
				365	{Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}
Fred Drake	a1cce71	1998-07-24 22:12:32 +0000	[diff] [blame]	366	\lineii{\e v} {\ASCII{} Vertical Tab (VT)}
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	367	\lineii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}
				368	\lineii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}
Fred Drake	a1cce71	1998-07-24 22:12:32 +0000	[diff] [blame]	369	\end{tableii}
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	370	\index{ASCII@\ASCII{}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	371
Tim Peters	7530208	2001-02-14 04:03:51 +0000	[diff] [blame]	372	As in Standard C, up to three octal digits are accepted. However,
				373	exactly two hex digits are taken in hex escapes.
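
A short sketch of the digit-count rules, assuming a current Python~3
interpreter (the variable names are illustrative only):

```python
# Octal escapes take at most three digits: \101 is 'A' (octal 101 is 65),
# so the fourth digit is an ordinary '0' character.
octal_example = "\1010"

# Hex escapes take exactly two digits: \x41 is 'A',
# so the third digit is an ordinary '0' character.
hex_example = "\x410"

print(octal_example == hex_example == "A0")   # True
```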
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	374
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	375	Unlike Standard \index{unrecognized escape sequence}C,
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	376	all unrecognized escape sequences are left in the string unchanged,
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	377	i.e., \emph{the backslash is left in the string}. (This behavior is
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	378	useful when debugging: if an escape sequence is mistyped, the
Fred Drake	dea764d	2000-12-19 04:52:03 +0000	[diff] [blame]	379	resulting output is more easily recognized as broken.) It is also
				380	important to note that the escape sequences marked as ``(Unicode
				381	only)'' in the table above fall into the category of unrecognized
				382	escapes for non-Unicode string literals.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	383
Fred Drake	347a625	2001-01-09 21:38:16 +0000	[diff] [blame]	384	When an `r' or `R' prefix is present, a character following a
				385	backslash is included in the string without change, and \emph{all
				386	backslashes are left in the string}. For example, the string literal
				387	\code{r"\e n"} consists of two characters: a backslash and a lowercase
				388	`n'. String quotes can be escaped with a backslash, but the backslash
				389	remains in the string; for example, \code{r"\e""} is a valid string
				390	literal consisting of two characters: a backslash and a double quote;
				391	\code{r"\e"} is not a valid string literal (even a raw string cannot
				392	end in an odd number of backslashes). Specifically, \emph{a raw
				393	string cannot end in a single backslash} (since the backslash would
				394	escape the following quote character). Note also that a single
				395	backslash followed by a newline is interpreted as those two characters
				396	as part of the string, \emph{not} as a line continuation.
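
The raw string rules can be checked as follows. This is a sketch
assuming a current Python~3 interpreter; the helper
\code{is\_valid\_literal} is hypothetical and simply asks the built-in
\function{compile()} whether a piece of text is accepted as an
expression.

```python
def is_valid_literal(text):
    """Return True if text compiles as a Python expression."""
    try:
        compile(text, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


raw_newline = r"\n"    # a backslash and a lowercase 'n'
raw_quote = r"\""      # a backslash and a double quote

print(len(raw_newline), len(raw_quote))   # 2 2

# The text r"\" is rejected: the backslash escapes the closing quote.
print(is_valid_literal('r"\\"'))          # False
```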
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	397
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	398
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	399	\subsection{String literal concatenation\label{string-catenation}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	400
				401	Multiple adjacent string literals (delimited by whitespace), possibly
				402	using different quoting conventions, are allowed, and their meaning is
				403	the same as their concatenation. Thus, \code{"hello" 'world'} is
				404	equivalent to \code{"helloworld"}. This feature can be used to reduce
				405	the number of backslashes needed, to split long strings conveniently
				406	across long lines, or even to add comments to parts of strings, for
				407	example:
				408
				409	\begin{verbatim}
				410	re.compile("[A-Za-z_]" # letter or underscore
				411	"[A-Za-z0-9_]*" # letter, digit or underscore
				412	)
				413	\end{verbatim}
				414
				415	Note that this feature is defined at the syntactical level, but
				416	implemented at compile time. The `+' operator must be used to
				417	concatenate string expressions at run time. Also note that literal
				418	concatenation can use different quoting styles for each component
				419	(even mixing raw strings and triple quoted strings).
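
A minimal sketch of the difference between literal concatenation and
the `+' operator (variable names are illustrative only):

```python
# Adjacent literals are joined into a single string literal.
greeting = "hello" 'world'

# Different quoting styles may be mixed, here a plain and a raw literal.
pattern = ("[A-Za-z_]"          # letter or underscore
           r"[A-Za-z0-9_]*")    # letter, digit or underscore

# Juxtaposition works for literals only; string expressions need '+'.
first = "hello"
joined = first + 'world'

print(greeting == joined)   # True
```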
				420
Fred Drake	2ed27d3	2000-11-17 19:05:12 +0000	[diff] [blame]	421
				422	\subsection{Unicode literals \label{unicode}}
				423
				424	XXX explain more here...
				425
				426
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	427	\subsection{Numeric literals\label{numbers}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	428
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	429	There are four types of numeric literals: plain integers, long
				430	integers, floating point numbers, and imaginary numbers. There are no
				431	complex literals (complex numbers can be formed by adding a real
				432	number and an imaginary number).
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	433	\index{number}
				434	\index{numeric literal}
				435	\index{integer literal}
				436	\index{plain integer literal}
				437	\index{long integer literal}
				438	\index{floating point literal}
				439	\index{hexadecimal literal}
				440	\index{octal literal}
				441	\index{decimal literal}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	442	\index{imaginary literal}
				443	\index{complex literal}
				444
				445	Note that numeric literals do not include a sign; a phrase like
				446	\code{-1} is actually an expression composed of the unary operator
				447	`\code{-}' and the literal \code{1}.
				448
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	449
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	450	\subsection{Integer and long integer literals\label{integers}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	451
				452	Integer and long integer literals are described by the following
				453	lexical definitions:
				454
				455	\begin{verbatim}
				456	longinteger: integer ("l"|"L")
				457	integer: decimalinteger | octinteger | hexinteger
				458	decimalinteger: nonzerodigit digit* | "0"
				459	octinteger: "0" octdigit+
				460	hexinteger: "0" ("x"|"X") hexdigit+
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	461	nonzerodigit: "1"..."9"
				462	octdigit: "0"..."7"
				463	hexdigit: digit|"a"..."f"|"A"..."F"
				464	\end{verbatim}
				465
				466	Although both lower case `l' and upper case `L' are allowed as suffix
				467	for long integers, it is strongly recommended to always use `L', since
				468	the letter `l' looks too much like the digit `1'.
				469
				470	Plain integer decimal literals must be at most 2147483647 (i.e., the
				471	largest positive integer, using 32-bit arithmetic). Plain octal and
				472	hexadecimal literals may be as large as 4294967295, but values larger
				473	than 2147483647 are converted to a negative value by subtracting
				474	4294967296. There is no limit for long integer literals apart from
				475	what can be stored in available memory.
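
The arithmetic of the 32-bit rule can be worked through in a small
function. This is a sketch of the rule as stated above, not of any
interpreter's code; the name \code{plain\_integer\_value} is
hypothetical, and current Python~3 interpreters do not apply this rule
to their own literals.

```python
MAX_PLAIN = 2147483647       # 2**31 - 1
MAX_UNSIGNED = 4294967295    # 2**32 - 1


def plain_integer_value(literal):
    """Value of a plain integer literal under the 32-bit rule above."""
    text = literal.lower()
    if text.startswith("0x"):
        value, decimal = int(text[2:], 16), False
    elif text.startswith("0") and len(text) > 1:
        value, decimal = int(text[1:], 8), False
    else:
        value, decimal = int(text, 10), True

    limit = MAX_PLAIN if decimal else MAX_UNSIGNED
    if value > limit:
        raise OverflowError("literal too large: %s" % literal)
    if value > MAX_PLAIN:
        # Octal and hex literals above 2147483647 wrap to negative values.
        value -= 4294967296
    return value


print(plain_integer_value("0177"))         # 127
print(plain_integer_value("0x80000000"))   # -2147483648
print(plain_integer_value("0xFFFFFFFF"))   # -1
```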
				476
				477	Some examples of plain and long integer literals:
				478
				479	\begin{verbatim}
				480	7 2147483647 0177 0x80000000
				481	3L 79228162514264337593543950336L 0377L 0x100000000L
				482	\end{verbatim}
				483
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	484
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	485	\subsection{Floating point literals\label{floating}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	486
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	487	Floating point literals are described by the following lexical
				488	definitions:
				489
				490	\begin{verbatim}
				491	floatnumber: pointfloat | exponentfloat
				492	pointfloat: [intpart] fraction | intpart "."
Guido van Rossum	7c0240f	1998-07-24 15:36:43 +0000	[diff] [blame]	493	exponentfloat: (nonzerodigit digit* | pointfloat) exponent
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	494	intpart: nonzerodigit digit* | "0"
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	495	fraction: "." digit+
				496	exponent: ("e"|"E") ["+"|"-"] digit+
				497	\end{verbatim}
				498
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	499	Note that the integer part of a floating point number cannot look like
Fred Drake	e15956b	2000-04-03 04:51:13 +0000	[diff] [blame]	500	an octal integer. The exponent may look like an octal literal, but it
				501	is always interpreted using radix 10. For example,
				502	\samp{1e010} is legal, while \samp{07.1} is a syntax error.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	503	The allowed range of floating point literals is
				504	implementation-dependent.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	505	Some examples of floating point literals:
				506
				507	\begin{verbatim}
				508	3.14 10. .001 1e100 3.14e-10
				509	\end{verbatim}
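
The floating point grammar above can be transcribed into a regular
expression to test candidate literals. This is a sketch using the
standard \module{re} module; all names are hypothetical, and the
pattern follows the grammar as given in this section, which may differ
from what a particular interpreter accepts.

```python
import re

INTPART = r"(?:[1-9][0-9]*|0)"
FRACTION = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"
EXPONENTFLOAT = rf"(?:(?:[1-9][0-9]*|{POINTFLOAT}){EXPONENT})"
FLOATNUMBER = re.compile(rf"(?:{EXPONENTFLOAT}|{POINTFLOAT})")


def is_float_literal(text):
    """Return True if text matches the floatnumber grammar above."""
    return FLOATNUMBER.fullmatch(text) is not None


print(is_float_literal("1e010"))   # True
print(is_float_literal("07.1"))    # False
```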
				510
				511	Note that numeric literals do not include a sign; a phrase like
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	512	\code{-1} is actually an expression composed of the operator
				513	\code{-} and the literal \code{1}.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	514
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	515
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	516	\subsection{Imaginary literals\label{imaginary}}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	517
				518	Imaginary literals are described by the following lexical definitions:
				519
				520	\begin{verbatim}
				521	imagnumber: (floatnumber | intpart) ("j"|"J")
				522	\end{verbatim}
				523
Fred Drake	e15956b	2000-04-03 04:51:13 +0000	[diff] [blame]	524	An imaginary literal yields a complex number with a real part of
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	525	0.0. Complex numbers are represented as a pair of floating point
				526	numbers and have the same restrictions on their range. To create a
				527	complex number with a nonzero real part, add a floating point number
Guido van Rossum	7c0240f	1998-07-24 15:36:43 +0000	[diff] [blame]	528	to an imaginary literal, e.g., \code{(3+4j)}. Some examples of imaginary literals:
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	529
				530	\begin{verbatim}
Guido van Rossum	7c0240f	1998-07-24 15:36:43 +0000	[diff] [blame]	531	3.14j 10.j 10j .001j 1e100j 3.14e-10j
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	532	\end{verbatim}
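
As a brief illustration of the points above, an imaginary literal on
its own has a real part of 0.0, and a nonzero real part comes from an
addition expression rather than from a single literal. The variable
names \code{z} and \code{w} are arbitrary.

```python
z = 4j        # imaginary literal: the real part is 0.0
w = 3 + 4j    # an expression: the number 3 plus the literal 4j

print(z.real, z.imag)   # 0.0 4.0
print(w.real, w.imag)   # 3.0 4.0
```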
				533
				534
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	535	\section{Operators\label{operators}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	536
				537	The following tokens are operators:
				538	\index{operators}
				539
				540	\begin{verbatim}
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	541	+ - * ** / %
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	542	<< >> & | ^ ~
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	543	< > <= >= == != <>
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	544	\end{verbatim}
				545
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	546	The comparison operators \code{<>} and \code{!=} are alternate
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	547	spellings of the same operator. \code{!=} is the preferred spelling;
				548	\code{<>} is obsolescent.
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	549
Fred Drake	f5eae66	2001-06-23 05:26:52 +0000	[diff] [blame^]	550
Fred Drake	61c7728	1998-07-28 19:34:22 +0000	[diff] [blame]	551	\section{Delimiters\label{delimiters}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	552
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	553	The following tokens serve as delimiters in the grammar:
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	554	\index{delimiters}
				555
				556	\begin{verbatim}
				557	( ) [ ] { }
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	558	, : . ` = ;
Thomas Wouters	12bba85	2000-08-24 20:06:04 +0000	[diff] [blame]	559	+= -= *= /= %= **=
				560	&= |= ^= >>= <<=
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	561	\end{verbatim}
				562
				563	The period can also occur in floating-point and imaginary literals. A
Fred Drake	e15956b	2000-04-03 04:51:13 +0000	[diff] [blame]	564	sequence of three periods has a special meaning as an ellipsis in slices.
Thomas Wouters	12bba85	2000-08-24 20:06:04 +0000	[diff] [blame]	565	The second half of the list, the augmented assignment operators, serve
				566	lexically as delimiters, but also perform an operation.
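
The dual role of an augmented assignment operator can be sketched as
follows: the tokenizer reports it as one token, and executing the
statement still performs the underlying operation. The example assumes
a current Python 3 interpreter and its \code{tokenize} module; the
names \code{source}, \code{ops} and \code{namespace} are illustrative
only.

```python
import io
import tokenize

source = "total **= 2\n"

# Lexically, the three characters form a single operator token.
ops = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.OP
]
print(ops)   # ['**=']

# Executing the statement raises total to the power 2 and rebinds it.
namespace = {"total": 3}
exec(source, namespace)
print(namespace["total"])   # 9
```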
Guido van Rossum	60f2f0c	1998-06-15 18:00:50 +0000	[diff] [blame]	567
				568	The following printing \ASCII{} characters have special meaning as part
				569	of other tokens or are otherwise significant to the lexical analyzer:
				570
				571	\begin{verbatim}
				572	' " # \
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	573	\end{verbatim}
				574
				575	The following printing \ASCII{} characters are not used in Python. Their
				576	occurrence outside string literals and comments is an unconditional
				577	error:
Fred Drake	5c07d9b	1998-05-14 19:37:06 +0000	[diff] [blame]	578	\index{ASCII@\ASCII{}}
Fred Drake	f666917	1998-05-06 19:52:49 +0000	[diff] [blame]	579
				580	\begin{verbatim}
				581	@ $ ?
				582	\end{verbatim}