\chapter{Lexical analysis\label{lexical}}

A Python program is read by a \emph{parser}. Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}. This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

Python uses the 7-bit \ASCII{} character set for program text and string
literals. 8-bit characters may be used in string literals and comments,
but their interpretation is platform-dependent; the proper way to
insert 8-bit characters in string literals is by using octal or
hexadecimal escape sequences.

The run-time character set depends on the I/O devices connected to the
program but is generally a superset of \ASCII{}.

\strong{Future compatibility note:} It may be tempting to assume that the
character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
superset that covers most western languages that use the Latin
alphabet), but it is possible that in the future Unicode text editors
will become common. These generally use the UTF-8 encoding, which is
also an \ASCII{} superset, but which uses the characters with ordinals
128--255 very differently. While there is no consensus on this
subject yet, it is unwise to assume either Latin-1 or UTF-8, even
though the current implementation appears to favor Latin-1. This
applies both to the source character set and to the run-time character
set.

\section{Line structure\label{line-structure}}

A Python program is divided into a number of \emph{logical lines}.
\index{line structure}

\subsection{Logical lines\label{logical}}

The end of a logical line is represented by the token NEWLINE.
Statements cannot cross logical line boundaries except where NEWLINE
is allowed by the syntax (e.g., between statements in compound
statements). A logical line is constructed from one or more
\emph{physical lines} by following the explicit or implicit
\emph{line joining} rules.
\index{logical line}
\index{physical line}
\index{line joining}
\index{NEWLINE token}

\subsection{Physical lines\label{physical}}

A physical line ends in whatever the current platform's convention is
for terminating lines. On \UNIX{}, this is the \ASCII{} LF (linefeed)
character. On DOS/Windows, it is the \ASCII{} sequence CR LF (return
followed by linefeed). On Macintosh, it is the \ASCII{} CR (return)
character.
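The sketch below shows one way a reader could split program text into physical lines. It is written in modern Python, and the name \code{split_physical_lines()} is hypothetical. Unlike the platform-specific rule above, it assumes that all three conventions should be accepted in the same file, which is a deliberate simplification.

```python
import re

# CR LF is tried before a lone CR or LF, so that a DOS/Windows line
# ending counts as one terminator rather than two.
_LINE_END = re.compile(r"\r\n|\r|\n")

def split_physical_lines(text):
    """Split text into physical lines, accepting LF, CR LF, or CR endings."""
    lines = _LINE_END.split(text)
    # A terminator at the very end of the text does not start a new line.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```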

\subsection{Comments\label{comments}}

A comment starts with a hash character (\code{\#}) that is not part of
a string literal, and ends at the end of the physical line. A comment
signifies the end of the logical line unless the implicit line joining
rules are invoked. Comments are ignored by the syntax; they are not
tokens.
\index{comment}
\index{hash character}
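As a rough illustration of the rule that a \code{\#} inside a string literal does not start a comment, the following sketch scans one physical line while tracking whether it is inside a quoted string. The function name \code{comment_start()} is hypothetical, and the sketch assumes single-line strings only: triple-quoted strings are not handled.

```python
def comment_start(line):
    """Return the index of the '#' that starts a comment, or -1 if none."""
    quote = None                # the quote character of an open string, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1          # skip the escaped character
            elif ch == quote:
                quote = None    # the string literal ends here
        elif ch in "'\"":
            quote = ch          # a string literal begins here
        elif ch == "#":
            return i
        i += 1
    return -1
```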

\subsection{Explicit line joining\label{explicit-joining}}

Two or more physical lines may be joined into logical lines using
backslash characters (\code{\e}), as follows: when a physical line ends
in a backslash that is not part of a string literal or comment, it is
joined with the following line, forming a single logical line; the
backslash and the following end-of-line character are deleted. For
example:
\index{physical line}
\index{line joining}
\index{line continuation}
\index{backslash character}
%
\begin{verbatim}
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
\end{verbatim}

A line ending in a backslash cannot carry a comment. A backslash does
not continue a comment. A backslash does not continue a token except
for string literals (i.e., tokens other than string literals cannot be
split across physical lines using a backslash). A backslash is
illegal elsewhere on a line outside a string literal.
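A minimal sketch of the joining step, in modern Python. The helper name \code{join_explicit()} is hypothetical; it takes physical lines whose terminators have already been removed, and it assumes that no backslash ends a comment or sits inside a string literal, so every trailing backslash is treated as a continuation.

```python
def join_explicit(physical_lines):
    """Join physical lines that end in a backslash into logical lines."""
    logical = []
    pending = None              # text accumulated from continued lines
    for line in physical_lines:
        if line.endswith("\\"):
            # Delete the backslash; the end-of-line is already gone.
            pending = (pending or "") + line[:-1]
        else:
            logical.append((pending or "") + line)
            pending = None
    if pending is not None:
        # A backslash on the last line has nothing to join with.
        raise SyntaxError("unexpected end of input after line continuation")
    return logical
```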


\subsection{Implicit line joining\label{implicit-joining}}

Expressions in parentheses, square brackets or curly braces can be
split over more than one physical line without using backslashes.
For example:

\begin{verbatim}
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
\end{verbatim}

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are
allowed. There is no NEWLINE token between implicit continuation
lines. Implicitly continued lines can also occur within triple-quoted
strings (see below); in that case they cannot carry comments.
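The sketch below, again with a hypothetical name (\code{join_implicit()}), joins physical lines while a bracket remains open. It naively assumes that neither brackets nor \code{\#} characters occur inside string literals; it strips comments and joins the remaining pieces with single spaces, whereas a real tokenizer would simply carry each piece's tokens forward without emitting a NEWLINE.

```python
def join_implicit(physical_lines):
    """Join physical lines into logical lines while a bracket is open."""
    logical = []
    pending = []
    depth = 0                   # number of currently open ( [ {
    for line in physical_lines:
        # Naive comment removal: assumes '#' never appears inside a string.
        code = line.split("#", 1)[0].rstrip()
        # The indentation of continuation lines is not significant.
        pending.append(code.lstrip() if pending else code)
        for ch in code:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
        if depth == 0:
            # Blank continuation lines contribute nothing.
            logical.append(" ".join(piece for piece in pending if piece))
            pending = []
    if pending:
        raise SyntaxError("unclosed bracket at end of input")
    return logical
```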


\subsection{Blank lines \index{blank line}\label{blank-lines}}

A logical line that contains only spaces, tabs, formfeeds and possibly
a comment is ignored (i.e., no NEWLINE token is generated). During
interactive input of statements, handling of a blank line may differ
depending on the implementation of the read-eval-print loop. In the
standard implementation, an entirely blank logical line (i.e.\ one
containing not even whitespace or a comment) terminates a multi-line
statement.
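A small sketch of the blank-line test in modern Python. The function name is hypothetical, and the sketch assumes that the line does not begin inside a triple-quoted string and that any \code{\#} on it starts a comment.

```python
def is_blank_logical_line(line):
    """True if the line holds only spaces, tabs, formfeeds and maybe a comment."""
    # Naive: everything from the first '#' onward is treated as a comment.
    code = line.split("#", 1)[0]
    return code.strip(" \t\f") == ""
```

A line for which this returns true would produce no NEWLINE token.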


\subsection{Indentation\label{indentation}}

Leading whitespace (spaces and tabs) at the beginning of a logical
line is used to compute the indentation level of the line, which in
turn is used to determine the grouping of statements.
\index{indentation}
\index{whitespace}
\index{leading whitespace}
\index{space}
\index{tab}
\index{grouping}
\index{statement grouping}

First, tabs are replaced (from left to right) by one to eight spaces
such that the total number of characters up to and including the
replacement is a multiple of eight (this is intended to be the same
rule as used by \UNIX{}). The total number of spaces preceding the
first non-blank character then determines the line's indentation.
Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.

\strong{Cross-platform compatibility note:} because of the nature of
text editors on non-\UNIX{} platforms, it is unwise to use a mixture of
spaces and tabs for the indentation in a single source file.

A formfeed character may be present at the start of the line; it will
be ignored for the indentation calculations above. Formfeed
characters occurring elsewhere in the leading whitespace have an
undefined effect (for instance, they may reset the space count to
zero).
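The tab rule and the formfeed exception can be written out as a short function. This is a sketch in modern Python; the name \code{indentation_level()} is hypothetical. Because a formfeed elsewhere in the leading whitespace has an undefined effect, the sketch simply stops counting when it sees one; this is an assumption, not documented behaviour.

```python
def indentation_level(line):
    """Compute a logical line's indentation using the tab-to-eight rule."""
    # A formfeed at the very start of the line is ignored.
    if line.startswith("\f"):
        line = line[1:]
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Advance to the next multiple of eight (one to eight spaces).
            column = (column // 8 + 1) * 8
        else:
            # First non-blank character (or a backslash, or, by assumption,
            # a later formfeed) ends the leading whitespace.
            break
    return column
```

For leading whitespace that contains no formfeeds, Python's built-in \code{str.expandtabs(8)} yields a string of the same width.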

The indentation levels of consecutive lines are used to generate
INDENT and DEDENT tokens, using a stack, as follows.
\index{INDENT token}
\index{DEDENT token}

Before the first line of the file is read, a single zero is pushed on
the stack; this will never be popped off again. The numbers pushed on
the stack will always be strictly increasing from bottom to top. At
the beginning of each logical line, the line's indentation level is
compared to the top of the stack. If it is equal, nothing happens.
If it is larger, it is pushed on the stack, and one INDENT token is
generated. If it is smaller, it \emph{must} be one of the numbers
occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is
generated. At the end of the file, a DEDENT token is generated for
each number remaining on the stack that is larger than zero.
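The stack algorithm just described can be sketched as follows. The sketch (hypothetical name \code{indent_tokens()}) takes a list of indentation levels, one per non-blank logical line, and emits a placeholder \code{"LINE"} entry where each line's own tokens would go. Raising \code{IndentationError} for an inconsistent dedent is an assumption about how the error is reported.

```python
def indent_tokens(levels):
    """Generate INDENT/DEDENT tokens from per-line indentation levels."""
    stack = [0]                      # pushed before the first line; never popped
    tokens = []
    for lineno, level in enumerate(levels, start=1):
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            # A smaller level must already occur on the stack.
            if level not in stack:
                raise IndentationError(f"inconsistent dedent on line {lineno}")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")        # placeholder for the line's own tokens
    # At end of file, one DEDENT for each level still above zero.
    tokens.extend("DEDENT" for _ in stack[1:])
    return tokens
```

Fed the levels of the erroneous example in this section (1, 0, 4, 8, 8, 16, 12), this sketch accepts the first six lines and raises only at the seventh, which matches the remark that only the last error there is a lexical one.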

Here is an example of a correctly (though confusingly) indented piece
of Python code:

\begin{verbatim}
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
\end{verbatim}

The following example shows various indentation errors:

\begin{verbatim}
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
\end{verbatim}

(Actually, the first three errors are detected by the parser; only the
last error is found by the lexical analyzer: the indentation of
\code{return r} does not match a level popped off the stack.)

\subsection{Whitespace between tokens\label{whitespace}}

Except at the beginning of a logical line or in string literals, the
whitespace characters space, tab and formfeed can be used
interchangeably to separate tokens. Whitespace is needed between two
tokens only if their concatenation could otherwise be interpreted as a
different token (e.g., \code{ab} is one token, but \code{a b} is two
tokens).

\section{Other tokens\label{other-tokens}}

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
\emph{operators}, and \emph{delimiters}. Whitespace characters (other
than line terminators, discussed earlier) are not tokens, but serve to
delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.
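The longest-match rule can be illustrated with the operator spellings listed in the Operators section of this chapter. The sketch below (hypothetical function \code{match_operator()}) checks every spelling and keeps the longest one that matches at the given position; a production tokenizer would more likely use a trie or a precompiled regular expression.

```python
# Operator spellings from the Operators section of this chapter.
OPERATORS = ("+", "-", "*", "**", "/", "%",
             "<<", ">>", "&", "|", "^", "~",
             "<", ">", "<=", ">=", "==", "!=", "<>")

def match_operator(text, pos=0):
    """Return the longest operator that starts at text[pos], or None."""
    best = None
    for op in OPERATORS:
        if text.startswith(op, pos) and (best is None or len(op) > len(best)):
            best = op
    return best
```

For instance, at the start of \code{**2} both \code{*} and \code{**} match, and the longer \code{**} is chosen.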

\section{Identifiers and keywords\label{identifiers}}

Identifiers (also referred to as \emph{names}) are described by the following
lexical definitions:
\index{identifier}
\index{name}

\begin{verbatim}
identifier:     (letter|"_") (letter|digit|"_")*
letter:         lowercase | uppercase
lowercase:      "a"..."z"
uppercase:      "A"..."Z"
digit:          "0"..."9"
\end{verbatim}

Identifiers are unlimited in length. Case is significant.
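The definition above translates directly into a regular expression. This sketch uses Python's \module{re} module with a hypothetical helper name. Note that \code{str.isidentifier()} in current Python is not a substitute here, because it also accepts non-\ASCII{} letters.

```python
import re

# identifier: (letter|"_") (letter|digit|"_")*, with ASCII letters only
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """True if text matches the identifier definition above."""
    return IDENTIFIER.fullmatch(text) is not None
```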

\subsection{Keywords\label{keywords}}

The following identifiers are used as reserved words, or
\emph{keywords} of the language, and cannot be used as ordinary
identifiers. They must be spelled exactly as written here:%
\index{keyword}%
\index{reserved word}

\begin{verbatim}
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass
def       finally   in        print
\end{verbatim}

% When adding keywords, use reswords.py for reformatting

\subsection{Reserved classes of identifiers\label{id-classes}}

Certain classes of identifiers (besides keywords) have special
meanings. These are:

\begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
  \lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
  \lineiii{__*__}{System-defined name}{}
  \lineiii{__*}{Class-private name mangling}{}
\end{tableiii}

(XXX need section references here.)

Note:

\begin{description}
\item[(1)] The special identifier \samp{_} is used in the interactive
interpreter to store the result of the last evaluation; it is stored
in the \module{__builtin__} module. When not in interactive mode,
\samp{_} has no special meaning and is not defined.
\end{description}
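A hypothetical helper that sorts a name into the reserved classes of the table above, checking the most specific form first. The returned labels are invented for this sketch.

```python
def identifier_class(name):
    """Return which reserved class of identifiers, if any, a name belongs to."""
    # __*__ must be tested before __*, since every system-defined name
    # also starts with two underscores.
    if len(name) > 4 and name.startswith("__") and name.endswith("__"):
        return "system-defined"
    if name.startswith("__"):
        return "class-private"
    if name.startswith("_"):
        return "not imported by 'from module import *'"
    return "ordinary"
```

In current CPython, a class-private name such as \code{__spam} used inside a class named \code{Ham} is rewritten to \code{_Ham__spam}.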


\section{Literals\label{literals}}

Literals are notations for constant values of some built-in types.
\index{literal}
\index{constant}

\subsection{String literals\label{strings}}

String literals are described by the following lexical definitions:
\index{string literal}

\begin{verbatim}
stringliteral:   shortstring | longstring
shortstring:     "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring:      "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem: shortstringchar | escapeseq
longstringitem:  longstringchar | escapeseq
shortstringchar: <any ASCII character except "\" or newline or the quote>
longstringchar:  <any ASCII character except "\">
escapeseq:       "\" <any ASCII character>
\end{verbatim}
\index{ASCII@\ASCII{}}

In plain English: String literals can be enclosed in matching single
quotes (\code{'}) or double quotes (\code{"}). They can also be
enclosed in matching groups of three single or double quotes (these
are generally referred to as \emph{triple-quoted strings}). The
backslash (\code{\e}) character is used to escape characters that
otherwise have a special meaning, such as newline, backslash itself,
or the quote character. String literals may optionally be prefixed
with a letter `r' or `R'; such strings are called raw strings and use
different rules for backslash escape sequences.
\index{triple-quoted string}
\index{raw string}

In triple-quoted strings, unescaped newlines and quotes are allowed
(and are retained), except that three unescaped quotes in a row
terminate the string. (A ``quote'' is the character used to open the
string, i.e.\ either \code{'} or \code{"}.)

Unless an `r' or `R' prefix is present, escape sequences in strings
are interpreted according to rules similar to those used by Standard
\C{}. The recognized escape sequences are:
\index{physical line}
\index{escape sequence}
\index{Standard C}
\index{C}

\begin{tableii}{l|l}{code}{Escape Sequence}{Meaning}
  \lineii{\e\var{newline}} {Ignored}
  \lineii{\e\e}        {Backslash (\code{\e})}
  \lineii{\e'}         {Single quote (\code{'})}
  \lineii{\e"}         {Double quote (\code{"})}
  \lineii{\e a}        {\ASCII{} Bell (BEL)}
  \lineii{\e b}        {\ASCII{} Backspace (BS)}
  \lineii{\e f}        {\ASCII{} Formfeed (FF)}
  \lineii{\e n}        {\ASCII{} Linefeed (LF)}
  \lineii{\e r}        {\ASCII{} Carriage Return (CR)}
  \lineii{\e t}        {\ASCII{} Horizontal Tab (TAB)}
  \lineii{\e v}        {\ASCII{} Vertical Tab (VT)}
  \lineii{\e\var{ooo}} {\ASCII{} character with octal value \emph{ooo}}
  \lineii{\e x\var{hh...}} {\ASCII{} character with hex value \emph{hh...}}
\end{tableii}
\index{ASCII@\ASCII{}}

In strict compatibility with Standard \C, up to three octal digits are
accepted, but an unlimited number of hex digits is taken to be part of
the hex escape (and then the lower 8 bits of the resulting hex number
are used in 8-bit implementations).

Unlike Standard \C{}, all unrecognized escape sequences are left in the
string unchanged, i.e., \emph{the backslash is left in the string.}
(This behavior is useful when debugging: if an escape sequence is
mistyped, the resulting output is more easily recognized as broken.)
\index{unrecognized escape sequence}
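The following sketch decodes the body of a non-raw string literal (without its quotes) according to the table and rules above. It is a hypothetical helper written for illustration, not the interpreter's own routine. Current Python 3 differs in several of these details (for instance, its \code{\e x} escape takes exactly two hex digits), so the rules are implemented by hand. Reducing octal values to 8 bits is an assumption of this sketch.

```python
SIMPLE_ESCAPES = {
    "\n": "",                   # backslash-newline is ignored
    "\\": "\\", "'": "'", '"': '"',
    "a": "\a", "b": "\b", "f": "\f", "n": "\n",
    "r": "\r", "t": "\t", "v": "\v",
}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"

def decode_escapes(body):
    """Interpret backslash escapes in a non-raw string body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCTAL:
            # Up to three octal digits; the 8-bit mask is an assumption.
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in OCTAL:
                j += 1
            out.append(chr(int(body[i + 1:j], 8) & 0xFF))
            i = j
        elif nxt == "x" and i + 2 < len(body) and body[i + 2] in HEX:
            # Unlimited hex digits; only the lower 8 bits are kept.
            j = i + 2
            while j < len(body) and body[j] in HEX:
                j += 1
            out.append(chr(int(body[i + 2:j], 16) & 0xFF))
            i = j
        else:
            # Unrecognized escape: the backslash stays in the string.
            out.append("\\")
            i += 1
    return "".join(out)
```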

When an `r' or `R' prefix is present, backslashes are still used to
quote the following character, but \emph{all backslashes are left in
the string}. For example, the string literal \code{r"\e n"} consists
of two characters: a backslash and a lowercase `n'. String quotes can
be escaped with a backslash, but the backslash remains in the string;
for example, \code{r"\e""} is a valid string literal consisting of two
characters: a backslash and a double quote; \code{r"\e"} is not a valid
string literal (even a raw string cannot end in an odd number of
backslashes). Specifically, \emph{a raw string cannot end in a single
backslash} (since the backslash would escape the following quote
character). Note also that a single backslash followed by a newline
is interpreted as those two characters as part of the string,
\emph{not} as a line continuation.
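Current Python 3 keeps these raw-string rules, so they can be checked directly. The helper \code{compiles()} is hypothetical; it only reports whether the built-in \function{compile()} accepts a piece of source text as an expression.

```python
def compiles(source):
    """Report whether source text is accepted as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# r"\n" is a backslash followed by a lowercase n.
assert r"\n" == "\\" + "n"
# r"\"" is a backslash followed by a double quote.
assert r"\"" == "\\" + '"'
# r"\" is rejected: its backslash escapes the closing quote.
assert not compiles('r"\\"')
# A raw string may still end in an even number of backslashes.
assert compiles('r"\\\\"')
# Backslash-newline inside a raw string keeps both characters.
assert eval('r"""a\\\nb"""') == "a\\\nb"
```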

\subsection{String literal concatenation\label{string-catenation}}

Multiple adjacent string literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is
the same as their concatenation. Thus, \code{"hello" 'world'} is
equivalent to \code{"helloworld"}. This feature can be used to reduce
the number of backslashes needed, to split long strings conveniently
across lines, or even to add comments to parts of strings, for
example:

\begin{verbatim}
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
\end{verbatim}

Note that this feature is defined at the syntactical level, but
implemented at compile time. The `\code{+}' operator must be used to
concatenate string expressions at run time. Also note that literal
concatenation can use different quoting styles for each component
(even mixing raw strings and triple-quoted strings).
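Current Python 3 still joins adjacent literals in this way, so the behaviour can be checked directly. The variable names below are illustrative only.

```python
import re

# Adjacent literals are joined into one string at compile time.
assert "hello" 'world' == "helloworld"

# Components may mix raw, plain and triple-quoted forms.
mixed = (r"\d+"     # raw: the backslash is kept
         "\t"       # plain: a real tab character
         """x""")   # triple-quoted
assert mixed == "\\d+\tx"

# The example from the text, with a comment after each part.
name_re = re.compile("[A-Za-z_]"       # letter or underscore
                     "[A-Za-z0-9_]*"   # letter, digit or underscore
                     )
assert name_re.fullmatch("spam_2") is not None

# At run time, string expressions need the + operator instead.
prefix = "hello"
assert prefix + "world" == "helloworld"
```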

\subsection{Numeric literals\label{numbers}}

There are four types of numeric literals: plain integers, long
integers, floating point numbers, and imaginary numbers. There are no
complex literals (complex numbers can be formed by adding a real
number and an imaginary number).
\index{number}
\index{numeric literal}
\index{integer literal}
\index{plain integer literal}
\index{long integer literal}
\index{floating point literal}
\index{hexadecimal literal}
\index{octal literal}
\index{decimal literal}
\index{imaginary literal}
\index{complex literal}

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.
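One way to see this in current Python 3 is to parse \code{-1} with the standard \module{ast} module (Python 3.8 or later is assumed, for \code{ast.Constant}):

```python
import ast

# Parse "-1" as a standalone expression and inspect the resulting tree.
expr = ast.parse("-1", mode="eval").body
print(type(expr).__name__)       # UnaryOp
print(type(expr.op).__name__)    # USub
print(expr.operand.value)        # 1
```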

\subsection{Integer and long integer literals\label{integers}}

Integer and long integer literals are described by the following
lexical definitions:

\begin{verbatim}
longinteger:    integer ("l"|"L")
integer:        decimalinteger | octinteger | hexinteger
decimalinteger: nonzerodigit digit* | "0"
octinteger:     "0" octdigit+
hexinteger:     "0" ("x"|"X") hexdigit+
nonzerodigit:   "1"..."9"
octdigit:       "0"..."7"
hexdigit:       digit|"a"..."f"|"A"..."F"
\end{verbatim}

Although both lowercase `l' and uppercase `L' are allowed as suffix
for long integers, it is strongly recommended to always use `L', since
the letter `l' looks too much like the digit `1'.

Plain integer decimal literals must be at most 2147483647 (i.e., the
largest positive integer, using 32-bit arithmetic). Plain octal and
hexadecimal literals may be as large as 4294967295, but values larger
than 2147483647 are converted to a negative value by subtracting
4294967296. There is no limit for long integer literals apart from
what can be stored in available memory.

Some examples of plain and long integer literals:

\begin{verbatim}
7     2147483647                        0177    0x80000000
3L    79228162514264337593543950336L    0377L   0x100000000L
\end{verbatim}
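The range rules above can be summarised as a small function. This is a sketch in modern Python with a hypothetical name, \code{plain_integer_value()}; it assumes its argument is already a well-formed literal, and raising \code{OverflowError} for an out-of-range plain literal is this sketch's choice rather than a documented behaviour.

```python
def plain_integer_value(literal):
    """Value of an integer literal under the 32-bit plain-integer rules."""
    is_long = literal[-1] in "lL"
    digits = literal[:-1] if is_long else literal
    if digits[:2] in ("0x", "0X"):
        value, octal_or_hex = int(digits[2:], 16), True
    elif len(digits) > 1 and digits[0] == "0":
        value, octal_or_hex = int(digits[1:], 8), True
    else:
        value, octal_or_hex = int(digits, 10), False
    if is_long:
        return value                      # limited only by memory
    if not octal_or_hex:
        if value > 2147483647:
            raise OverflowError("decimal literal too large for a plain integer")
        return value
    if value > 4294967295:
        raise OverflowError("octal/hex literal too large for a plain integer")
    # Octal and hex values above 2**31 - 1 wrap to negative numbers.
    return value - 4294967296 if value > 2147483647 else value
```

Applied to the examples above, \code{0177} gives 127 and \code{0x80000000} gives -2147483648, while \code{0x100000000L} keeps its full value.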

\subsection{Floating point literals\label{floating}}

Floating point literals are described by the following lexical
definitions:

\begin{verbatim}
floatnumber:   pointfloat | exponentfloat
pointfloat:    [intpart] fraction | intpart "."
exponentfloat: (nonzerodigit digit* | pointfloat) exponent
intpart:       nonzerodigit digit* | "0"
fraction:      "." digit+
exponent:      ("e"|"E") ["+"|"-"] digit+
\end{verbatim}

Note that the integer part of a floating point number cannot look like
an octal integer. The exponent, however, may look like an octal
literal, but it is always interpreted using radix 10. For example,
\samp{1e010} is legal, while \samp{07.1} is a syntax error. The
allowed range of floating point literals is implementation-dependent.
Some examples of floating point literals:

\begin{verbatim}
3.14    10.    .001    1e100    3.14e-10
\end{verbatim}
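The grammar translates almost mechanically into a regular expression. The sketch below is written in modern Python with illustrative names; it accepts \samp{1e010} and rejects \samp{07.1}, as described above, and it checks only the lexical form, not the allowed range.

```python
import re

# A direct transcription of the floatnumber grammar above.
INTPART = r"(?:[1-9][0-9]*|0)"
FRACTION = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"
POINTFLOAT = rf"(?:{INTPART}?{FRACTION}|{INTPART}\.)"
EXPONENTFLOAT = rf"(?:[1-9][0-9]*|{POINTFLOAT}){EXPONENT}"
FLOATNUMBER = re.compile(rf"{POINTFLOAT}|{EXPONENTFLOAT}")

def is_float_literal(text):
    """True if text is a floatnumber according to the grammar above."""
    return FLOATNUMBER.fullmatch(text) is not None
```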

\subsection{Imaginary literals\label{imaginary}}

Imaginary literals are described by the following lexical definitions:

\begin{verbatim}
imagnumber: (floatnumber | intpart) ("j"|"J")
\end{verbatim}

An imaginary literal yields a complex number with a real part of
0.0. Complex numbers are represented as a pair of floating point
numbers and have the same restrictions on their range. To create a
complex number with a nonzero real part, add a floating point number
to it, e.g., \code{(3+4j)}. Some examples of imaginary literals:

\begin{verbatim}
3.14j   10.j    10j     .001j   1e100j  3.14e-10j
\end{verbatim}
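These rules still hold in current Python 3, so a short session illustrates them; the variable names are illustrative only.

```python
z = 3.14j
print(z.real, z.imag)          # 0.0 3.14
w = 3 + 4j                     # adding a real number gives a nonzero real part
print(w.real, w.imag)          # 3.0 4.0
print(type(10j).__name__)      # complex
```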


\section{Operators\label{operators}}

The following tokens are operators:
\index{operators}

\begin{verbatim}
+       -       *       **      /       %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
\end{verbatim}

The comparison operators \code{<>} and \code{!=} are alternate
spellings of the same operator. \code{!=} is the preferred spelling;
\code{<>} is obsolescent.

\section{Delimiters\label{delimiters}}

The following tokens serve as delimiters in the grammar:
\index{delimiters}

\begin{verbatim}
(       )       [       ]       {       }
,       :       .       `       =       ;
\end{verbatim}

The period can also occur in floating-point and imaginary literals. A
sequence of three periods has a special meaning as an ellipsis in slices.

The following printing \ASCII{} characters have special meaning as part
of other tokens or are otherwise significant to the lexical analyzer:

\begin{verbatim}
'       "       #       \
\end{verbatim}

The following printing \ASCII{} characters are not used in Python. Their
occurrence outside string literals and comments is an unconditional
error:
\index{ASCII@\ASCII{}}

\begin{verbatim}
@       $       ?
\end{verbatim}