Cosmetic changes; added sections on notation and on objects;
new grammar (global, '==').
diff --git a/Doc/ref.tex b/Doc/ref.tex
index c47b084..d6d4f56 100644
--- a/Doc/ref.tex
+++ b/Doc/ref.tex
@@ -60,6 +60,69 @@
This reference manual describes the Python programming language.
It is not intended as a tutorial.
+While I am trying to be as precise as possible, I chose to use English
+rather than formal specifications for everything except syntax and
+lexical analysis. This should make the document more understandable
+to the average reader, but will leave room for ambiguities.
+Consequently, if you were coming from Mars and tried to re-implement
+Python from this document alone, you might in fact be implementing
+quite a different language. On the other hand, if you are using
+Python and wonder what the precise rules about a particular area of
+the language are, you should be able to find them here.
+
+It is dangerous to add too many implementation details to a language
+reference document -- the implementation may change, and other
+implementations of the same language may work differently. On the
+other hand, there is currently only one Python implementation, and
+particular quirks of it are sometimes worth mentioning, especially
+where it differs from the ``ideal'' specification.
+
+Every Python implementation comes with a number of built-in and
+standard modules. These are not documented here, but in the separate
+{\em Python Library Reference} document. A few built-in modules are
+mentioned when they interact in a significant way with the language
+definition.
+
+\section{Notation}
+
+The descriptions of lexical analysis and syntax use a modified BNF
+grammar notation. This uses the following style of definition:
+
+\begin{verbatim}
+name: lcletter (lcletter | "_")*
+lcletter: "a"..."z"
+\end{verbatim}
+
+The first line says that a \verb\name\ is a \verb\lcletter\ followed by
+a sequence of zero or more \verb\lcletter\s and underscores. A
+\verb\lcletter\ in turn is any of the single characters `a' through `z'.
+(This rule is actually adhered to for the names defined in syntax and
+grammar rules in this document.)
+
+Each rule begins with a name (which is the name defined by the rule)
+followed by a colon. Each rule is wholly contained on one line. A
+vertical bar (\verb\|\) is used to separate alternatives; it is the
+least binding operator in this notation. A star (\verb\*\) means zero
+or more repetitions of the preceding item; likewise, a plus (\verb\+\)
+means one or more repetitions and a question mark (\verb\?\) zero or
+one (in other words, the preceding item is optional). These three
+operators bind as tightly as possible; parentheses are used for
+grouping. Literal strings are enclosed in double quotes. White space
+is only meaningful to separate tokens.
+
+In lexical definitions (such as the example above), two more conventions
+are used: Two literal characters separated by three dots mean a choice
+of any single character in the given (inclusive) range of ASCII
+characters. A phrase between angle brackets (\verb\<...>\) gives an
+informal description of the symbol defined; e.g., this could be used
+to describe the notion of `control character' if needed.
+
+Although the notation used is almost the same, there is a big
+difference between the meaning of lexical and syntactic definitions:
+a lexical definition operates on the individual characters of the
+input source, while a syntax definition operates on the stream of
+tokens generated by the lexical analysis.
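As a rough illustration only, the example \verb\name\ rule above can be transcribed into a regular expression. The sketch assumes the \verb\re\ module of present-day Python, which this manual does not describe; \verb\NAME_RULE\ and \verb\is_name\ are hypothetical names introduced for the example.

```python
import re

# Transcription of the two example rules:
#   name:     lcletter (lcletter | "_")*
#   lcletter: "a"..."z"
NAME_RULE = re.compile(r"[a-z][a-z_]*")


def is_name(text):
    """Return True if the whole of text is derivable from the name rule."""
    return NAME_RULE.fullmatch(text) is not None


print(is_name("key_datum"))  # a rule name used in this manual
print(is_name("_private"))   # does not start with an lcletter
```

The star in the rule maps directly onto the regular expression star, and the three-dot range onto a character class.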
+
\chapter{Lexical analysis}
A Python program is read by a {\em parser}. Input to the parser is a
@@ -130,11 +193,6 @@
ambiguity exists, a token comprises the longest possible string that
forms a legal token, when read from left to right.
-Tokens are described using an extended regular expression notation.
-This is similar to the extended BNF notation used later, except that
-the notation \verb\<...>\ is used to give an informal description of a
-character, and that spaces and tabs are not to be ignored.
-
\section{Identifiers}
Identifiers are described by the following regular expressions:
@@ -142,9 +200,9 @@
\begin{verbatim}
identifier: (letter|"_") (letter|digit|"_")*
letter: lowercase | uppercase
-lowercase: "a"|"b"|...|"z"
-uppercase: "A"|"B"|...|"Z"
-digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
+lowercase: "a"..."z"
+uppercase: "A"..."Z"
+digit: "0"..."9"
\end{verbatim}
Identifiers are unlimited in length. Case is significant.
@@ -156,13 +214,14 @@
identifiers. They must be spelled exactly as written here:
\begin{verbatim}
-and del for is raise
-break elif from not return
-class else if or try
-continue except import pass while
-def finally in print
+and del for in print
+break elif from is raise
+class else global not return
+continue except if or try
+def finally import pass while
\end{verbatim}
+% # This Python program sorts and formats the above table
% import string
% l = []
% try:
@@ -185,8 +244,8 @@
\begin{verbatim}
stringliteral: "'" stringitem* "'"
stringitem: stringchar | escapeseq
-stringchar: <any character except newline or "\" or "'">
-escapeseq: "'" <any character except newline>
+stringchar: <any ASCII character except newline or "\" or "'">
+escapeseq: "\" <any ASCII character except newline>
\end{verbatim}
String literals cannot span physical line boundaries. Escape
@@ -208,7 +267,7 @@
\verb/\t/ & ASCII Horizontal Tab (TAB) \\
\verb/\v/ & ASCII Vertical Tab (VT) \\
\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\
-\verb/\x/{em xx...} & ASCII character with hex value {\em xx} \\
+\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\
\hline
\end{tabular}
\end{center}
@@ -221,9 +280,10 @@
All unrecognized escape sequences are left in the string {\em
unchanged}, i.e., the backslash is left in the string. (This rule is
useful when debugging: if an escape sequence is mistyped, the
-resulting output is more easily recognized as broken. It also helps
-somewhat for string literals used as regular expressions or otherwise
-passed to other modules that do their own escape handling.)
+resulting output is more easily recognized as broken. It also helps a
+great deal for string literals used as regular expressions or
+otherwise passed to other modules that do their own escape handling --
+but you may end up quadrupling backslashes that must appear literally.)
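The quadrupling mentioned above can be seen in a small sketch. It assumes the \verb\re\ module of present-day Python, which this manual does not describe. To match one literal backslash, the pattern must contain two backslashes, and each of those must be doubled again in a string literal that is not a raw string.

```python
import re

# The literal below is written with four backslashes; after escape
# processing, the pattern handed to the re module has two characters.
pattern = "\\\\"
subject = "a\\b"  # three characters: a, backslash, b

match = re.search(pattern, subject)
print(len(pattern))
print(match.group(0) == "\\")
```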
\subsection{Numeric literals}
@@ -239,9 +299,9 @@
octinteger: "0" octdigit+
hexinteger: "0" ("x"|"X") hexdigit+
-nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
-octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
-hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
+nonzerodigit: "1"..."9"
+octdigit: "0"..."7"
+hexdigit: digit|"a"..."f"|"A"..."F"
\end{verbatim}
Floating point numbers are described by the following regular expressions:
@@ -260,16 +320,20 @@
\begin{verbatim}
+ - * / %
<< >> & | ^ ~
-< = == > <= <> != >=
+< == > <= <> != >=
\end{verbatim}
+The comparison operators \verb\<>\ and \verb\!=\ are alternate
+spellings of the same operator.
+
\section{Delimiters}
-The following tokens are delimiters:
+The following tokens serve as delimiters or otherwise have a special
+meaning:
\begin{verbatim}
( ) [ ] { }
-; , : . `
+; , : . ` =
\end{verbatim}
The following printing ASCII characters are currently not used;
@@ -281,35 +345,83 @@
\chapter{Execution model}
-(XXX This chapter should explain the general model
-of the execution of Python code and
-the evaluation of expressions.
-It should introduce objects, values, code blocks, scopes, name spaces,
-name binding,
-types, sequences, numbers, mappings,
-exceptions, and other technical terms needed to make the following
-chapters concise and exact.)
+(XXX This chapter should explain the general model of the execution of
+Python code and the evaluation of expressions. It should introduce
+objects, values, code blocks, scopes, name spaces, name binding,
+types, sequences, numbers, mappings, exceptions, and other technical
+terms needed to make the following chapters concise and exact.)
+
+\section{Objects, values and types}
+
+I won't try to define rigorously here what an object is, but I'll give
+some properties of objects that are important to know about.
+
+Every object has an identity, a type and a value. An object's {\em
+identity} never changes once it has been created; think of it as the
+object's (permanent) address. An object's {\em type} determines the
+operations that an object supports (e.g., can its length be taken?)
+and also defines the ``meaning'' of the object's value; it also never
+changes. The {\em value} of some objects can change; whether an
+object's value can change is a property of its type.
+
+Objects are never explicitly destroyed; however, when they become
+unreachable they may be garbage-collected. An implementation,
+however, is allowed to delay garbage collection or omit it altogether
+-- it is a matter of implementation quality how garbage collection is
+implemented. (Implementation note: the current implementation uses a
+reference-counting scheme which collects most objects as soon as they
+become unreachable, but does not detect garbage containing circular
+references.)
+
+(Some objects contain references to ``external'' resources such as
+open files. It is understood that these resources are freed when the
+object is garbage-collected, but since garbage collection is not
+guaranteed, such objects also provide an explicit way to release the
+external resource (e.g., a \verb\close\ method) and programs are
+recommended to use this.)
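A minimal sketch of this recommendation, written for present-day Python and using a hypothetical helper \verb\first_line\: the file is closed explicitly instead of being left for the garbage collector.

```python
import os
import tempfile


def first_line(path):
    """Return the first line of a file, closing the file explicitly."""
    f = open(path)
    try:
        return f.readline()
    finally:
        # Release the external resource now instead of waiting for
        # the file object to be garbage-collected.
        f.close()


# Usage: write a small temporary file, read it back, then remove it.
fd, tmp_path = tempfile.mkstemp(text=True)
out = os.fdopen(fd, "w")
out.write("spam\neggs\n")
out.close()

result = first_line(tmp_path)
os.remove(tmp_path)
print(repr(result))
```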
+
+Some objects contain references to other objects. These references
+are part of the object's value; in most cases, when such a
+``container'' object is compared to another (of the same type), the
+comparison takes the {\em values} of the referenced objects into
+account (not their identities).
+
+Types affect almost every aspect of objects except their identity.
+Even object identities are affected in some sense: for immutable
+types, operations that compute new values may actually return a
+reference to an existing object with the same type and value, while
+for mutable objects this is not allowed. E.g., after
+
+\begin{verbatim}
+a = 1; b = 1; c = []; d = []
+\end{verbatim}
+
+\verb\a\ and \verb\b\ may or may not refer to the same object, but
+\verb\c\ and \verb\d\ are guaranteed to refer to two different, unique,
+newly created lists.
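The guarantee for the two lists can be checked with the \verb\is\ operator, which compares identities. The sketch assumes present-day Python. It makes no claim about whether \verb\a\ and \verb\b\ share an object, since that is left to the implementation.

```python
a = 1; b = 1; c = []; d = []

# c and d have equal values but are distinct objects.
print(c == d)
print(c is d)

# Mutating one list therefore cannot affect the other.
c.append(1)
print(d)

# a and b have equal values; whether "a is b" holds is up to the
# implementation, so it is not examined here.
print(a == b)
```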
+
+\section{Execution frames, name spaces, and scopes}
+
+XXX
\chapter{Expressions and conditions}
-(From now on, extended BNF notation will be used to describe
-syntax, not lexical analysis.)
-(XXX Explain the notation.)
+From now on, extended BNF notation will be used to describe syntax,
+not lexical analysis.
This chapter explains the meaning of the elements of expressions and
conditions. Conditions are a superset of expressions, and a condition
may be used where an expression is required by enclosing it in
-parentheses. The only place where an unparenthesized condition
-is not allowed is on the right-hand side of the assignment operator,
-because this operator is the same token (\verb\=\) as used for
-compasisons.
+parentheses. The only place where an unparenthesized condition is not
+allowed is on the right-hand side of the assignment operator, because
+this operator is the same token (\verb\=\) as used for comparisons.
-The comma plays a somewhat special role in Python's syntax.
-It is an operator with a lower precedence than all others, but
-occasionally serves other purposes as well (e.g., it has special
-semantics in print statements). When a comma is accepted by the
-syntax, one of the syntactic categories \verb\expression_list\
-or \verb\condition_list\ is always used.
+The comma plays a somewhat special role in Python's syntax. It is an
+operator with a lower precedence than all others, but occasionally
+serves other purposes as well (e.g., it has special semantics in print
+statements). When a comma is accepted by the syntax, one of the
+syntactic categories \verb\expression_list\ or \verb\condition_list\
+is always used.
When (one alternative of) a syntax rule has the form
@@ -351,11 +463,11 @@
atom: identifier | literal | parenth_form | string_conversion
literal: stringliteral | integer | longinteger | floatnumber
parenth_form: enclosure | list_display | dict_display
-enclosure: '(' [condition_list] ')'
-list_display: '[' [condition_list] ']'
-dict_display: '{' [key_datum (',' key_datum)* [','] '}'
-key_datum: condition ':' condition
-string_conversion:'`' condition_list '`'
+enclosure: "(" [condition_list] ")"
+list_display: "[" [condition_list] "]"
+dict_display: "{" [key_datum ("," key_datum)* [","]] "}"
+key_datum: condition ":" condition
+string_conversion: "`" condition_list "`"
\end{verbatim}
\subsection{Identifiers (Names)}
@@ -413,10 +525,9 @@
each key object is used as a key into the dictionary to store
the corresponding datum pair.
-Key objects must be strings, otherwise a {\tt TypeError}
-exception is raised.
-Clashes between keys are not detected; the last datum stored for a given
-key value prevails.
+Keys must be strings; otherwise a {\tt TypeError} exception is raised.
+Clashes between keys are not detected; the last datum (textually
+rightmost in the display) stored for a given key value prevails.
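A short sketch of the rule for clashing keys, assuming present-day Python, where the same rule still holds. The name \verb\menu\ is hypothetical.

```python
# The key "spam" appears twice in the display; the textually
# rightmost datum is the one that remains in the dictionary.
menu = {"spam": 1, "eggs": 2, "spam": 3}

print(menu["spam"])
print(len(menu))
```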
\subsection{String conversions}
@@ -445,10 +556,10 @@
\begin{verbatim}
primary: atom | attributeref | call | subscription | slicing
-attributeref: primary '.' identifier
-call: primary '(' [condition_list] ')'
-subscription: primary '[' condition ']'
-slicing: primary '[' [condition] ':' [condition] ']'
+attributeref: primary "." identifier
+call: primary "(" [condition_list] ")"
+subscription: primary "[" condition "]"
+slicing: primary "[" [condition] ":" [condition] "]"
\end{verbatim}
\subsection{Attribute references}
@@ -465,7 +576,7 @@
Their syntax is:
\begin{verbatim}
-factor: primary | '-' factor | '+' factor | '~' factor
+factor: primary | "-" factor | "+" factor | "~" factor
\end{verbatim}
The unary \verb\-\ operator yields the negative of its numeric argument.
@@ -483,7 +594,7 @@
Terms represent the most tightly binding binary operators:
\begin{verbatim}
-term: factor | term '*' factor | term '/' factor | term '%' factor
+term: factor | term "*" factor | term "/" factor | term "%" factor
\end{verbatim}
The \verb\*\ operator yields the product of its arguments.
@@ -494,13 +605,13 @@
In the latter case, string repetition is performed; a negative
repetition factor yields the empty string.
-The \verb|'/'| operator yields the quotient of its arguments.
+The \verb|"/"| operator yields the quotient of its arguments.
The numeric arguments are first converted to a common type.
(Short or long) integer division yields an integer of the same type,
truncating towards zero.
Division by zero raises a {\tt RuntimeError} exception.
-The \verb|'%'| operator yields the remainder from the division
+The \verb|"%"| operator yields the remainder from the division
of the first argument by the second.
The numeric arguments are first converted to a common type.
The outcome of $x \% y$ is defined as $x - y*trunc(x/y)$.
@@ -511,28 +622,28 @@
\section{Arithmetic expressions}
\begin{verbatim}
-arith_expr: term | arith_expr '+' term | arith_expr '-' term
+arith_expr: term | arith_expr "+" term | arith_expr "-" term
\end{verbatim}
-The \verb|'+'| operator yields the sum of its arguments.
+The \verb|"+"| operator yields the sum of its arguments.
The arguments must either both be numbers, or both strings.
In the former case, the numbers are converted to a common type
and then added together.
In the latter case, the strings are concatenated directly,
without inserting a space.
-The \verb|'-'| operator yields the difference of its arguments.
+The \verb|"-"| operator yields the difference of its arguments.
The numeric arguments are first converted to a common type.
\section{Shift expressions}
\begin{verbatim}
-shift_expr: arith_expr | shift_expr '<<' arith_expr | shift_expr '>>' arith_expr
+shift_expr: arith_expr | shift_expr "<<" arith_expr | shift_expr ">>" arith_expr
\end{verbatim}
These operators accept short integers as arguments only.
They shift their left argument to the left or right by the number of bits
-given by the right argument. Shifts are ``logical'', e.g., bits shifted
+given by the right argument. Shifts are ``logical'', i.e., bits shifted
out on one end are lost, and bits shifted in are zero;
negative numbers are shifted as if they were unsigned in C.
Negative shift counts and shift counts greater than {\em or equal to}
@@ -541,7 +652,7 @@
\section{Bitwise AND expressions}
\begin{verbatim}
-and_expr: shift_expr | and_expr '&' shift_expr
+and_expr: shift_expr | and_expr "&" shift_expr
\end{verbatim}
This operator yields the bitwise AND of its arguments,
@@ -550,7 +661,7 @@
\section{Bitwise XOR expressions}
\begin{verbatim}
-xor_expr: and_expr | xor_expr '^' and_expr
+xor_expr: and_expr | xor_expr "^" and_expr
\end{verbatim}
This operator yields the bitwise exclusive OR of its arguments,
@@ -559,7 +670,7 @@
\section{Bitwise OR expressions}
\begin{verbatim}
-or_expr: xor_expr | or_expr '|' xor_expr
+or_expr: xor_expr | or_expr "|" xor_expr
\end{verbatim}
This operator yields the bitwise OR of its arguments,
@@ -569,7 +680,7 @@
\begin{verbatim}
expression: or_expression
-expr_list: expression (',' expression)* [',']
+expr_list: expression ("," expression)* [","]
\end{verbatim}
An expression list containing at least one comma yields a new tuple.
@@ -587,7 +698,7 @@
\begin{verbatim}
comparison: expression (comp_operator expression)*
-comp_operator: '<'|'>'|'='|'=='|'>='|'<='|'<>'|'!='|['not'] 'in'|is' ['not']
+comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in"
\end{verbatim}
Comparisons yield integer value: 1 for true, 0 for false.
@@ -605,12 +716,9 @@
Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
-For the benefit of C programmers,
-the comparison operators \verb\=\ and \verb\==\ are equivalent,
-and so are \verb\<>\ and \verb\!=\.
-Use of the C variants is discouraged.
+The forms \verb\<>\ and \verb\!=\ are equivalent.
-The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
+The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "<>"} compare
the values of two objects. The objects needn't have the same type.
If both are numbers, they are compared to a common type.
Otherwise, objects of different types {\em always} compare unequal,
@@ -652,9 +760,9 @@
\begin{verbatim}
condition: or_test
-or_test: and_test | or_test 'or' and_test
-and_test: not_test | and_test 'and' not_test
-not_test: comparison | 'not' not_test
+or_test: and_test | or_test "or" and_test
+and_test: not_test | and_test "and" not_test
+not_test: comparison | "not" not_test
\end{verbatim}
In the context of Boolean operators, and also when conditions are
@@ -686,7 +794,7 @@
by semicolons. The syntax for simple statements is:
\begin{verbatim}
-stmt_list: simple_stmt (';' simple_stmt)* [';']
+stmt_list: simple_stmt (";" simple_stmt)* [";"]
simple_stmt: expression_stmt
| assignment
| pass_stmt
@@ -697,6 +805,7 @@
| break_stmt
| continue_stmt
| import_stmt
+ | global_stmt
\end{verbatim}
\section{Expression statements}
@@ -718,9 +827,9 @@
\section{Assignments}
\begin{verbatim}
-assignment: target_list ('=' target_list)* '=' expression_list
-target_list: target (',' target)* [',']
-target: identifier | '(' target_list ')' | '[' target_list ']'
+assignment: target_list ("=" target_list)* "=" expression_list
+target_list: target ("," target)* [","]
+target: identifier | "(" target_list ")" | "[" target_list "]"
| attributeref | subscription | slicing
\end{verbatim}
@@ -835,7 +944,7 @@
\section{The {\tt pass} statement}
\begin{verbatim}
-pass_stmt: 'pass'
+pass_stmt: "pass"
\end{verbatim}
{\tt pass} is a null operation -- when it is executed,
@@ -844,7 +953,7 @@
\section{The {\tt del} statement}
\begin{verbatim}
-del_stmt: 'del' target_list
+del_stmt: "del" target_list
\end{verbatim}
Deletion is recursively defined similar to assignment.
@@ -866,7 +975,7 @@
\section{The {\tt print} statement}
\begin{verbatim}
-print_stmt: 'print' [ condition (',' condition)* [','] ]
+print_stmt: "print" [ condition ("," condition)* [","] ]
\end{verbatim}
{\tt print} evaluates each condition in turn and writes the resulting
@@ -897,7 +1006,7 @@
\section{The {\tt return} statement}
\begin{verbatim}
-return_stmt: 'return' [condition_list]
+return_stmt: "return" [condition_list]
\end{verbatim}
\verb\return\ may only occur syntactically nested in a function
@@ -917,7 +1026,7 @@
\section{The {\tt raise} statement}
\begin{verbatim}
-raise_stmt: 'raise' condition [',' condition]
+raise_stmt: "raise" condition ["," condition]
\end{verbatim}
\verb\raise\ evaluates its first condition, which must yield
@@ -930,7 +1039,7 @@
\section{The {\tt break} statement}
\begin{verbatim}
-break_stmt: 'break'
+break_stmt: "break"
\end{verbatim}
\verb\break\ may only occur syntactically nested in a \verb\for\
@@ -949,7 +1058,7 @@
\section{The {\tt continue} statement}
\begin{verbatim}
-continue_stmt: 'continue'
+continue_stmt: "continue"
\end{verbatim}
\verb\continue\ may only occur syntactically nested in a \verb\for\
@@ -962,9 +1071,17 @@
\section{The {\tt import} statement}
\begin{verbatim}
-import_stmt: 'import' identifier (',' identifier)*
- | 'from' identifier 'import' identifier (',' identifier)*
- | 'from' identifier 'import' '*'
+import_stmt: "import" identifier ("," identifier)*
+ | "from" identifier "import" identifier ("," identifier)*
+ | "from" identifier "import" "*"
+\end{verbatim}
+
+(XXX To be done.)
+
+\section{The {\tt global} statement}
+
+\begin{verbatim}
+global_stmt: "global" identifier ("," identifier)*
\end{verbatim}
(XXX To be done.)
@@ -982,48 +1099,49 @@
\section{The {\tt if} statement}
\begin{verbatim}
-if_stmt: 'if' condition ':' suite
- ('elif' condition ':' suite)*
- ['else' ':' suite]
+if_stmt: "if" condition ":" suite
+ ("elif" condition ":" suite)*
+ ["else" ":" suite]
\end{verbatim}
\section{The {\tt while} statement}
\begin{verbatim}
-while_stmt: 'while' condition ':' suite ['else' ':' suite]
+while_stmt: "while" condition ":" suite ["else" ":" suite]
\end{verbatim}
\section{The {\tt for} statement}
\begin{verbatim}
-for_stmt: 'for' target_list 'in' condition_list ':' suite
- ['else' ':' suite]
+for_stmt: "for" target_list "in" condition_list ":" suite
+ ["else" ":" suite]
\end{verbatim}
\section{The {\tt try} statement}
\begin{verbatim}
-try_stmt: 'try' ':' suite
- ('except' condition [',' condition] ':' suite)*
- ['finally' ':' suite]
+try_stmt: "try" ":" suite
+ ("except" condition ["," condition] ":" suite)*
+ ["finally" ":" suite]
\end{verbatim}
\section{Function definitions}
\begin{verbatim}
-funcdef: 'def' identifier '(' [parameter_list] ')' ':' suite
-parameter_list: parameter (',' parameter)*
-parameter: identifier | '(' parameter_list ')'
+funcdef: "def" identifier "(" [parameter_list] ")" ":" suite
+parameter_list: parameter ("," parameter)*
+parameter: identifier | "(" parameter_list ")"
\end{verbatim}
\section{Class definitions}
\begin{verbatim}
-classdef: 'class' identifier '(' ')' [inheritance] ':' suite
-inheritance: '=' identifier '(' ')' (',' identifier '(' ')')*
+classdef: "class" identifier [inheritance] ":" suite
+inheritance: "(" expression ("," expression)* ")"
\end{verbatim}
XXX Syntax for scripts, modules
XXX Syntax for interactive input, eval, exec, input
+XXX New definition of expressions (as conditions)
\end{document}