\chapter{Lexical analysis\label{lexical}}

A Python program is read by a \emph{parser}. Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}. This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

Python uses the 7-bit \ASCII{} character set for program text and string
literals. 8-bit characters may be used in string literals and comments
but their interpretation is platform dependent; the proper way to
insert 8-bit characters in string literals is by using octal or
hexadecimal escape sequences.

The run-time character set depends on the I/O devices connected to the
program but is generally a superset of \ASCII.

\strong{Future compatibility note:} It may be tempting to assume that the
character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
superset that covers most western languages that use the Latin
alphabet), but it is possible that in the future Unicode text editors
will become common. These generally use the UTF-8 encoding, which is
also an \ASCII{} superset, but with very different use for the
characters with ordinals 128--255. While there is no consensus on this
subject yet, it is unwise to assume either Latin-1 or UTF-8, even
though the current implementation appears to favor Latin-1. This
applies both to the source character set and the run-time character
set.


\section{Line structure\label{line-structure}}

A Python program is divided into a number of \emph{logical lines}.
\index{line structure}


\subsection{Logical lines\label{logical}}

The end of a logical line is represented by the token NEWLINE.
Statements cannot
cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements).
A logical line is constructed from one or more \emph{physical lines}
by following the explicit or implicit \emph{line joining} rules.
\index{logical line}
\index{physical line}
\index{line joining}
\index{NEWLINE token}


\subsection{Physical lines\label{physical}}

A physical line ends in whatever the current platform's convention is
for terminating lines. On \UNIX, this is the \ASCII{} LF (linefeed)
character. On DOS/Windows, it is the \ASCII{} sequence CR LF (return
followed by linefeed). On Macintosh, it is the \ASCII{} CR (return)
character.


\subsection{Comments\label{comments}}

A comment starts with a hash character (\code{\#}) that is not part of
a string literal, and ends at the end of the physical line. A comment
signifies the end of the logical line unless the implicit line joining
rules are invoked.
Comments are ignored by the syntax; they are not tokens.
\index{comment}
\index{hash character}


\subsection{Explicit line joining\label{explicit-joining}}

Two or more physical lines may be joined into logical lines using
backslash characters (\code{\e}), as follows: when a physical line ends
in a backslash that is not part of a string literal or comment, it is
joined with the following line, forming a single logical line, deleting
the backslash and the following end-of-line character. For example:
\index{physical line}
\index{line joining}
\index{line continuation}
\index{backslash character}
%
\begin{verbatim}
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
\end{verbatim}

A line ending in a backslash cannot carry a comment. A backslash does
not continue a comment. A backslash does not continue a token except
for string literals (i.e., tokens other than string literals cannot be
split across physical lines using a backslash). A backslash is
illegal elsewhere on a line outside a string literal.


\subsection{Implicit line joining\label{implicit-joining}}

Expressions in parentheses, square brackets or curly braces can be
split over more than one physical line without using backslashes.
For example:

\begin{verbatim}
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
\end{verbatim}

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are
allowed. There is no NEWLINE token between implicit continuation
lines. Implicitly continued lines can also occur within triple-quoted
strings (see below); in that case they cannot carry comments.


\subsection{Blank lines \index{blank line}\label{blank-lines}}

A logical line that contains only spaces, tabs, formfeeds and possibly
a comment is ignored (i.e., no NEWLINE token is generated). During
interactive input of statements, handling of a blank line may differ
depending on the implementation of the read-eval-print loop. In the
standard implementation, an entirely blank logical line (i.e.\ one
containing not even whitespace or a comment) terminates a multi-line
statement.


\subsection{Indentation\label{indentation}}

Leading whitespace (spaces and tabs) at the beginning of a logical
line is used to compute the indentation level of the line, which in
turn is used to determine the grouping of statements.
\index{indentation}
\index{whitespace}
\index{leading whitespace}
\index{space}
\index{tab}
\index{grouping}
\index{statement grouping}

First, tabs are replaced (from left to right) by one to eight spaces
such that the total number of characters up to and including the
replacement is a multiple of
eight (this is intended to be the same rule as used by \UNIX). The
total number of spaces preceding the first non-blank character then
determines the line's indentation. Indentation cannot be split over
multiple physical lines using backslashes; the whitespace up to the
first backslash determines the indentation.
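
The tab rule above can be sketched in present-day Python as follows.
This is only an illustration of the rule as stated, not the lexical
analyzer's own code; the function name \code{indentation} is
hypothetical, and formfeed handling is left out.

```python
def indentation(line):
    """Return the indentation column of a physical line.

    A tab advances the column to the next multiple of eight; a space
    advances it by one.  Counting stops at the first other character.
    """
    column = 0
    for ch in line:
        if ch == "\t":
            column = (column // 8 + 1) * 8
        elif ch == " ":
            column += 1
        else:
            break
    return column
```

Under this sketch, two spaces followed by a tab yield the same
indentation as a single tab, which is why mixing the two is easy to
get wrong when an editor displays tabs at a different width.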

\strong{Cross-platform compatibility note:} because of the nature of
text editors on non-UNIX platforms, it is unwise to use a mixture of
spaces and tabs for the indentation in a single source file.

A formfeed character may be present at the start of the line; it will
be ignored for the indentation calculations above. Formfeed
characters occurring elsewhere in the leading whitespace have an
undefined effect (for instance, they may reset the space count to
zero).

The indentation levels of consecutive lines are used to generate
INDENT and DEDENT tokens, using a stack, as follows.
\index{INDENT token}
\index{DEDENT token}

Before the first line of the file is read, a single zero is pushed on
the stack; this will never be popped off again. The numbers pushed on
the stack will always be strictly increasing from bottom to top. At
the beginning of each logical line, the line's indentation level is
compared to the top of the stack. If it is equal, nothing happens.
If it is larger, it is pushed on the stack, and one INDENT token is
generated. If it is smaller, it \emph{must} be one of the numbers
occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is
generated. At the end of the file, a DEDENT token is generated for
each number remaining on the stack that is larger than zero.
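
The stack procedure just described can be sketched in present-day
Python. The helper \code{indent_tokens} is hypothetical and exists only
for illustration: it takes the indentation level of each logical line
and returns token names, with the string \code{"LINE"} standing in for
the tokens of the line itself.

```python
def indent_tokens(levels):
    """Map the indentation levels of successive logical lines to a
    list of INDENT/DEDENT markers (illustrative sketch only)."""
    stack = [0]                  # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError("inconsistent dedent")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")    # placeholder for the line's own tokens
    # End of file: one DEDENT for each remaining level above zero.
    tokens.extend("DEDENT" for lvl in stack if lvl > 0)
    return tokens
```

For the levels \code{[0, 4, 8]} the sketch produces two INDENT markers
followed by two DEDENT markers at end of file; for \code{[0, 8, 4]} it
raises an error, because 4 never appeared on the stack.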

Here is an example of a correctly (though confusingly) indented piece
of Python code:

\begin{verbatim}
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
\end{verbatim}

The following example shows various indentation errors:

\begin{verbatim}
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
\end{verbatim}

(Actually, the first three errors are detected by the parser; only the
last error is found by the lexical analyzer: the indentation of
\code{return r} does not match a level popped off the stack.)


\subsection{Whitespace between tokens\label{whitespace}}

Except at the beginning of a logical line or in string literals, the
whitespace characters space, tab and formfeed can be used
interchangeably to separate tokens. Whitespace is needed between two
tokens only if their concatenation could otherwise be interpreted as a
different token (e.g., \code{ab} is one token, but \code{a b} is two
tokens).


\section{Other tokens\label{other-tokens}}

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
\emph{operators}, and \emph{delimiters}.
Whitespace characters (other than line terminators, discussed earlier)
are not tokens, but serve to delimit tokens.
Where ambiguity exists, a token comprises the longest possible string
that forms a legal token, when read from left to right.
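
The longest-match rule can be observed with the \module{tokenize}
module of present-day Python. The sketch below assumes that module is
available; the helper \code{token_strings} is hypothetical and simply
lists the visible token strings of one line of source.

```python
import io
import tokenize

def token_strings(source):
    """Return the non-blank token strings found in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.string.strip()]
```

With this helper, \code{token_strings("a<=b\e n")} reports \code{<=} as
a single token rather than \code{<} followed by \code{=}, and
\code{ab} is reported as one token while \code{a b} gives two.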


\section{Identifiers and keywords\label{identifiers}}

Identifiers (also referred to as \emph{names}) are described by the following
lexical definitions:
\index{identifier}
\index{name}

\begin{productionlist}
  \production{identifier}
             {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*}
  \production{letter}
             {\token{lowercase} | \token{uppercase}}
  \production{lowercase}
             {"a"..."z"}
  \production{uppercase}
             {"A"..."Z"}
  \production{digit}
             {"0"..."9"}
\end{productionlist}

Identifiers are unlimited in length. Case is significant.
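
As a rough check of the productions above, they can be transcribed
into a regular expression. The names \code{IDENTIFIER} and
\code{is_identifier} are hypothetical, and the sketch tests only the
lexical shape: it does not reject keywords.

```python
import re

# Transcription of the `identifier` production: a letter or underscore,
# followed by any number of letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """Return True if `text` matches the identifier production."""
    return IDENTIFIER.fullmatch(text) is not None
```

Here \code{is_identifier("_spam1")} is true, while
\code{is_identifier("1spam")} is false because a digit cannot start an
identifier.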


\subsection{Keywords\label{keywords}}

The following identifiers are used as reserved words, or
\emph{keywords} of the language, and cannot be used as ordinary
identifiers. They must be spelled exactly as written here:%
\index{keyword}%
\index{reserved word}

\begin{verbatim}
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass      yield
def       finally   in        print
\end{verbatim}

% When adding keywords, use reswords.py for reformatting

Note that although the identifier \code{as} can be used as part of the
syntax of \keyword{import} statements, it is not currently a reserved
word.

In some future version of Python, the identifiers \code{as} and
\code{None} will both become keywords.


\subsection{Reserved classes of identifiers\label{id-classes}}

Certain classes of identifiers (besides keywords) have special
meanings. These are:

\begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
\lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
\lineiii{__*__}{System-defined name}{}
\lineiii{__*}{Class-private name mangling}{}
\end{tableiii}

(XXX need section references here.)

Note:

\begin{description}
\item[(1)] The special identifier \samp{_} is used in the interactive
interpreter to store the result of the last evaluation; it is stored
in the \module{__builtin__} module. When not in interactive mode,
\samp{_} has no special meaning and is not defined.
\end{description}


\section{Literals\label{literals}}

Literals are notations for constant values of some built-in types.
\index{literal}
\index{constant}


\subsection{String literals\label{strings}}

String literals are described by the following lexical definitions:
\index{string literal}

\index{ASCII@\ASCII}
\begin{productionlist}
  \production{stringliteral}
             {[\token{stringprefix}](\token{shortstring} | \token{longstring})}
  \production{stringprefix}
             {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"}
  \production{shortstring}
             {"'" \token{shortstringitem}* "'"
              | '"' \token{shortstringitem}* '"'}
  \production{longstring}
             {"'''" \token{longstringitem}* "'''"}
  \productioncont{| '"""' \token{longstringitem}* '"""'}
  \production{shortstringitem}
             {\token{shortstringchar} | \token{escapeseq}}
  \production{longstringitem}
             {\token{longstringchar} | \token{escapeseq}}
  \production{shortstringchar}
             {<any ASCII character except "\e" or newline or the quote>}
  \production{longstringchar}
             {<any ASCII character except "\e">}
  \production{escapeseq}
             {"\e" <any ASCII character>}
\end{productionlist}

One syntactic restriction not indicated by these productions is that
whitespace is not allowed between the \grammartoken{stringprefix} and
the rest of the string literal.

\index{triple-quoted string}
\index{Unicode Consortium}
\index{string!Unicode}
In plain English: String literals can be enclosed in matching single
quotes (\code{'}) or double quotes (\code{"}). They can also be
enclosed in matching groups of three single or double quotes (these
are generally referred to as \emph{triple-quoted strings}). The
backslash (\code{\e}) character is used to escape characters that
otherwise have a special meaning, such as newline, backslash itself,
or the quote character. String literals may optionally be prefixed
with a letter `r' or `R'; such strings are called \dfn{raw
strings}\index{raw string} and use different rules for interpreting
backslash escape sequences. A prefix of `u' or `U' makes the string
a Unicode string. Unicode strings use the Unicode character set as
defined by the Unicode Consortium and ISO~10646. Some additional
escape sequences, described below, are available in Unicode strings.
The two prefix characters may be combined; in this case, `u' must
appear before `r'.

In triple-quoted strings,
unescaped newlines and quotes are allowed (and are retained), except
that three unescaped quotes in a row terminate the string. (A
``quote'' is the character used to open the string, i.e.\ either
\code{'} or \code{"}.)

Unless an `r' or `R' prefix is present, escape sequences in strings
are interpreted according to rules similar
to those used by Standard C. The recognized escape sequences are:
\index{physical line}
\index{escape sequence}
\index{Standard C}
\index{C}

\begin{tableii}{l|l}{code}{Escape Sequence}{Meaning}
\lineii{\e\var{newline}} {Ignored}
\lineii{\e\e} {Backslash (\code{\e})}
\lineii{\e'} {Single quote (\code{'})}
\lineii{\e"} {Double quote (\code{"})}
\lineii{\e a} {\ASCII{} Bell (BEL)}
\lineii{\e b} {\ASCII{} Backspace (BS)}
\lineii{\e f} {\ASCII{} Formfeed (FF)}
\lineii{\e n} {\ASCII{} Linefeed (LF)}
\lineii{\e N\{\var{name}\}}
       {Character named \var{name} in the Unicode database (Unicode only)}
\lineii{\e r} {\ASCII{} Carriage Return (CR)}
\lineii{\e t} {\ASCII{} Horizontal Tab (TAB)}
\lineii{\e u\var{xxxx}} {Character with 16-bit hex value \var{xxxx} (Unicode only)}
\lineii{\e U\var{xxxxxxxx}}{Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}
\lineii{\e v} {\ASCII{} Vertical Tab (VT)}
\lineii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}
\lineii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}
\end{tableii}
\index{ASCII@\ASCII}

As in Standard C, up to three octal digits are accepted. However,
exactly two hex digits are taken in hex escapes.
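
The digit-count rules for octal and hex escapes can be seen in a few
literals. The variable names are hypothetical, and the example assumes
a present-day Python interpreter, where these particular escapes
behave as described here.

```python
# Octal escapes take up to three digits; hex escapes take exactly two.
octal_full  = "\1011"   # \101 is 'A'; the fourth digit is a literal '1'
octal_short = "\0"      # a single octal digit is accepted
hex_two     = "\x41B"   # \x41 is 'A'; 'B' is an ordinary character
```

So \code{octal_full} and \code{hex_two} are both two characters long,
since the escape stops consuming digits at its limit.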

Unlike Standard \index{unrecognized escape sequence}C,
all unrecognized escape sequences are left in the string unchanged,
i.e., \emph{the backslash is left in the string}. (This behavior is
useful when debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken.) It is also
important to note that the escape sequences marked as ``(Unicode
only)'' in the table above fall into the category of unrecognized
escapes for non-Unicode string literals.

When an `r' or `R' prefix is present, a character following a
backslash is included in the string without change, and \emph{all
backslashes are left in the string}. For example, the string literal
\code{r"\e n"} consists of two characters: a backslash and a lowercase
`n'. String quotes can be escaped with a backslash, but the backslash
remains in the string; for example, \code{r"\e""} is a valid string
literal consisting of two characters: a backslash and a double quote;
\code{r"\e"} is not a valid string literal (even a raw string cannot
end in an odd number of backslashes). Specifically, \emph{a raw
string cannot end in a single backslash} (since the backslash would
escape the following quote character). Note also that a single
backslash followed by a newline is interpreted as those two characters
as part of the string, \emph{not} as a line continuation.
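
The raw-string rules can be checked directly. The variable names below
are hypothetical; the literals are the ones discussed in the
paragraph above, plus a triple-quoted raw string to show the
backslash-newline case.

```python
newline_raw = r"\n"      # two characters: backslash, 'n'
quote_raw = r"\""        # two characters: backslash, double quote
continued_raw = r"""a\
b"""                     # the backslash and the newline are both kept
```

In each case the backslash stays in the string, so
\code{len(newline_raw)} and \code{len(quote_raw)} are both 2, and
\code{continued_raw} holds four characters.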


\subsection{String literal concatenation\label{string-catenation}}

Multiple adjacent string literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is
the same as their concatenation. Thus, \code{"hello" 'world'} is
equivalent to \code{"helloworld"}. This feature can be used to reduce
the number of backslashes needed, to split long strings conveniently
across long lines, or even to add comments to parts of strings, for
example:

\begin{verbatim}
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
\end{verbatim}

Note that this feature is defined at the syntactical level, but
implemented at compile time. The `+' operator must be used to
concatenate string expressions at run time. Also note that literal
concatenation can use different quoting styles for each component
(even mixing raw strings and triple-quoted strings).
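
A short illustration of the difference between literal concatenation
and run-time concatenation follows. The variable names are
hypothetical.

```python
# Adjacent literals are joined when the source is compiled.
greeting = "hello" 'world'

# Quoting styles may be mixed: a raw string next to a triple-quoted one.
pattern = (r"[A-Za-z_]"
           """[A-Za-z0-9_]*""")

# Joining a variable with a literal is an expression and needs `+'.
prefix = "hello"
joined = prefix + 'world'
```

Both \code{greeting} and \code{joined} end up as \code{"helloworld"},
but only the first is formed from adjacent literals alone.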


\subsection{Numeric literals\label{numbers}}

There are four types of numeric literals: plain integers, long
integers, floating point numbers, and imaginary numbers.  There are no
complex literals (complex numbers can be formed by adding a real
number and an imaginary number).
\index{number}
\index{numeric literal}
\index{integer literal}
\index{plain integer literal}
\index{long integer literal}
\index{floating point literal}
\index{hexadecimal literal}
\index{octal literal}
\index{decimal literal}
\index{imaginary literal}
\index{complex!literal}

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.
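
One way to observe this, assuming a current interpreter and its
standard \module{tokenize} module, is to list the tokens produced for
\code{-1}.  The helper name \code{significant_tokens} is hypothetical
and exists only for this sketch.

```python
import io
import tokenize

def significant_tokens(source):
    """Return (type name, text) pairs for the operator and number tokens."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type in (tokenize.OP, tokenize.NUMBER)]

# The sign and the digits arrive as two separate tokens.
tokens = significant_tokens("-1\n")

# Because the sign is an operator, ** binds first: this is -(1 ** 2).
squared = -1 ** 2
```

The operator and the literal are reported separately, which is also
why \code{-1 ** 2} evaluates to \code{-1} rather than \code{1}.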


\subsection{Integer and long integer literals\label{integers}}

Integer and long integer literals are described by the following
lexical definitions:

\begin{productionlist}
  \production{longinteger}
             {\token{integer} ("l" | "L")}
  \production{integer}
             {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}}
  \production{decimalinteger}
             {\token{nonzerodigit} \token{digit}* | "0"}
  \production{octinteger}
             {"0" \token{octdigit}+}
  \production{hexinteger}
             {"0" ("x" | "X") \token{hexdigit}+}
  \production{nonzerodigit}
             {"1"..."9"}
  \production{octdigit}
             {"0"..."7"}
  \production{hexdigit}
             {\token{digit} | "a"..."f" | "A"..."F"}
\end{productionlist}

Although both lower case `l' and upper case `L' are allowed as a suffix
for long integers, it is strongly recommended to always use `L', since
the letter `l' looks too much like the digit `1'.

Plain integer decimal literals must be at most 2147483647 (i.e., the
largest positive integer representable in 32-bit signed arithmetic).
Plain octal and hexadecimal literals may be as large as 4294967295, but
values larger than 2147483647 are converted to a negative value by
subtracting 4294967296.  There is no limit for long integer literals
apart from what can be stored in available memory.
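
The range rule can be made concrete with a small emulation.  This is
only a sketch: current interpreters no longer distinguish plain and
long integers, so the function below reproduces the arithmetic described
above rather than calling any built-in behaviour.  The name
\code{plain_int_value} and the choice of \exception{ValueError} for
out-of-range literals are assumptions made for the example.

```python
INT_MAX = 2147483647        # largest plain integer, 2**31 - 1
UINT_MAX = 4294967295       # largest plain octal/hex literal, 2**32 - 1

def plain_int_value(text):
    """Emulate the value of a plain integer literal under the 32-bit rule."""
    if text[:2] in ("0x", "0X"):
        value = int(text[2:], 16)          # hexinteger
    elif len(text) > 1 and text[0] == "0":
        value = int(text[1:], 8)           # octinteger
    else:
        value = int(text, 10)              # decimalinteger
        if value > INT_MAX:
            raise ValueError("decimal literal too large: " + text)
        return value
    if value > UINT_MAX:
        raise ValueError("octal/hex literal too large: " + text)
    # Octal and hex values above INT_MAX wrap around to negative numbers.
    return value - (UINT_MAX + 1) if value > INT_MAX else value
```

Under this rule \code{0x80000000} denotes -2147483648 and
\code{0xFFFFFFFF} denotes -1, while \code{2147483648} written in decimal
is rejected.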

Some examples of plain and long integer literals:

\begin{verbatim}
7     2147483647                        0177    0x80000000
3L    79228162514264337593543950336L    0377L   0x100000000L
\end{verbatim}
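
The lexical definitions for integers can be transcribed into a regular
expression and tried against the examples above.  This is a minimal
sketch using the standard \module{re} module; the names
\code{INTEGER_LITERAL} and \code{classify} are hypothetical, and the
pattern only mirrors the productions given in this section (it is not
how the interpreter itself is implemented).

```python
import re

INTEGER_LITERAL = re.compile(r"""
    (?: (?P<hex> 0[xX][0-9a-fA-F]+ )   # hexinteger
      | (?P<oct> 0[0-7]+ )             # octinteger
      | (?P<dec> [1-9][0-9]* | 0 )     # decimalinteger
    )
    (?P<long> [lL] )?                  # optional suffix makes a longinteger
""", re.VERBOSE)

def classify(text):
    """Return e.g. 'plain oct' or 'long hex', or None if text is no integer."""
    match = INTEGER_LITERAL.fullmatch(text)
    if match is None:
        return None
    radix = next(name for name in ("hex", "oct", "dec") if match.group(name))
    return ("long " if match.group("long") else "plain ") + radix

examples = ["7", "2147483647", "0177", "0x80000000",
            "3L", "79228162514264337593543950336L", "0377L", "0x100000000L"]
kinds = [classify(text) for text in examples]
```

A string such as \code{08} is rejected by this pattern, since \code{8}
is not an octal digit and a decimal integer may not start with
\code{0}.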


\subsection{Floating point literals\label{floating}}

Floating point literals are described by the following lexical
definitions:

\begin{productionlist}
  \production{floatnumber}
             {\token{pointfloat} | \token{exponentfloat}}
  \production{pointfloat}
             {[\token{intpart}] \token{fraction} | \token{intpart} "."}
  \production{exponentfloat}
             {(\token{intpart} | \token{pointfloat})
              \token{exponent}}
  \production{intpart}
             {\token{digit}+}
  \production{fraction}
             {"." \token{digit}+}
  \production{exponent}
             {("e" | "E") ["+" | "-"] \token{digit}+}
\end{productionlist}

Note that the integer and exponent parts of floating point numbers
can look like octal integers, but are interpreted using radix 10.  For
example, \samp{077e010} is legal, and denotes the same number
as \samp{77e10}.
The allowed range of floating point literals is
implementation-dependent.
Some examples of floating point literals:

\begin{verbatim}
3.14    10.    .001    1e100    3.14e-10    0e0
\end{verbatim}
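
As a rough check of the productions above, they can be written as a
regular expression and applied to the examples.  The names
\code{FLOATNUMBER} and \code{is_float_literal} are hypothetical, and
the sketch assumes a current interpreter, where \samp{077e010} is
still read using radix 10.

```python
import re

FLOATNUMBER = re.compile(r"""
    (?: [0-9]* \. [0-9]+      # pointfloat: optional intpart, then fraction
      | [0-9]+ \.             #          or intpart followed by a period
    )
    (?: [eE] [+-]? [0-9]+ )?  # a pointfloat may carry an exponent
  |
    [0-9]+ [eE] [+-]? [0-9]+  # exponentfloat built from a bare intpart
""", re.VERBOSE)

def is_float_literal(text):
    """Return True if text matches the floatnumber production."""
    return FLOATNUMBER.fullmatch(text) is not None

examples = ["3.14", "10.", ".001", "1e100", "3.14e-10", "0e0"]
all_match = all(is_float_literal(text) for text in examples)

# Leading zeros do not select octal in a floating point literal.
same_value = 077e010 == 77e10
```

A bare \code{10} does not match, because an integer part on its own
needs either a period or an exponent to form a floating point literal.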

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.


\subsection{Imaginary literals\label{imaginary}}

Imaginary literals are described by the following lexical definitions:

\begin{productionlist}
  \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")}
\end{productionlist}

An imaginary literal yields a complex number with a real part of
0.0.  Complex numbers are represented as a pair of floating point
numbers and have the same restrictions on their range.  To create a
complex number with a nonzero real part, add a floating point number
to it, e.g., \code{(3+4j)}.  Some examples of imaginary literals:

\begin{verbatim}
3.14j   10.j    10j     .001j   1e100j  3.14e-10j
\end{verbatim}
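
A short sketch of how these literals behave, assuming a current
interpreter (the variable names are illustrative only):

```python
# An imaginary literal is a complex number whose real part is 0.0.
pure = 4j
real_part = pure.real
imag_part = pure.imag

# Adding a real number gives a complex value with a nonzero real part.
z = 3 + 4j
magnitude = abs(z)          # sqrt(3**2 + 4**2)

# Both parts are stored as floating point numbers.
part_types = (type(z.real), type(z.imag))
```

Note that \code{3+4j} is not a single literal: it is the sum of the
integer literal \code{3} and the imaginary literal \code{4j}.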


\section{Operators\label{operators}}

The following tokens are operators:
\index{operators}

\begin{verbatim}
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
\end{verbatim}

The comparison operators \code{<>} and \code{!=} are alternate
spellings of the same operator.  \code{!=} is the preferred spelling;
\code{<>} is obsolescent.


\section{Delimiters\label{delimiters}}

The following tokens serve as delimiters in the grammar:
\index{delimiters}

\begin{verbatim}
(       )       [       ]       {       }
,       :       .       `       =       ;
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
\end{verbatim}

The period can also occur in floating-point and imaginary literals.  A
sequence of three periods has a special meaning as an ellipsis in slices.
The second half of the list, the augmented assignment operators, serves
lexically as delimiters, but also performs an operation.
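
To illustrate that an augmented assignment operator is a single token,
the following sketch lists the operator and delimiter tokens of a short
statement using the standard \module{tokenize} module of a current
interpreter.  The helper name \code{exact_token_names} is hypothetical,
and the token names shown (such as \code{PLUSEQUAL}) are those of
current releases, which may differ from the version described here.

```python
import io
import tokenize

def exact_token_names(source):
    """Return (exact token name, text) pairs for operator/delimiter tokens."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.exact_type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.OP]

# "+=" is reported as one token, not as "+" followed by "=".
names = exact_token_names("total += values[1:2]\n")
```

The brackets and the colon appear in the same list, since the tokenizer
reports operators and delimiters through the same token type.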

The following printing \ASCII{} characters have special meaning as part
of other tokens or are otherwise significant to the lexical analyzer:

\begin{verbatim}
'       "       #       \
\end{verbatim}

The following printing \ASCII{} characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional
error:
\index{ASCII@\ASCII}

\begin{verbatim}
@       $       ?
\end{verbatim}