\chapter{Lexical analysis\label{lexical}}

A Python program is read by a \emph{parser}.  Input to the parser is a
stream of \emph{tokens}, generated by the \emph{lexical analyzer}.  This
chapter describes how the lexical analyzer breaks a file into tokens.
\index{lexical analysis}
\index{parser}
\index{token}

Python uses the 7-bit \ASCII{} character set for program text.
\versionadded[An encoding declaration can be used to indicate that
string literals and comments use an encoding different from ASCII.]{2.3}
For compatibility with older versions, Python only warns if it finds
8-bit characters; those warnings should be corrected by either declaring
an explicit encoding, or using escape sequences if those bytes are binary
data, instead of characters.


The run-time character set depends on the I/O devices connected to the
program but is generally a superset of \ASCII.

\strong{Future compatibility note:} It may be tempting to assume that the
character set for 8-bit characters is ISO Latin-1 (an \ASCII{}
superset that covers most western languages that use the Latin
alphabet), but it is possible that in the future Unicode text editors
will become common.  These generally use the UTF-8 encoding, which is
also an \ASCII{} superset, but with very different use for the
characters with ordinals 128-255.  While there is no consensus on this
subject yet, it is unwise to assume either Latin-1 or UTF-8, even
though the current implementation appears to favor Latin-1.  This
applies both to the source character set and the run-time character
set.

\section{Line structure\label{line-structure}}

A Python program is divided into a number of \emph{logical lines}.
\index{line structure}

\subsection{Logical lines\label{logical}}

The end of
a logical line is represented by the token NEWLINE.  Statements cannot
cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements).
A logical line is constructed from one or more \emph{physical lines}
by following the explicit or implicit \emph{line joining} rules.
\index{logical line}
\index{physical line}
\index{line joining}
\index{NEWLINE token}

\subsection{Physical lines\label{physical}}

A physical line ends in whatever the current platform's convention is
for terminating lines.  On \UNIX, this is the \ASCII{} LF (linefeed)
character.  On Windows, it is the \ASCII{} sequence CR LF (return
followed by linefeed).  On Macintosh, it is the \ASCII{} CR (return)
character.

\subsection{Comments\label{comments}}

A comment starts with a hash character (\code{\#}) that is not part of
a string literal, and ends at the end of the physical line.  A comment
signifies the end of the logical line unless the implicit line joining
rules are invoked.
Comments are ignored by the syntax; they are not tokens.
\index{comment}
\index{hash character}

\subsection{Encoding declarations\label{encodings}}

If a comment in the first or second line of the Python script matches
the regular expression \regexp{coding[=:]\e s*([-\e w.]+)}, this comment is
processed as an encoding declaration; the first group of this
expression names the encoding of the source code file.  The recommended
forms of this expression are

\begin{verbatim}
# -*- coding: <encoding-name> -*-
\end{verbatim}

which is recognized also by GNU Emacs, and

\begin{verbatim}
# vim:fileencoding=<encoding-name>
\end{verbatim}

which is recognized by Bram Moolenaar's VIM.  In addition, if the first
bytes of the file are the UTF-8 byte-order mark
(\code{'\e xef\e xbb\e xbf'}), the declared file encoding is UTF-8
(this is supported, among others, by Microsoft's \program{notepad}).

If an encoding is declared, the encoding name must be recognized by
Python. % XXX there should be a list of supported encodings.
The encoding is used for all lexical analysis, in particular to find
the end of a string, and to interpret the contents of Unicode literals.
String literals are converted to Unicode for syntactical analysis,
then converted back to their original encoding before interpretation
starts.  The encoding declaration must appear on a line of its own.

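As a rough illustration only (this is not the interpreter's own code),
the matching rule can be tried out with the \module{re} module.  The
helper name \code{declared_encoding} is hypothetical, and the character
class is written as \code{[-\e w.]} so that the hyphen is unambiguously
literal:

\begin{verbatim}
import re

# Pattern from the text, with the hyphen placed first in the class.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def declared_encoding(first_lines):
    """Return the encoding named in a comment on line 1 or 2, or None."""
    for line in first_lines[:2]:
        # The declaration must be a comment on a line of its own.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None

print(declared_encoding(["#!/usr/bin/env python",
                         "# -*- coding: latin-1 -*-"]))
print(declared_encoding(["# vim:fileencoding=utf-8"]))
print(declared_encoding(["x = 1  # no declaration here"]))
\end{verbatim}

The first two calls find \code{latin-1} and \code{utf-8}; the third
returns \code{None} because the line is not a comment on its own.
Whether the name found is actually a codec known to Python is a
separate check that this sketch does not attempt.
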
\subsection{Explicit line joining\label{explicit-joining}}

Two or more physical lines may be joined into logical lines using
backslash characters (\code{\e}), as follows: when a physical line ends
in a backslash that is not part of a string literal or comment, it is
joined with the following forming a single logical line, deleting the
backslash and the following end-of-line character.  For example:
\index{physical line}
\index{line joining}
\index{line continuation}
\index{backslash character}
%
\begin{verbatim}
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1
\end{verbatim}

A line ending in a backslash cannot carry a comment.  A backslash does
not continue a comment.  A backslash does not continue a token except
for string literals (i.e., tokens other than string literals cannot be
split across physical lines using a backslash).  A backslash is
illegal elsewhere on a line outside a string literal.

\subsection{Implicit line joining\label{implicit-joining}}

Expressions in parentheses, square brackets or curly braces can be
split over more than one physical line without using backslashes.
For example:

\begin{verbatim}
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
\end{verbatim}

Implicitly continued lines can carry comments.  The indentation of the
continuation lines is not important.  Blank continuation lines are
allowed.  There is no NEWLINE token between implicit continuation
lines.  Implicitly continued lines can also occur within triple-quoted
strings (see below); in that case they cannot carry comments.

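Both joining rules can be observed by counting NEWLINE tokens.  The
following is a minimal sketch that assumes a current Python interpreter
and its standard \module{tokenize} module (neither is specified by this
chapter); the helper name \code{count_logical_lines} is hypothetical:

\begin{verbatim}
import io
import tokenize

def count_logical_lines(source):
    """Count NEWLINE tokens, i.e. the logical lines in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)

# Two physical lines joined explicitly by a trailing backslash.
explicit = "total = 1 + \\\n        2\n"

# Three physical lines (one of them blank) joined implicitly by brackets.
implicit = "names = ['a',   # first\n\n         'b']    # second\n"

print(count_logical_lines(explicit))
print(count_logical_lines(implicit))
\end{verbatim}

Each source string produces exactly one NEWLINE token, even though it
spans several physical lines and, in the second case, carries comments
and a blank continuation line.
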
\subsection{Blank lines \index{blank line}\label{blank-lines}}

A logical line that contains only spaces, tabs, formfeeds and possibly
a comment, is ignored (i.e., no NEWLINE token is generated).  During
interactive input of statements, handling of a blank line may differ
depending on the implementation of the read-eval-print loop.  In the
standard implementation, an entirely blank logical line (i.e.\ one
containing not even whitespace or a comment) terminates a multi-line
statement.

\subsection{Indentation\label{indentation}}

Leading whitespace (spaces and tabs) at the beginning of a logical
line is used to compute the indentation level of the line, which in
turn is used to determine the grouping of statements.
\index{indentation}
\index{whitespace}
\index{leading whitespace}
\index{space}
\index{tab}
\index{grouping}
\index{statement grouping}

First, tabs are replaced (from left to right) by one to eight spaces
such that the total number of characters up to and including the
replacement is a multiple of
eight (this is intended to be the same rule as used by \UNIX).  The
total number of spaces preceding the first non-blank character then
determines the line's indentation.  Indentation cannot be split over
multiple physical lines using backslashes; the whitespace up to the
first backslash determines the indentation.

\strong{Cross-platform compatibility note:} because of the nature of
text editors on non-UNIX platforms, it is unwise to use a mixture of
spaces and tabs for the indentation in a single source file.

A formfeed character may be present at the start of the line; it will
be ignored for the indentation calculations above.  Formfeed
characters occurring elsewhere in the leading whitespace have an
undefined effect (for instance, they may reset the space count to
zero).

The indentation levels of consecutive lines are used to generate
INDENT and DEDENT tokens, using a stack, as follows.
\index{INDENT token}
\index{DEDENT token}

Before the first line of the file is read, a single zero is pushed on
the stack; this will never be popped off again.  The numbers pushed on
the stack will always be strictly increasing from bottom to top.  At
the beginning of each logical line, the line's indentation level is
compared to the top of the stack.  If it is equal, nothing happens.
If it is larger, it is pushed on the stack, and one INDENT token is
generated.  If it is smaller, it \emph{must} be one of the numbers
occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is
generated.  At the end of the file, a DEDENT token is generated for
each number remaining on the stack that is larger than zero.

Here is an example of a correctly (though confusingly) indented piece
of Python code:

\begin{verbatim}
def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r
\end{verbatim}

The following example shows various indentation errors:

\begin{verbatim}
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
\end{verbatim}

(Actually, the first three errors are detected by the parser; only the
last error is found by the lexical analyzer --- the indentation of
\code{return r} does not match a level popped off the stack.)

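The tab-expansion rule and the INDENT/DEDENT stack described in this
section can be sketched in a few lines of Python.  This is an
illustration under simplifying assumptions, not the tokenizer's actual
code: the function names are hypothetical, formfeeds are not handled,
and the second function takes precomputed indentation levels rather
than source text.

\begin{verbatim}
def indentation(line):
    """Column of the first non-blank character, tabs to multiples of 8."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            column = (column // 8 + 1) * 8   # adds one to eight spaces
        else:
            break
    return column

def indent_tokens(levels):
    """Map the indentation of successive logical lines to tokens."""
    stack = [0]                  # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError("inconsistent dedent")
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
    # End of file: one DEDENT per remaining number larger than zero.
    tokens.extend("DEDENT" for n in stack if n > 0)
    return tokens

print(indentation("  \tx"))
print(indent_tokens([0, 4, 8, 4, 0]))
\end{verbatim}

Two spaces followed by a tab give an indentation of 8, and the level
sequence 0, 4, 8, 4, 0 yields two INDENT tokens followed by two DEDENT
tokens.  A sequence such as 0, 8, 4 raises an error, because 4 was
never pushed on the stack.
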
\subsection{Whitespace between tokens\label{whitespace}}

Except at the beginning of a logical line or in string literals, the
whitespace characters space, tab and formfeed can be used
interchangeably to separate tokens.  Whitespace is needed between two
tokens only if their concatenation could otherwise be interpreted as a
different token (e.g., ab is one token, but a b is two tokens).

\section{Other tokens\label{other-tokens}}

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens
exist: \emph{identifiers}, \emph{keywords}, \emph{literals},
\emph{operators}, and \emph{delimiters}.
Whitespace characters (other than line terminators, discussed earlier)
are not tokens, but serve to delimit tokens.
Where
ambiguity exists, a token comprises the longest possible string that
forms a legal token, when read from left to right.

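The role of whitespace and the longest-match rule can be checked with
a short sketch.  It assumes a current Python interpreter and its
standard \module{tokenize} module; the helper name
\code{token_strings} is hypothetical:

\begin{verbatim}
import io
import tokenize

def token_strings(source):
    """Return the text of each NAME, NUMBER and OP token in source."""
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type in wanted]

print(token_strings("ab\n"))      # one identifier
print(token_strings("a b\n"))     # two identifiers
print(token_strings("x<<=1\n"))   # '<<=' is read as a single operator
\end{verbatim}

In the last call the characters \code{<<=} form one token rather than
\code{<}, \code{<} and \code{=}, because the lexical analyzer always
takes the longest string that forms a legal token.
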
\section{Identifiers and keywords\label{identifiers}}

Identifiers (also referred to as \emph{names}) are described by the following
lexical definitions:
\index{identifier}
\index{name}

\begin{productionlist}
  \production{identifier}
             {(\token{letter}|"_") (\token{letter} | \token{digit} | "_")*}
  \production{letter}
             {\token{lowercase} | \token{uppercase}}
  \production{lowercase}
             {"a"..."z"}
  \production{uppercase}
             {"A"..."Z"}
  \production{digit}
             {"0"..."9"}
\end{productionlist}

Identifiers are unlimited in length.  Case is significant.

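The identifier production translates directly into a regular
expression.  The following sketch uses hypothetical names and covers
only the \ASCII{} grammar given above:

\begin{verbatim}
import re

# identifier ::= (letter | "_") (letter | digit | "_")*
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_identifier(text):
    """True if text matches the identifier production."""
    return IDENTIFIER_RE.match(text) is not None

print(is_identifier("_spam1"))     # starts with "_": accepted
print(is_identifier("1spam"))      # starts with a digit: rejected
print(is_identifier("spam-eggs"))  # "-" is not a letter, digit or "_"
\end{verbatim}

The check says nothing about keywords: a reserved word such as
\code{for} matches the production but still cannot be used as an
ordinary identifier.
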
\subsection{Keywords\label{keywords}}

The following identifiers are used as reserved words, or
\emph{keywords} of the language, and cannot be used as ordinary
identifiers.  They must be spelled exactly as written here:%
\index{keyword}%
\index{reserved word}

\begin{verbatim}
and       del       for       is        raise
assert    elif      from      lambda    return
break     else      global    not       try
class     except    if        or        while
continue  exec      import    pass      yield
def       finally   in        print
\end{verbatim}

% When adding keywords, use reswords.py for reformatting

Note that although the identifier \code{as} can be used as part of the
syntax of \keyword{import} statements, it is not currently a reserved
word.

In some future version of Python, the identifiers \code{as} and
\code{None} will both become keywords.

\subsection{Reserved classes of identifiers\label{id-classes}}

Certain classes of identifiers (besides keywords) have special
meanings.  These are:

\begin{tableiii}{l|l|l}{code}{Form}{Meaning}{Notes}
\lineiii{_*}{Not imported by \samp{from \var{module} import *}}{(1)}
\lineiii{__*__}{System-defined name}{}
\lineiii{__*}{Class-private name mangling}{}
\end{tableiii}

(XXX need section references here.)

Note:

\begin{description}
\item[(1)] The special identifier \samp{_} is used in the interactive
interpreter to store the result of the last evaluation; it is stored
in the \module{__builtin__} module.  When not in interactive mode,
\samp{_} has no special meaning and is not defined.
\end{description}

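The table does not spell out what ``class-private name mangling''
does.  As a small illustration (the class and attribute names are
hypothetical, and the mangled spelling shown is the one used by the
standard implementation), a name of the form \code{__*} used inside a
class body is rewritten to include the class name:

\begin{verbatim}
class Account:
    def __init__(self):          # __*__ form: system-defined name
        self.__balance = 0       # __* form: subject to name mangling

account = Account()
print(sorted(vars(account)))     # the attribute is stored under a
                                 # mangled name
print(hasattr(account, "_Account__balance"))
\end{verbatim}

The instance ends up with a single attribute named
\code{_Account__balance}; no attribute is stored under the plain name
\code{__balance}.
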
\section{Literals\label{literals}}

Literals are notations for constant values of some built-in types.
\index{literal}
\index{constant}

Fred Drake | 61c7728 | 1998-07-28 19:34:22 +0000 | [diff] [blame] | 347 | \subsection{String literals\label{strings}} |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 348 | |
| 349 | String literals are described by the following lexical definitions: |
| 350 | \index{string literal} |
| 351 | |
Fred Drake | c37b65e | 2001-11-28 07:26:15 +0000 | [diff] [blame] | 352 | \index{ASCII@\ASCII} |
Fred Drake | cb4638a | 2001-07-06 22:49:53 +0000 | [diff] [blame] | 353 | \begin{productionlist} |
| 354 | \production{stringliteral} |
Fred Drake | c0cf726 | 2001-08-14 21:43:31 +0000 | [diff] [blame] | 355 | {[\token{stringprefix}](\token{shortstring} | \token{longstring})} |
| 356 | \production{stringprefix} |
| 357 | {"r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"} |
Fred Drake | cb4638a | 2001-07-06 22:49:53 +0000 | [diff] [blame] | 358 | \production{shortstring} |
| 359 | {"'" \token{shortstringitem}* "'" |
| 360 | | '"' \token{shortstringitem}* '"'} |
| 361 | \production{longstring} |
Fred Drake | 5381588 | 2002-03-15 23:21:37 +0000 | [diff] [blame] | 362 | {"'''" \token{longstringitem}* "'''"} |
| 363 | \productioncont{| '"""' \token{longstringitem}* '"""'} |
Fred Drake | cb4638a | 2001-07-06 22:49:53 +0000 | [diff] [blame] | 364 | \production{shortstringitem} |
| 365 | {\token{shortstringchar} | \token{escapeseq}} |
| 366 | \production{longstringitem} |
| 367 | {\token{longstringchar} | \token{escapeseq}} |
| 368 | \production{shortstringchar} |
| 369 | {<any ASCII character except "\e" or newline or the quote>} |
| 370 | \production{longstringchar} |
Fred Drake | 1d3e6c1 | 2001-12-11 17:46:38 +0000 | [diff] [blame] | 371 | {<any ASCII character except "\e">} |
Fred Drake | cb4638a | 2001-07-06 22:49:53 +0000 | [diff] [blame] | 372 | \production{escapeseq} |
| 373 | {"\e" <any ASCII character>} |
| 374 | \end{productionlist} |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 375 | |
Fred Drake | c0cf726 | 2001-08-14 21:43:31 +0000 | [diff] [blame] | 376 | One syntactic restriction not indicated by these productions is that |
| 377 | whitespace is not allowed between the \grammartoken{stringprefix} and |
| 378 | the rest of the string literal. |
| 379 | |
Fred Drake | dea764d | 2000-12-19 04:52:03 +0000 | [diff] [blame] | 380 | \index{triple-quoted string} |
| 381 | \index{Unicode Consortium} |
| 382 | \index{string!Unicode} |
Guido van Rossum | 60f2f0c | 1998-06-15 18:00:50 +0000 | [diff] [blame] | 383 | In plain English: String literals can be enclosed in matching single |
| 384 | quotes (\code{'}) or double quotes (\code{"}). They can also be |
| 385 | enclosed in matching groups of three single or double quotes (these |
| 386 | are generally referred to as \emph{triple-quoted strings}). The |
| 387 | backslash (\code{\e}) character is used to escape characters that |
| 388 | otherwise have a special meaning, such as newline, backslash itself, |
| 389 | or the quote character. String literals may optionally be prefixed |
Raymond Hettinger | 83dcf5a | 2002-08-07 16:53:17 +0000 | [diff] [blame] | 390 | with a letter \character{r} or \character{R}; such strings are called |
| 391 | \dfn{raw strings}\index{raw string} and use different rules for interpreting |
| 392 | backslash escape sequences. A prefix of \character{u} or \character{U} |
| 393 | makes the string a Unicode string. Unicode strings use the Unicode character |
| 394 | set as defined by the Unicode Consortium and ISO~10646. Some additional |
Fred Drake | dea764d | 2000-12-19 04:52:03 +0000 | [diff] [blame] | 395 | escape sequences, described below, are available in Unicode strings. |
Raymond Hettinger | 83dcf5a | 2002-08-07 16:53:17 +0000 | [diff] [blame] | 396 | The two prefix characters may be combined; in this case, \character{u} must |
| 397 | appear before \character{r}. |
Guido van Rossum | 60f2f0c | 1998-06-15 18:00:50 +0000 | [diff] [blame] | 398 | |
| 399 | In triple-quoted strings, |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 400 | unescaped newlines and quotes are allowed (and are retained), except |
| 401 | that three unescaped quotes in a row terminate the string. (A |
| 402 | ``quote'' is the character used to open the string, i.e. either |
Fred Drake | 5c07d9b | 1998-05-14 19:37:06 +0000 | [diff] [blame] | 403 | \code{'} or \code{"}.) |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 404 | |
Raymond Hettinger | 83dcf5a | 2002-08-07 16:53:17 +0000 | [diff] [blame] | 405 | Unless an \character{r} or \character{R} prefix is present, escape |
| 406 | sequences in strings are interpreted according to rules similar |
Fred Drake | 9079164 | 2001-07-20 15:33:23 +0000 | [diff] [blame] | 407 | to those used by Standard C. The recognized escape sequences are: |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 408 | \index{physical line} |
| 409 | \index{escape sequence} |
| 410 | \index{Standard C} |
| 411 | \index{C} |
| 412 | |
Fred Drake | 3e930ba | 2002-09-24 21:08:37 +0000 | [diff] [blame] | 413 | \begin{tableiii}{l|l|c}{code}{Escape Sequence}{Meaning}{Notes} |
| 414 | \lineiii{\e\var{newline}} {Ignored}{} |
| 415 | \lineiii{\e\e} {Backslash (\code{\e})}{} |
| 416 | \lineiii{\e'} {Single quote (\code{'})}{} |
| 417 | \lineiii{\e"} {Double quote (\code{"})}{} |
| 418 | \lineiii{\e a} {\ASCII{} Bell (BEL)}{} |
| 419 | \lineiii{\e b} {\ASCII{} Backspace (BS)}{} |
| 420 | \lineiii{\e f} {\ASCII{} Formfeed (FF)}{} |
| 421 | \lineiii{\e n} {\ASCII{} Linefeed (LF)}{} |
| 422 | \lineiii{\e N\{\var{name}\}} |
| 423 | {Character named \var{name} in the Unicode database (Unicode only)}{} |
| 424 | \lineiii{\e r} {\ASCII{} Carriage Return (CR)}{} |
| 425 | \lineiii{\e t} {\ASCII{} Horizontal Tab (TAB)}{} |
| 426 | \lineiii{\e u\var{xxxx}} |
| 427 | {Character with 16-bit hex value \var{xxxx} (Unicode only)}{(1)} |
| 428 | \lineiii{\e U\var{xxxxxxxx}} |
| 429 | {Character with 32-bit hex value \var{xxxxxxxx} (Unicode only)}{(2)} |
| 430 | \lineiii{\e v} {\ASCII{} Vertical Tab (VT)}{} |
| 431 | \lineiii{\e\var{ooo}} {\ASCII{} character with octal value \var{ooo}}{(3)} |
| 432 | \lineiii{\e x\var{hh}} {\ASCII{} character with hex value \var{hh}}{(4)} |
| 433 | \end{tableiii} |
Fred Drake | c37b65e | 2001-11-28 07:26:15 +0000 | [diff] [blame] | 434 | \index{ASCII@\ASCII} |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 435 | |
Fred Drake | 3e930ba | 2002-09-24 21:08:37 +0000 | [diff] [blame] | 436 | \noindent |
| 437 | Notes: |
| 438 | |
| 439 | \begin{itemize} |
| 440 | \item[(1)] |
| 441 | Individual code units which form parts of a surrogate pair can be |
| 442 | encoded using this escape sequence. |
| 443 | \item[(2)] |
| 444 | Any Unicode character can be encoded this way, but characters |
| 445 | outside the Basic Multilingual Plane (BMP) will be encoded using a |
| 446 | surrogate pair if Python is compiled to use 16-bit code units (the |
| 447 | default). Individual code units which form parts of a surrogate |
| 448 | pair can be encoded using this escape sequence. |
| 449 | \item[(3)] |
| 450 | As in Standard C, up to three octal digits are accepted. |
| 451 | \item[(4)] |
| 452 | Unlike in Standard C, at most two hex digits are accepted. |
| 453 | \end{itemize} |
| 454 | |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 455 | |
Fred Drake | dea764d | 2000-12-19 04:52:03 +0000 | [diff] [blame] | 456 | Unlike Standard \index{unrecognized escape sequence}C, |
Guido van Rossum | 60f2f0c | 1998-06-15 18:00:50 +0000 | [diff] [blame] | 457 | all unrecognized escape sequences are left in the string unchanged, |
Fred Drake | dea764d | 2000-12-19 04:52:03 +0000 | [diff] [blame] | 458 | i.e., \emph{the backslash is left in the string}. (This behavior is |
Fred Drake | f666917 | 1998-05-06 19:52:49 +0000 | [diff] [blame] | 459 | useful when debugging: if an escape sequence is mistyped, the |
Fred Drake | dea764d | 2000-12-19 04:52:03 +0000 | [diff] [blame] | 460 | resulting output is more easily recognized as broken.) It is also |
| 461 | important to note that the escape sequences marked as ``(Unicode |
| 462 | only)'' in the table above fall into the category of unrecognized |
| 463 | escapes for non-Unicode string literals. |

When an \character{r} or \character{R} prefix is present, a character
following a backslash is included in the string without change, and \emph{all
backslashes are left in the string}.  For example, the string literal
\code{r"\e n"} consists of two characters: a backslash and a lowercase
\character{n}.  String quotes can be escaped with a backslash, but the
backslash remains in the string; for example, \code{r"\e""} is a valid string
literal consisting of two characters: a backslash and a double quote.
By contrast, \code{r"\e"} is not a valid string literal (even a raw
string cannot end in an odd number of backslashes).  Specifically, \emph{a raw
string cannot end in a single backslash} (since the backslash would
escape the following quote character).  Note also that a single
backslash followed by a newline is interpreted as those two characters
as part of the string, \emph{not} as a line continuation.
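The raw-string rules above can be sketched as follows, assuming a
current Python~3 interpreter.  The helper \code{is_valid_literal} is
hypothetical, and the backslash-newline case uses a triple-quoted raw
string so that the example stays readable.

```python
def is_valid_literal(source):
    """Return True if the text compiles as a Python expression.

    Hypothetical helper used only to probe which literals are accepted.
    """
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


backslash_n = r"\n"      # two characters: a backslash and a lowercase n
escaped_quote = r"\""    # two characters: a backslash and a double quote
continued = r"""a\
b"""                     # the backslash and the newline both stay in the string

print(len(backslash_n), len(escaped_quote), len(continued))
print(is_valid_literal('r"\\"'))    # raw string ending in a single backslash
print(is_valid_literal('r"\\\\"'))  # two trailing backslashes
```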

When an \character{r} or \character{R} prefix is used in conjunction
with a \character{u} or \character{U} prefix, then the \code{\e uXXXX}
escape sequence is processed while \emph{all other backslashes are
left in the string}.  For example, the string literal
\code{ur"\e{}u0062\e n"} consists of three Unicode characters: `LATIN
SMALL LETTER B', `REVERSE SOLIDUS', and `LATIN SMALL LETTER N'.
Backslashes can be escaped with a preceding backslash; however, both
remain in the string.  As a result, \code{\e uXXXX} escape sequences
are recognized only when the \character{u} is preceded by an odd
number of backslashes.

\subsection{String literal concatenation\label{string-catenation}}

Multiple adjacent string literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is
the same as their concatenation.  Thus, \code{"hello" 'world'} is
equivalent to \code{"helloworld"}.  This feature can be used to reduce
the number of backslashes needed, to split long strings conveniently
across long lines, or even to add comments to parts of strings, for
example:

\begin{verbatim}
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
\end{verbatim}

Note that this feature is defined at the syntactical level, but
implemented at compile time.  The `+' operator must be used to
concatenate string expressions at run time.  Also note that literal
concatenation can use different quoting styles for each component
(even mixing raw strings and triple quoted strings).
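As a short illustration of the points above, the following sketch
(variable names are arbitrary) joins literals with different quoting
styles and contrasts that with run-time concatenation:

```python
# Adjacent literals are joined by the compiler, whatever their quoting style.
greeting = "hello" 'world'

# A raw string and a triple-quoted string can be mixed in one literal.
pattern = (r"\d+"          # raw component: the backslash is kept
           """ items""")   # triple-quoted component

# Values that are only known at run time need the + operator instead.
prefix = "hello"
joined = prefix + "world"

print(greeting, pattern, joined)
```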


\subsection{Numeric literals\label{numbers}}

There are four types of numeric literals: plain integers, long
integers, floating point numbers, and imaginary numbers.  There are no
complex literals (complex numbers can be formed by adding a real
number and an imaginary number).
\index{number}
\index{numeric literal}
\index{integer literal}
\index{plain integer literal}
\index{long integer literal}
\index{floating point literal}
\index{hexadecimal literal}
\index{octal literal}
\index{decimal literal}
\index{imaginary literal}
\index{complex!literal}

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.
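One way to observe this is to ask the standard \code{tokenize} module
how it splits \code{-1}.  This is a minimal sketch assuming a current
Python~3 interpreter; the helper \code{token_strings} is hypothetical.

```python
import io
import tokenize


def token_strings(source):
    """Return (kind, text) pairs for the operator and number tokens.

    Hypothetical helper for this example only.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokens
            if tok.type in (tokenize.OP, tokenize.NUMBER)]


# The sign arrives as a separate operator token, not as part of the number.
print(token_strings("-1\n"))
```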


\subsection{Integer and long integer literals\label{integers}}

Integer and long integer literals are described by the following
lexical definitions:

\begin{productionlist}
  \production{longinteger}
             {\token{integer} ("l" | "L")}
  \production{integer}
             {\token{decimalinteger} | \token{octinteger} | \token{hexinteger}}
  \production{decimalinteger}
             {\token{nonzerodigit} \token{digit}* | "0"}
  \production{octinteger}
             {"0" \token{octdigit}+}
  \production{hexinteger}
             {"0" ("x" | "X") \token{hexdigit}+}
  \production{nonzerodigit}
             {"1"..."9"}
  \production{octdigit}
             {"0"..."7"}
  \production{hexdigit}
             {\token{digit} | "a"..."f" | "A"..."F"}
\end{productionlist}

Although both lower case \character{l} and upper case \character{L} are
allowed as a suffix for long integers, it is strongly recommended to always
use \character{L}, since the letter \character{l} looks too much like the
digit \character{1}.

Plain integer decimal literals must be at most 2147483647 (i.e., the
largest positive integer, using 32-bit arithmetic).  Plain octal and
hexadecimal literals may be as large as 4294967295, but values larger
than 2147483647 are converted to a negative value by subtracting
4294967296.  There is no limit for long integer literals apart from
what can be stored in available memory.

Some examples of plain and long integer literals:

\begin{verbatim}
7     2147483647                        0177    0x80000000
3L    79228162514264337593543950336L    0377L   0x100000000L
\end{verbatim}
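The conversion rule for large octal and hexadecimal literals is plain
arithmetic and can be sketched as below.  The function
\code{plain_int_value} and the two constants are hypothetical names
that model the rule as stated above; a current Python~3 interpreter
does not evaluate integer literals this way.

```python
MAX_PLAIN_INT = 2147483647       # largest plain integer, 2**31 - 1
MAX_UNSIGNED_32 = 4294967295     # largest octal or hexadecimal plain literal


def plain_int_value(magnitude):
    """Return the value the text assigns to an octal or hex plain literal.

    Hypothetical helper: it models the 32-bit rule described above and is
    not how a current interpreter evaluates integer literals.
    """
    if not 0 <= magnitude <= MAX_UNSIGNED_32:
        raise OverflowError("literal too large for a plain integer")
    if magnitude > MAX_PLAIN_INT:
        return magnitude - 4294967296
    return magnitude


print(plain_int_value(0x80000000))   # -2147483648
print(plain_int_value(0xFFFFFFFF))   # -1
print(plain_int_value(0x7FFFFFFF))   # 2147483647
```

Under this rule the example literal \code{0x80000000} denotes the most
negative 32-bit integer.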


\subsection{Floating point literals\label{floating}}

Floating point literals are described by the following lexical
definitions:

\begin{productionlist}
  \production{floatnumber}
             {\token{pointfloat} | \token{exponentfloat}}
  \production{pointfloat}
             {[\token{intpart}] \token{fraction} | \token{intpart} "."}
  \production{exponentfloat}
             {(\token{intpart} | \token{pointfloat})
              \token{exponent}}
  \production{intpart}
             {\token{digit}+}
  \production{fraction}
             {"." \token{digit}+}
  \production{exponent}
             {("e" | "E") ["+" | "-"] \token{digit}+}
\end{productionlist}

Note that the integer and exponent parts of floating point numbers
can look like octal integers, but are interpreted using radix 10.  For
example, \samp{077e010} is legal, and denotes the same number
as \samp{77e10}.
The allowed range of floating point literals is
implementation-dependent.
Some examples of floating point literals:

\begin{verbatim}
3.14    10.    .001    1e100    3.14e-10    0e0
\end{verbatim}
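The radix rule can be checked directly.  The sketch below assumes a
current Python~3 interpreter, which still accepts leading zeros in
floating point literals; \code{sys.float\_info} is used only to show
where the implementation-dependent range can be inspected.

```python
import sys

# Leading zeros do not select octal in a floating point literal.
with_zeros = 077e010
plain = 77e10
print(with_zeros == plain)

# The same digits read as an octal integer give a different number.
print(int("077", 8))

# The largest finite float depends on the implementation.
print(sys.float_info.max)
```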

Note that numeric literals do not include a sign; a phrase like
\code{-1} is actually an expression composed of the unary operator
`\code{-}' and the literal \code{1}.


\subsection{Imaginary literals\label{imaginary}}

Imaginary literals are described by the following lexical definitions:

\begin{productionlist}
  \production{imagnumber}{(\token{floatnumber} | \token{intpart}) ("j" | "J")}
\end{productionlist}

An imaginary literal yields a complex number with a real part of
0.0.  Complex numbers are represented as a pair of floating point
numbers and have the same restrictions on their range.  To create a
complex number with a nonzero real part, add a floating point number
to an imaginary literal, e.g., \code{(3+4j)}.  Some examples of
imaginary literals:

\begin{verbatim}
3.14j   10.j    10j     .001j   1e100j  3.14e-10j
\end{verbatim}
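A brief sketch of how these literals behave, with arbitrary variable
names:

```python
# An imaginary literal on its own has a real part of 0.0.
imaginary = 4j
print(imaginary.real, imaginary.imag)   # 0.0 4.0

# A nonzero real part comes from an expression, not from a single literal.
z = 3 + 4j
print(z.real, z.imag)                   # 3.0 4.0

# The float and integer spellings of the same magnitude are equal.
print(10.j == 10j)                      # True
```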


\section{Operators\label{operators}}

The following tokens are operators:
\index{operators}

\begin{verbatim}
+       -       *       **      /       //      %
<<      >>      &       |       ^       ~
<       >       <=      >=      ==      !=      <>
\end{verbatim}

The comparison operators \code{<>} and \code{!=} are alternate
spellings of the same operator.  \code{!=} is the preferred spelling;
\code{<>} is obsolescent.


\section{Delimiters\label{delimiters}}

The following tokens serve as delimiters in the grammar:
\index{delimiters}

\begin{verbatim}
(       )       [       ]       {       }
,       :       .       `       =       ;
+=      -=      *=      /=      //=     %=
&=      |=      ^=      >>=     <<=     **=
\end{verbatim}

The period can also occur in floating-point and imaginary literals.  A
sequence of three periods has a special meaning as an ellipsis in slices.
The augmented assignment operators in the second half of the list serve
lexically as delimiters, but also perform an operation.
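To see both roles of these characters, the following sketch tokenizes
a small statement with the standard \code{tokenize} module.  It assumes
a current Python~3 interpreter, and \code{significant\_tokens} is a
hypothetical helper.

```python
import io
import tokenize


def significant_tokens(source):
    """List (kind, text) pairs, skipping layout tokens. Hypothetical helper."""
    skip = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokens if tok.type not in skip]


# The augmented assignment arrives as one token, and the period
# stays inside the floating point literal.
print(significant_tokens("total += 1.5\n"))
```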

The following printing \ASCII{} characters have special meaning as part
of other tokens or are otherwise significant to the lexical analyzer:

\begin{verbatim}
'       "       #       \
\end{verbatim}

The following printing \ASCII{} characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional
error:
\index{ASCII@\ASCII}

\begin{verbatim}
@       $       ?
\end{verbatim}