.. _tut-informal:

**********************************
An Informal Introduction to Python
**********************************

In the following examples, input and output are distinguished by the presence or
absence of prompts (:term:`>>>` and :term:`...`): to repeat the example, you must type
everything after the prompt, when the prompt appears; lines that do not begin
with a prompt are output from the interpreter. Note that a secondary prompt on a
line by itself in an example means you must type a blank line; this is used to
end a multi-line command.

Many of the examples in this manual, even those entered at the interactive
prompt, include comments. Comments in Python start with the hash character,
``#``, and extend to the end of the physical line. A comment may appear at the
start of a line or following whitespace or code, but not within a string
literal. A hash character within a string literal is just a hash character.
Since comments are to clarify code and are not interpreted by Python, they may
be omitted when typing in examples.

Some examples::

   # this is the first comment
   spam = 1  # and this is the second comment
             # ... and now a third!
   text = "# This is not a comment because it's inside quotes."


.. _tut-calculator:

Using Python as a Calculator
============================

Let's try some simple Python commands. Start the interpreter and wait for the
primary prompt, ``>>>``. (It shouldn't take long.)


.. _tut-numbers:

Numbers
-------

The interpreter acts as a simple calculator: you can type an expression at it
and it will write the value. Expression syntax is straightforward: the
operators ``+``, ``-``, ``*`` and ``/`` work just like in most other languages
(for example, Pascal or C); parentheses (``()``) can be used for grouping.
For example::

   >>> 2 + 2
   4
   >>> 50 - 5*6
   20
   >>> (50 - 5.0*6) / 4
   5.0
   >>> 8 / 5.0
   1.6

The integer numbers (e.g. ``2``, ``4``, ``20``) have type :class:`int`,
the ones with a fractional part (e.g. ``5.0``, ``1.6``) have type
:class:`float`. We will see more about numeric types later in the tutorial.

The return type of a division (``/``) operation depends on its operands. If
both operands are of type :class:`int`, :term:`floor division` is performed
and an :class:`int` is returned. If either operand is a :class:`float`,
true (floating point) division is performed and a :class:`float` is returned.
The ``//`` operator is also provided for doing floor division no matter what
the operands are. The remainder can be calculated with the ``%`` operator::

   >>> 17 / 3  # int / int -> int
   5
   >>> 17 / 3.0  # int / float -> float
   5.666666666666667
   >>> 17 // 3.0  # explicit floor division discards the fractional part
   5.0
   >>> 17 % 3  # the % operator returns the remainder of the division
   2
   >>> 5 * 3 + 2  # result * divisor + remainder
   17
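
Because ``//`` rounds toward negative infinity rather than toward zero, the
relationship in the last line also holds for negative operands, with a nonzero
remainder taking the sign of the divisor. The sketch below uses the built-in
:func:`divmod`, which returns the floor quotient and the remainder as a pair;
the variable names are only illustrative:

.. code-block:: python

   # divmod(n, d) returns the pair (n // d, n % d)
   q, r = divmod(17, 3)            # q is 5, r is 2
   check = q * 3 + r               # 17 again

   # with a negative dividend the quotient is rounded down, not toward zero
   neg_q, neg_r = divmod(-17, 3)   # neg_q is -6, neg_r is 1
   neg_check = neg_q * 3 + neg_r   # -17 again

   # the remainder has the same sign as the divisor
   rem = 17 % -3                   # -1
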

With Python, it is possible to use the ``**`` operator to calculate powers [#]_::

   >>> 5 ** 2  # 5 squared
   25
   >>> 2 ** 7  # 2 to the power of 7
   128
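
The precedence rule described in the footnote, and the built-in :func:`pow`
function, which computes the same result as ``**``, can be checked with a few
assignments (the names are arbitrary):

.. code-block:: python

   a = -3 ** 2      # parsed as -(3 ** 2), so a is -9
   b = (-3) ** 2    # the parentheses apply the minus sign first: 9
   c = pow(2, 7)    # same as 2 ** 7: 128
   d = 2 ** -1      # a negative exponent gives a float: 0.5
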

The equal sign (``=``) is used to assign a value to a variable. Afterwards, no
result is displayed before the next interactive prompt::

   >>> width = 20
   >>> height = 5 * 9
   >>> width * height
   900

If a variable is not "defined" (assigned a value), trying to use it will
give you an error::

   >>> n  # try to access an undefined variable
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NameError: name 'n' is not defined

There is full support for floating point; operators with mixed type operands
convert the integer operand to floating point::

   >>> 3 * 3.75 / 1.5
   7.5
   >>> 7.0 / 2
   3.5
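
One consequence of this conversion rule is that, when both operands are
integers, you can still obtain a fractional result by converting one of them
with the built-in :func:`float` function first. A minimal sketch (variable
names are arbitrary):

.. code-block:: python

   # 7 / 2 would use floor division here (both operands are ints), giving 3
   exact = float(7) / 2     # 7 becomes 7.0 before dividing: 3.5
   mixed = 3 * 3.75 / 1.5   # the int 3 is converted to float: 7.5
   kind = type(mixed)       # float
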

In interactive mode, the last printed expression is assigned to the variable
``_``. This means that when you are using Python as a desk calculator, it is
somewhat easier to continue calculations, for example::

   >>> tax = 12.5 / 100
   >>> price = 100.50
   >>> price * tax
   12.5625
   >>> price + _
   113.0625
   >>> round(_, 2)
   113.06

This variable should be treated as read-only by the user. Don't explicitly
assign a value to it --- you would create an independent local variable with the
same name masking the built-in variable with its magic behavior.

In addition to :class:`int` and :class:`float`, Python supports other types of
numbers, such as :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
Python also has built-in support for :ref:`complex numbers <typesnumeric>`,
and uses the ``j`` or ``J`` suffix to indicate the imaginary part
(e.g. ``3+5j``).
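
The following sketch gives a first taste of these types. It relies only on the
standard :mod:`decimal` and :mod:`fractions` modules mentioned above; the
values and variable names are chosen for illustration:

.. code-block:: python

   from decimal import Decimal
   from fractions import Fraction

   # Decimal works with decimal digits, so this sum is exactly 0.3
   total = Decimal('0.1') + Decimal('0.2')     # Decimal('0.3')

   # Fraction keeps an exact numerator and denominator in lowest terms
   half = Fraction(1, 3) + Fraction(1, 6)      # Fraction(1, 2)

   # a complex number exposes its real and imaginary parts as floats
   z = 3 + 5j
   real_part, imag_part = z.real, z.imag       # 3.0 and 5.0
   norm2 = z * z.conjugate()                   # (34+0j), i.e. 3**2 + 5**2
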


.. _tut-strings:

Strings
-------

Besides numbers, Python can also manipulate strings, which can be expressed
in several ways. They can be enclosed in single quotes (``'...'``) or
double quotes (``"..."``) with the same result [#]_. ``\`` can be used
to escape quotes::

   >>> 'spam eggs'  # single quotes
   'spam eggs'
   >>> 'doesn\'t'  # use \' to escape the single quote...
   "doesn't"
   >>> "doesn't"  # ...or use double quotes instead
   "doesn't"
   >>> '"Yes," he said.'
   '"Yes," he said.'
   >>> "\"Yes,\" he said."
   '"Yes," he said.'
   >>> '"Isn\'t," she said.'
   '"Isn\'t," she said.'
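
Since the quoting style only affects how a literal is written, strings spelled
in different ways compare equal. A short check (names are arbitrary):

.. code-block:: python

   a = 'doesn\'t'    # escaped single quote inside single quotes
   b = "doesn't"     # no escape needed inside double quotes
   same = (a == b)   # True: both are the same 7-character string
   n = len('\'')     # 1: the backslash is not part of the string
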

In the interactive interpreter, the output string is enclosed in quotes and
special characters are escaped with backslashes. While this might sometimes
look different from the input (the enclosing quotes could change), the two
strings are equivalent. The string is enclosed in double quotes if
the string contains a single quote and no double quotes, otherwise it is
enclosed in single quotes. The :keyword:`print` statement produces more
readable output by omitting the enclosing quotes and by printing escaped
and special characters::

   >>> '"Isn\'t," she said.'
   '"Isn\'t," she said.'
   >>> print '"Isn\'t," she said.'
   "Isn't," she said.
   >>> s = 'First line.\nSecond line.'  # \n means newline
   >>> s  # without print, \n is included in the output
   'First line.\nSecond line.'
   >>> print s  # with print, \n produces a new line
   First line.
   Second line.
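
The difference between the two displays can be made explicit with the built-in
:func:`repr` function, which returns the quoted form that the interpreter
echoes. In this sketch, ``s`` is the string from the example above:

.. code-block:: python

   s = 'First line.\nSecond line.'
   length = len(s)          # 24: the \n escape is a single character
   shown = repr(s)          # the text the interpreter echoes, quotes included
   lines = s.splitlines()   # ['First line.', 'Second line.']
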

If you don't want characters prefaced by ``\`` to be interpreted as
special characters, you can use *raw strings* by adding an ``r`` before
the first quote::

   >>> print 'C:\some\name'  # here \n means newline!
   C:\some
   ame
   >>> print r'C:\some\name'  # note the r before the quote
   C:\some\name
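
A raw string and a normal string with doubled backslashes produce the same
value; only the spelling in the source differs. Note that a raw string still
cannot end with an odd number of backslashes (``r'C:\'`` is a syntax error).
A minimal sketch:

.. code-block:: python

   normal = 'C:\\some\\name'   # each \\ stands for one backslash
   raw = r'C:\some\name'       # the r prefix keeps backslashes as typed
   same = (normal == raw)      # True
   raw_len = len(r'\n')        # 2: a backslash followed by "n"
   escaped_len = len('\n')     # 1: a single newline character
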

String literals can span multiple lines. One way is using triple-quotes:
``"""..."""`` or ``'''...'''``. End-of-line characters are automatically
included in the string, but it's possible to prevent this by adding a ``\`` at
the end of the line. The following example::

   print """\
   Usage: thingy [OPTIONS]
        -h                        Display this usage message
        -H hostname               Hostname to connect to
   """

produces the following output (note that the initial newline is not included):

.. code-block:: text

   Usage: thingy [OPTIONS]
        -h                        Display this usage message
        -H hostname               Hostname to connect to
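
The effect of the trailing backslash can be checked directly. This sketch
stores a shortened version of the usage text (the name ``usage`` is arbitrary)
instead of printing it:

.. code-block:: python

   usage = """\
   Usage: thingy [OPTIONS]
        -h                        Display this usage message
   """
   starts_with_text = usage.startswith('Usage')   # True: no leading newline
   newlines = usage.count('\n')                    # 2: one per line of text
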

Strings can be concatenated (glued together) with the ``+`` operator, and
repeated with ``*``::

   >>> # 3 times 'un', followed by 'ium'
   >>> 3 * 'un' + 'ium'
   'unununium'

Two or more *string literals* (i.e. the ones enclosed between quotes) next
to each other are automatically concatenated. ::

   >>> 'Py' 'thon'
   'Python'

This only works with literals, though, not with variables or expressions::

   >>> prefix = 'Py'
   >>> prefix 'thon'  # can't concatenate a variable and a string literal
     ...
   SyntaxError: invalid syntax
   >>> ('un' * 3) 'ium'
     ...
   SyntaxError: invalid syntax

If you want to concatenate variables or a variable and a literal, use ``+``::

   >>> prefix + 'thon'
   'Python'

Automatic concatenation of adjacent literals is particularly useful when you
want to break long strings across several lines::

   >>> text = ('Put several strings within parentheses '
   ...         'to have them joined together.')
   >>> text
   'Put several strings within parentheses to have them joined together.'
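
The sketch below contrasts the two mechanisms: implicit joining, which only
applies to literals written next to each other, and ``+``, which works with any
string values. The variable names are illustrative:

.. code-block:: python

   prefix = 'Py'
   joined = 'Py' 'thon'        # two adjacent literals: 'Python'
   added = prefix + 'thon'     # a variable needs +: 'Python'

   # inside parentheses, adjacent literals may sit on separate lines
   message = ('first part, '
              'second part')   # 'first part, second part'
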

Strings can be *indexed* (subscripted), with the first character having index 0.
There is no separate character type; a character is simply a string of size
one::

   >>> word = 'Python'
   >>> word[0]  # character in position 0
   'P'
   >>> word[5]  # character in position 5
   'n'

Indices may also be negative numbers, to start counting from the right::

   >>> word[-1]  # last character
   'n'
   >>> word[-2]  # second-last character
   'o'
   >>> word[-6]
   'P'

Note that since -0 is the same as 0, negative indices start from -1.
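
In other words, for a string of length ``n``, the negative index ``-i`` refers
to the same character as ``n - i``. A small check, redefining ``word`` as in
the examples above so that the sketch runs on its own:

.. code-block:: python

   word = 'Python'
   n = len(word)                           # 6
   last = word[-1]                         # same as word[n - 1]: 'n'
   first = word[-n]                        # same as word[0]: 'P'
   pairs_match = word[-2] == word[n - 2]   # True
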

In addition to indexing, *slicing* is also supported. While indexing is used
to obtain individual characters, *slicing* allows you to obtain a substring::

   >>> word[0:2]  # characters from position 0 (included) to 2 (excluded)
   'Py'
   >>> word[2:5]  # characters from position 2 (included) to 5 (excluded)
   'tho'

Note how the start is always included, and the end always excluded. This
makes sure that ``s[:i] + s[i:]`` is always equal to ``s``::

   >>> word[:2] + word[2:]
   'Python'
   >>> word[:4] + word[4:]
   'Python'

Slice indices have useful defaults; an omitted first index defaults to zero, an
omitted second index defaults to the size of the string being sliced. ::

   >>> word[:2]  # characters from the beginning to position 2 (excluded)
   'Py'
   >>> word[4:]  # characters from position 4 (included) to the end
   'on'
   >>> word[-2:]  # characters from the second-last (included) to the end
   'on'

One way to remember how slices work is to think of the indices as pointing
*between* characters, with the left edge of the first character numbered 0.
Then the right edge of the last character of a string of *n* characters has
index *n*, for example::

    +---+---+---+---+---+---+
    | P | y | t | h | o | n |
    +---+---+---+---+---+---+
    0   1   2   3   4   5   6
   -6  -5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0...6 in the string;
the second row gives the corresponding negative indices. The slice from *i* to
*j* consists of all characters between the edges labeled *i* and *j*,
respectively.

For non-negative indices, the length of a slice is the difference of the
indices, if both are within bounds. For example, the length of ``word[1:3]`` is
2.
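
Both properties can be verified mechanically. This sketch uses a
:keyword:`while` loop (explained at the end of this chapter) to try every
split point of ``word``, including some past the end, and collects any that
break the rule; the names ``bad_splits`` and ``i`` are arbitrary:

.. code-block:: python

   word = 'Python'
   bad_splits = []
   i = 0
   while i <= len(word) + 2:            # also try split points past the end
       if word[:i] + word[i:] != word:
           bad_splits.append(i)
       i = i + 1
   # bad_splits stays empty: the identity holds for every i tried

   # within bounds, the length of a slice is the difference of its indices
   length = len(word[1:3])              # 2, i.e. 3 - 1
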

Attempting to use an index that is too large will result in an error::

   >>> word[42]  # the word only has 6 characters
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   IndexError: string index out of range

However, out-of-range indices are handled gracefully when used for
slicing::

   >>> word[4:42]
   'on'
   >>> word[42:]
   ''
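
Put differently, slice bounds that fall outside the string are clipped to the
valid range before the substring is taken, so the results below are the same
as if the bounds had been ``len(word)`` or ``0``. A minimal sketch:

.. code-block:: python

   word = 'Python'
   n = len(word)                 # 6
   tail = word[4:42]             # same as word[4:n]: 'on'
   empty = word[42:]             # start past the end: ''
   whole = word[-100:]           # start clipped to 0: 'Python'
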

Python strings cannot be changed --- they are :term:`immutable`.
Therefore, assigning to an indexed position in the string results in an error::

   >>> word[0] = 'J'
     ...
   TypeError: 'str' object does not support item assignment
   >>> word[2:] = 'py'
     ...
   TypeError: 'str' object does not support item assignment

If you need a different string, you should create a new one::

   >>> 'J' + word[1:]
   'Jython'
   >>> word[:2] + 'py'
   'Pypy'
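
The pattern of building a new string from slices can be wrapped in a small
function. ``replace_char`` below is a hypothetical helper written for this
illustration, not a built-in, and it assumes ``i`` is a valid non-negative
index:

.. code-block:: python

   def replace_char(s, i, ch):
       """Return a copy of s with the character at index i replaced by ch."""
       return s[:i] + ch + s[i + 1:]

   word = 'Python'
   new_word = replace_char(word, 0, 'J')   # 'Jython'
   # word itself still refers to the unchanged string 'Python'
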

The built-in function :func:`len` returns the length of a string::

   >>> s = 'supercalifragilisticexpialidocious'
   >>> len(s)
   34


.. seealso::

   :ref:`typesseq`
      Strings, and the Unicode strings described in the next section, are
      examples of *sequence types*, and support the common operations supported
      by such types.

   :ref:`string-methods`
      Both strings and Unicode strings support a large number of methods for
      basic transformations and searching.

   :ref:`new-string-formatting`
      Information about string formatting with :meth:`str.format` is described
      here.

   :ref:`string-formatting`
      The old formatting operations invoked when strings and Unicode strings are
      the left operand of the ``%`` operator are described in more detail here.


.. _tut-unicodestrings:

Unicode Strings
---------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>


Starting with Python 2.0, a new data type for storing text data is available to
the programmer: the Unicode object. It can be used to store and manipulate
Unicode data (see http://www.unicode.org/) and integrates well with the existing
string objects, providing auto-conversions where necessary.

Unicode has the advantage of providing one ordinal for every character in every
script used in modern and ancient texts. Previously, there were only 256
possible ordinals for script characters. Texts were typically bound to a code
page which mapped the ordinals to script characters. This led to a lot of
confusion, especially with respect to internationalization (usually written as
``i18n`` --- ``'i'`` + 18 characters + ``'n'``) of software. Unicode solves
these problems by defining one code page for all scripts.

Creating Unicode strings in Python is just as simple as creating normal
strings::

   >>> u'Hello World !'
   u'Hello World !'

The small ``'u'`` in front of the quote indicates that a Unicode string is
supposed to be created. If you want to include special characters in the string,
you can do so by using the Python *Unicode-Escape* encoding. The following
example shows how::

   >>> u'Hello\u0020World !'
   u'Hello World !'

The escape sequence ``\u0020`` inserts the Unicode character with the ordinal
value 0x0020 (the space character) at the given position.
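
A ``\u`` escape always stands for exactly one character, so an escaped literal
and its typed-out form compare equal. The sketch below checks this with the
built-in :func:`ord`, which returns a character's ordinal; the ``\x`` form in
the last line is the two-digit escape that the interpreter itself uses when it
displays such characters:

.. code-block:: python

   s = u'Hello\u0020World !'
   same = (s == u'Hello World !')       # True
   space = ord(u'\u0020')               # 32, i.e. 0x20
   a_umlaut = (u'\u00e4' == u'\xe4')    # True: two spellings of one character
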

Other characters are interpreted by using their respective ordinal values
directly as Unicode ordinals. If you have literal strings in the standard
Latin-1 encoding that is used in many Western countries, you will find it
convenient that the lower 256 characters of Unicode are the same as the 256
characters of Latin-1.
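
This correspondence can be checked by decoding every possible byte value with
the ``latin-1`` codec. The sketch uses :func:`bytearray` to build the 256 byte
values; it is one convenient way to do this, not the only one:

.. code-block:: python

   all_bytes = bytearray(range(256))       # byte values 0 through 255
   text = all_bytes.decode('latin-1')      # a Unicode string of length 256
   # each byte value b became the Unicode character with ordinal b
   matches = [ord(c) for c in text] == list(range(256))   # True
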

For experts, there is also a raw mode just like the one for normal strings. You
have to prefix the opening quote with ``ur`` to have Python use the
*Raw-Unicode-Escape* encoding. It will only apply the above ``\uXXXX``
conversion if there is an odd number of backslashes in front of the small
'u'. ::

   >>> ur'Hello\u0020World !'
   u'Hello World !'
   >>> ur'Hello\\u0020World !'
   u'Hello\\\\u0020World !'

The raw mode is most useful when you have to enter lots of backslashes, as can
be necessary in regular expressions.

Apart from these standard encodings, Python provides a whole set of other ways
of creating Unicode strings on the basis of a known encoding.

.. index:: builtin: unicode

The built-in function :func:`unicode` provides access to all registered Unicode
codecs (COders and DECoders). Some of the better-known encodings which these
codecs can convert are *Latin-1*, *ASCII*, *UTF-8*, and *UTF-16*. The latter two
are variable-length encodings that store each Unicode character in one or more
bytes. The default encoding is normally set to ASCII, which passes through
characters in the range 0 to 127 and rejects any other characters with an error.
When a Unicode string is printed, written to a file, or converted with
:func:`str`, conversion takes place using this default encoding. ::

   >>> u"abc"
   u'abc'
   >>> str(u"abc")
   'abc'
   >>> u"äöü"
   u'\xe4\xf6\xfc'
   >>> str(u"äöü")
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
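
Rejecting unencodable characters with an error is not the only option: the
``encode`` method of Unicode objects also accepts an optional second argument
naming an error handler. The sketch below assumes the standard ``'replace'``
and ``'ignore'`` handlers, which respectively substitute ``?`` for, or drop,
each character that the target encoding cannot represent:

.. code-block:: python

   s = u'\xe4\xf6\xfc'                       # u"äöü"
   plain = u'abc'.encode('ascii')            # 'abc': all ordinals below 128
   replaced = s.encode('ascii', 'replace')   # '???'
   dropped = s.encode('ascii', 'ignore')     # '' (nothing is left)
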

To convert a Unicode string into an 8-bit string using a specific encoding,
Unicode objects provide an :func:`encode` method that takes the name of the
encoding as its argument. Lowercase names for encodings are preferred. ::

   >>> u"äöü".encode('utf-8')
   '\xc3\xa4\xc3\xb6\xc3\xbc'

If you have data in a specific encoding and want to produce a corresponding
Unicode string from it, you can use the :func:`unicode` function with the
encoding name as the second argument. ::

   >>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
   u'\xe4\xf6\xfc'
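
Encoding and decoding with the same codec are inverse operations, so a round
trip returns the original text. This sketch uses the ``decode`` method of
8-bit strings, which in Python 2 does the same job as
``unicode(data, 'utf-8')``:

.. code-block:: python

   original = u'\xe4\xf6\xfc'               # u"äöü"
   data = original.encode('utf-8')          # '\xc3\xa4\xc3\xb6\xc3\xbc'
   size = len(data)                         # 6: two bytes per character here
   restored = data.decode('utf-8')          # u'\xe4\xf6\xfc'
   round_trip_ok = (restored == original)   # True
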


.. _tut-lists:

Lists
-----

Python knows a number of *compound* data types, used to group together other
values. The most versatile is the *list*, which can be written as a list of
comma-separated values (items) between square brackets. Lists might contain
items of different types, but usually the items all have the same type. ::

   >>> squares = [1, 4, 9, 16, 25]
   >>> squares
   [1, 4, 9, 16, 25]

Like strings (and all other built-in :term:`sequence` types), lists can be
indexed and sliced::

   >>> squares[0]  # indexing returns the item
   1
   >>> squares[-1]
   25
   >>> squares[-3:]  # slicing returns a new list
   [9, 16, 25]

All slice operations return a new list containing the requested elements. This
means that the following slice returns a new (shallow) copy of the list::

   >>> squares[:]
   [1, 4, 9, 16, 25]
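
Here, *shallow* means that only the outer list is new: the copy refers to the
same item objects as the original. The sketch below (variable names are
arbitrary) shows that replacing an item in the copy leaves the original alone,
while changing a nested list through the copy is visible from both:

.. code-block:: python

   squares = [1, 4, 9, 16, 25]
   dup = squares[:]
   dup[0] = 100                  # rebinds an item of the copy only
   # squares is still [1, 4, 9, 16, 25]

   nested = [[1, 2], [3, 4]]
   nested_dup = nested[:]
   nested_dup[0].append(99)      # the inner list is shared, not copied
   # nested is now [[1, 2, 99], [3, 4]]
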

Lists also support operations like concatenation::

   >>> squares + [36, 49, 64, 81, 100]
   [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unlike strings, which are :term:`immutable`, lists are a :term:`mutable`
type, i.e. it is possible to change their content::

   >>> cubes = [1, 8, 27, 65, 125]  # something's wrong here
   >>> 4 ** 3  # the cube of 4 is 64, not 65!
   64
   >>> cubes[3] = 64  # replace the wrong value
   >>> cubes
   [1, 8, 27, 64, 125]

You can also add new items at the end of the list by using
the :meth:`~list.append` *method* (we will see more about methods later)::

   >>> cubes.append(216)  # add the cube of 6
   >>> cubes.append(7 ** 3)  # and the cube of 7
   >>> cubes
   [1, 8, 27, 64, 125, 216, 343]

Assignment to slices is also possible, and this can even change the size of the
list or clear it entirely::

   >>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
   >>> letters
   ['a', 'b', 'c', 'd', 'e', 'f', 'g']
   >>> # replace some values
   >>> letters[2:5] = ['C', 'D', 'E']
   >>> letters
   ['a', 'b', 'C', 'D', 'E', 'f', 'g']
   >>> # now remove them
   >>> letters[2:5] = []
   >>> letters
   ['a', 'b', 'f', 'g']
   >>> # clear the list by replacing all the elements with an empty list
   >>> letters[:] = []
   >>> letters
   []
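
Because the replacement list does not need to have the same length as the
slice, slice assignment can also insert items: assigning to an empty slice
such as ``letters[1:1]`` adds elements without removing any. A short sketch:

.. code-block:: python

   letters = ['a', 'b', 'c', 'd']
   letters[1:1] = ['x', 'y']   # insert before index 1
   # letters is now ['a', 'x', 'y', 'b', 'c', 'd']
   letters[1:4] = ['Z']        # three items replaced by one
   # letters is now ['a', 'Z', 'c', 'd']
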

The built-in function :func:`len` also applies to lists::

   >>> letters = ['a', 'b', 'c', 'd']
   >>> len(letters)
   4

It is possible to nest lists (create lists containing other lists), for
example::

   >>> a = ['a', 'b', 'c']
   >>> n = [1, 2, 3]
   >>> x = [a, n]
   >>> x
   [['a', 'b', 'c'], [1, 2, 3]]
   >>> x[0]
   ['a', 'b', 'c']
   >>> x[0][1]
   'b'
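
Note that ``x`` holds references to the lists ``a`` and ``n`` themselves, not
copies of them, so a later change to ``a`` is visible through ``x[0]``. A
minimal sketch, repeating the definitions above:

.. code-block:: python

   a = ['a', 'b', 'c']
   n = [1, 2, 3]
   x = [a, n]
   a.append('d')
   # x[0] is now ['a', 'b', 'c', 'd']
   same_object = x[0] is a   # True: "is" tests whether both are one object
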

.. _tut-firststeps:

First Steps Towards Programming
===============================

Of course, we can use Python for more complicated tasks than adding two and two
together. For instance, we can write an initial sub-sequence of the *Fibonacci*
series as follows::

   >>> # Fibonacci series:
   ... # the sum of two elements defines the next
   ... a, b = 0, 1
   >>> while b < 10:
   ...     print b
   ...     a, b = b, a+b
   ...
   1
   1
   2
   3
   5
   8

This example introduces several new features.

* The line ``a, b = 0, 1`` contains a *multiple assignment*: the variables
  ``a`` and ``b`` simultaneously get the new values 0 and 1. On the last line
  this is used again, demonstrating that the expressions on the right-hand side
  are all evaluated first before any of the assignments take place. The
  right-hand side expressions are evaluated from the left to the right.

* The :keyword:`while` loop executes as long as the condition (here: ``b < 10``)
  remains true. In Python, like in C, any non-zero integer value is true; zero is
  false. The condition may also be a string or list value, in fact any sequence;
  anything with a non-zero length is true, empty sequences are false. The test
  used in the example is a simple comparison. The standard comparison operators
  are written the same as in C: ``<`` (less than), ``>`` (greater than), ``==``
  (equal to), ``<=`` (less than or equal to), ``>=`` (greater than or equal to)
  and ``!=`` (not equal to).

* The *body* of the loop is *indented*: indentation is Python's way of grouping
  statements. At the interactive prompt, you have to type a tab or space(s) for
  each indented line. In practice you will prepare more complicated input
  for Python with a text editor; all decent text editors have an auto-indent
  facility. When a compound statement is entered interactively, it must be
  followed by a blank line to indicate completion (since the parser cannot
  guess when you have typed the last line). Note that each line within a basic
  block must be indented by the same amount.

* The :keyword:`print` statement writes the value of the expression(s) it is
  given. It differs from just writing the expression you want to write (as we did
  earlier in the calculator examples) in the way it handles multiple expressions
  and strings. Strings are printed without quotes, and a space is inserted
  between items, so you can format things nicely, like this::

     >>> i = 256*256
     >>> print 'The value of i is', i
     The value of i is 65536

  A trailing comma avoids the newline after the output::

     >>> a, b = 0, 1
     >>> while b < 1000:
     ...     print b,
     ...     a, b = b, a+b
     ...
     1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

  Note that the interpreter inserts a newline before it prints the next prompt if
  the last line was not completed.
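
The loop above can also be packaged as a function that collects the numbers in
a list instead of printing them. ``fib_below`` is a hypothetical name chosen
for this sketch, and defining functions is covered later in the tutorial:

.. code-block:: python

   def fib_below(limit):
       """Return the Fibonacci numbers smaller than limit, as a list."""
       result = []
       a, b = 0, 1
       while b < limit:
           result.append(b)
           # both right-hand expressions are evaluated before a and b change
           a, b = b, a + b
       return result

   small = fib_below(10)      # [1, 1, 2, 3, 5, 8]
   empty = fib_below(1)       # []: the condition is false from the start
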

.. rubric:: Footnotes

.. [#] Since ``**`` has higher precedence than ``-``, ``-3**2`` will be
       interpreted as ``-(3**2)`` and thus result in ``-9``. To avoid this
       and get ``9``, you can use ``(-3)**2``.

.. [#] Unlike other languages, special characters such as ``\n`` have the
       same meaning with both single (``'...'``) and double (``"..."``) quotes.
       The only difference between the two is that within single quotes you don't
       need to escape ``"`` (but you have to escape ``\'``) and vice versa.