.. _tut-informal:

**********************************
An Informal Introduction to Python
**********************************

In the following examples, input and output are distinguished by the presence or
absence of prompts (``>>>`` and ``...``): to repeat the example, you must type
everything after the prompt, when the prompt appears; lines that do not begin
with a prompt are output from the interpreter. Note that a secondary prompt on a
line by itself in an example means you must type a blank line; this is used to
end a multi-line command.

Many of the examples in this manual, even those entered at the interactive
prompt, include comments. Comments in Python start with the hash character,
``'#'``, and extend to the end of the physical line. A comment may appear at
the start of a line or following whitespace or code, but not within a string
literal. A hash character within a string literal is just a hash character.

Some examples::

   # this is the first comment
   SPAM = 1                 # and this is the second comment
                            # ... and now a third!
   STRING = "# This is not a comment."


.. _tut-calculator:

Using Python as a Calculator
============================

Let's try some simple Python commands. Start the interpreter and wait for the
primary prompt, ``>>>``. (It shouldn't take long.)


.. _tut-numbers:

Numbers
-------

The interpreter acts as a simple calculator: you can type an expression at it
and it will write the value. Expression syntax is straightforward: the
operators ``+``, ``-``, ``*`` and ``/`` work just like in most other languages
(for example, Pascal or C); parentheses can be used for grouping. For example::

   >>> 2+2
   4
   >>> # This is a comment
   ... 2+2
   4
   >>> 2+2  # and a comment on the same line as code
   4
   >>> (50-5*6)/4
   5
   >>> # Integer division returns the floor:
   ... 7/3
   2
   >>> 7/-3
   -3
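
Rounding toward negative infinity also fixes the sign of the remainder: it always
has the same sign as the divisor. The following sketch is an addition to the
tutorial. It uses the ``//`` operator, which always performs floor division, and
the built-in :func:`divmod`. The helper name ``check_floor_division`` is
hypothetical, and the code assumes Python 2.6 or later (it also runs under
Python 3)::

   def check_floor_division(a, b):
       # divmod returns the floor quotient and the matching remainder together.
       q, r = divmod(a, b)
       assert q == a // b
       assert r == a % b
       # Quotient and remainder always reconstruct the dividend.
       assert q * b + r == a
       return q, r

   print(check_floor_division(7, 3))    # (2, 1)
   print(check_floor_division(7, -3))   # (-3, -2): remainder takes the divisor's sign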

The equal sign (``'='``) is used to assign a value to a variable. Afterwards, no
result is displayed before the next interactive prompt::

   >>> width = 20
   >>> height = 5*9
   >>> width * height
   900

A value can be assigned to several variables simultaneously::

   >>> x = y = z = 0  # Zero x, y and z
   >>> x
   0
   >>> y
   0
   >>> z
   0

There is full support for floating point; operators with mixed type operands
convert the integer operand to floating point::

   >>> 3 * 3.75 / 1.5
   7.5
   >>> 7.0 / 2
   3.5

Complex numbers are also supported; imaginary numbers are written with a suffix
of ``j`` or ``J``. Complex numbers with a nonzero real component are written as
``(real+imagj)``, or can be created with the ``complex(real, imag)`` function.
::

   >>> 1j * 1J
   (-1+0j)
   >>> 1j * complex(0,1)
   (-1+0j)
   >>> 3+1j*3
   (3+3j)
   >>> (3+1j)*3
   (9+3j)
   >>> (1+2j)/(1+1j)
   (1.5+0.5j)

Complex numbers are always represented as two floating point numbers, the real
and imaginary parts. To extract these parts from a complex number *z*, use
``z.real`` and ``z.imag``. ::

   >>> a=1.5+0.5j
   >>> a.real
   1.5
   >>> a.imag
   0.5

The conversion functions to floating point and integer (:func:`float`,
:func:`int` and :func:`long`) don't work for complex numbers --- there is no one
correct way to convert a complex number to a real number. Use ``abs(z)`` to get
its magnitude (as a float) or ``z.real`` to get its real part. ::

   >>> a=3.0+4.0j
   >>> float(a)
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   TypeError: can't convert complex to float; use abs(z)
   >>> a.real
   3.0
   >>> a.imag
   4.0
   >>> abs(a)  # sqrt(a.real**2 + a.imag**2)
   5.0
   >>>
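
The comment on ``abs(a)`` can be checked directly: the magnitude of a complex
number is the length of the vector from the origin to the point
(``z.real``, ``z.imag``). This sketch is an addition to the tutorial. It compares
:func:`abs` with :func:`math.hypot` from the standard :mod:`math` module, and the
helper name ``magnitude`` is hypothetical. It assumes Python 2.6 or later and
also runs under Python 3::

   import math

   def magnitude(z):
       # Euclidean length of (z.real, z.imag), computed from the two parts.
       return math.hypot(z.real, z.imag)

   a = 3.0+4.0j
   print(magnitude(a))              # 5.0
   print(magnitude(a) == abs(a))    # True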

In interactive mode, the last printed expression is assigned to the variable
``_``. This means that when you are using Python as a desk calculator, it is
somewhat easier to continue calculations, for example::

   >>> tax = 12.5 / 100
   >>> price = 100.50
   >>> price * tax
   12.5625
   >>> price + _
   113.0625
   >>> round(_, 2)
   113.06
   >>>

This variable should be treated as read-only by the user. Don't explicitly
assign a value to it --- you would create an independent local variable with the
same name masking the built-in variable with its magic behavior.


.. _tut-strings:

Strings
-------

Besides numbers, Python can also manipulate strings, which can be expressed in
several ways. They can be enclosed in single quotes or double quotes::

   >>> 'spam eggs'
   'spam eggs'
   >>> 'doesn\'t'
   "doesn't"
   >>> "doesn't"
   "doesn't"
   >>> '"Yes," he said.'
   '"Yes," he said.'
   >>> "\"Yes,\" he said."
   '"Yes," he said.'
   >>> '"Isn\'t," she said.'
   '"Isn\'t," she said.'

String literals can span multiple lines in several ways. Continuation lines can
be used, with a backslash as the last character on the line indicating that the
next line is a logical continuation of the line::

   hello = "This is a rather long string containing\n\
   several lines of text just as you would do in C.\n\
       Note that whitespace at the beginning of the line is\
    significant."

   print hello

Note that newlines still need to be embedded in the string using ``\n``; the
newline following the trailing backslash is discarded. This example would print
the following::

   This is a rather long string containing
   several lines of text just as you would do in C.
       Note that whitespace at the beginning of the line is significant.

If we make the string literal a "raw" string, however, the ``\n`` sequences are
not converted to newlines, but the backslash at the end of the line, and the
newline character in the source, are both included in the string as data. Thus,
the example::

   hello = r"This is a rather long string containing\n\
   several lines of text much as you would do in C."

   print hello

would print::

   This is a rather long string containing\n\
   several lines of text much as you would do in C.
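
The difference between ordinary and raw literals shows up directly in the string
length. In an ordinary string, ``\n`` is a single newline character. In a raw
string, it stays as two characters, a backslash and an ``n``. This sketch is not
part of the original tutorial and assumes Python 2.6 or later (it also runs under
Python 3)::

   plain = "a\nb"        # backslash-n becomes one newline character
   raw = r"a\nb"         # backslash and 'n' are kept as two characters

   print(len(plain))     # 3
   print(len(raw))       # 4
   print(raw[1] == "\\") # True: the backslash survives in the raw string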

Or, strings can be surrounded by a pair of matching triple-quotes: ``"""`` or
``'''``. Line endings do not need to be escaped when using triple-quotes, but
they will be included in the string. A backslash at the end of a line inside the
literal suppresses that line ending, which is used below to keep the string from
starting with a newline. ::

   print """\
   Usage: thingy [OPTIONS]
        -h                        Display this usage message
        -H hostname               Hostname to connect to
   """

produces the following output::

   Usage: thingy [OPTIONS]
        -h                        Display this usage message
        -H hostname               Hostname to connect to
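
You can see exactly what a triple-quoted literal contains by looking at its
:func:`repr`. The sketch below is an addition to the tutorial. It contrasts a
literal that starts with a bare newline with one whose first newline is escaped
by a backslash. It assumes Python 2.6 or later (it also runs under Python 3)::

   with_newline = """
   Usage: thingy
   """
   escaped = """\
   Usage: thingy
   """

   # The first literal begins with the newline that follows the opening quotes.
   print(repr(with_newline))   # '\nUsage: thingy\n'
   # The backslash-newline pair is dropped, so the text starts immediately.
   print(repr(escaped))        # 'Usage: thingy\n'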

The interpreter prints the result of string operations in the same way as they
are typed for input: inside quotes, and with quotes and other funny characters
escaped by backslashes, to show the precise value. The string is enclosed in
double quotes if the string contains a single quote and no double quotes, else
it's enclosed in single quotes. (The :keyword:`print` statement, described
later, can be used to write strings without quotes or escapes.)

Strings can be concatenated (glued together) with the ``+`` operator, and
repeated with ``*``::

   >>> word = 'Help' + 'A'
   >>> word
   'HelpA'
   >>> '<' + word*5 + '>'
   '<HelpAHelpAHelpAHelpAHelpA>'

Two string literals next to each other are automatically concatenated; the first
line above could also have been written ``word = 'Help' 'A'``. This only works
with literals, not with arbitrary string expressions::

   >>> 'str' 'ing'             #  <-  This is ok
   'string'
   >>> 'str'.strip() + 'ing'   #  <-  This is ok
   'string'
   >>> 'str'.strip() 'ing'     #  <-  This is invalid
     File "<stdin>", line 1
       'str'.strip() 'ing'
                         ^
   SyntaxError: invalid syntax

Strings can be subscripted (indexed); like in C, the first character of a string
has subscript (index) 0. There is no separate character type; a character is
simply a string of size one. Like in Icon, substrings can be specified with the
*slice notation*: two indices separated by a colon. ::

   >>> word[4]
   'A'
   >>> word[0:2]
   'He'
   >>> word[2:4]
   'lp'

Slice indices have useful defaults; an omitted first index defaults to zero, an
omitted second index defaults to the size of the string being sliced. ::

   >>> word[:2]    # The first two characters
   'He'
   >>> word[2:]    # Everything except the first two characters
   'lpA'

Unlike a C string, Python strings cannot be changed. Assigning to an indexed
position in the string results in an error::

   >>> word[0] = 'x'
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   TypeError: object doesn't support item assignment
   >>> word[:1] = 'Splat'
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   TypeError: object doesn't support slice assignment

However, creating a new string with the combined content is easy and efficient::

   >>> 'x' + word[1:]
   'xelpA'
   >>> 'Splat' + word[4]
   'SplatA'
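
Because strings are immutable, "changing" one character always means building a
new string from slices. The function ``replace_at`` below is a hypothetical
helper, not a built-in. It is sketched to run under Python 2.6 or later and
Python 3::

   def replace_at(s, i, ch):
       # Build a new string: everything before i, the new text, everything after i.
       # The original string s is left untouched.
       return s[:i] + ch + s[i+1:]

   word = 'HelpA'
   print(replace_at(word, 0, 'x'))   # xelpA
   print(word)                       # HelpA (unchanged)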

Here's a useful invariant of slice operations: ``s[:i] + s[i:]`` equals ``s``.
::

   >>> word[:2] + word[2:]
   'HelpA'
   >>> word[:3] + word[3:]
   'HelpA'
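
The invariant holds for every integer *i*, including negative and out-of-range
values, because both slices clip the shared cut point in the same way. A short
loop can check this for one string. The sketch is an addition to the tutorial,
the helper name ``split_rejoins`` is hypothetical, and it assumes Python 2.6 or
later (it also runs under Python 3)::

   def split_rejoins(s, i):
       # s[:i] and s[i:] always share the same cut point, so they rejoin to s.
       return s[:i] + s[i:] == s

   word = 'HelpA'
   # Try cut points well beyond both ends of the string.
   print(all(split_rejoins(word, i) for i in range(-10, 11)))   # True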

Degenerate slice indices are handled gracefully: an index that is too large is
replaced by the string size, and an upper bound smaller than the lower bound
returns an empty string. ::

   >>> word[1:100]
   'elpA'
   >>> word[10:]
   ''
   >>> word[2:1]
   ''

Indices may be negative numbers, to start counting from the right. For example::

   >>> word[-1]     # The last character
   'A'
   >>> word[-2]     # The last-but-one character
   'p'
   >>> word[-2:]    # The last two characters
   'pA'
   >>> word[:-2]    # Everything except the last two characters
   'Hel'

But note that -0 is really the same as 0, so it does not count from the right!
::

   >>> word[-0]     # (since -0 equals 0)
   'H'

Out-of-range negative slice indices are truncated, but don't try this for
single-element (non-slice) indices::

   >>> word[-100:]
   'HelpA'
   >>> word[-10]    # error
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   IndexError: string index out of range

One way to remember how slices work is to think of the indices as pointing
*between* characters, with the left edge of the first character numbered 0.
Then the right edge of the last character of a string of *n* characters has
index *n*, for example::

    +---+---+---+---+---+
    | H | e | l | p | A |
    +---+---+---+---+---+
    0   1   2   3   4   5
   -5  -4  -3  -2  -1

The first row of numbers gives the position of the indices 0...5 in the string;
the second row gives the corresponding negative indices. The slice from *i* to
*j* consists of all characters between the edges labeled *i* and *j*,
respectively.
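
The clipping rules above can be written out as a small function that turns a pair
of possibly negative or out-of-range slice bounds into the actual edges used.
This is a sketch for the simple ``i:j`` form only, with the hypothetical name
``normalize_slice``. It is cross-checked against the built-in ``slice.indices``
method, which performs the same normalization inside Python. For ``word[2:1]``
it yields the edges 2 and 1, which enclose no characters, hence the empty string.
The code assumes Python 2.6 or later (it also runs under Python 3)::

   def normalize_slice(i, j, n):
       # Turn slice bounds i:j on a sequence of length n into edges in 0..n.
       # None stands for an omitted bound, as in word[i:] or word[:j].
       def clip(k, default):
           if k is None:
               return default
           if k < 0:
               k += n                  # negative: count from the right
           return max(0, min(k, n))    # out of range: clamp to the ends
       return clip(i, 0), clip(j, n)

   word = 'HelpA'
   for i, j in [(1, 100), (10, None), (2, 1), (-2, None), (-100, None)]:
       start, stop = normalize_slice(i, j, len(word))
       # Python's own normalization agrees (the step is always 1 here).
       assert (start, stop, 1) == slice(i, j).indices(len(word))
       assert word[start:stop] == word[i:j]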

For non-negative indices, the length of a slice is the difference of the
indices, if both are within bounds. For example, the length of ``word[1:3]`` is
2.

The built-in function :func:`len` returns the length of a string::

   >>> s = 'supercalifragilisticexpialidocious'
   >>> len(s)
   34


.. seealso::

   :ref:`typesseq`
      Strings, and the Unicode strings described in the next section, are
      examples of *sequence types*, and support the common operations supported
      by such types.

   :ref:`string-methods`
      Both strings and Unicode strings support a large number of methods for
      basic transformations and searching.

   :ref:`string-formatting`
      The formatting operations invoked when strings and Unicode strings are the
      left operand of the ``%`` operator are described in more detail here.


.. _tut-unicodestrings:

Unicode Strings
---------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>


Starting with Python 2.0, a new data type for storing text data is available to
the programmer: the Unicode object. It can be used to store and manipulate
Unicode data (see http://www.unicode.org/) and integrates well with the existing
string objects, providing auto-conversions where necessary.

Unicode has the advantage of providing one ordinal for every character in every
script used in modern and ancient texts. Previously, there were only 256
possible ordinals for script characters. Texts were typically bound to a code
page which mapped the ordinals to script characters. This led to a great deal of
confusion, especially with respect to internationalization (usually written as
``i18n`` --- ``'i'`` + 18 characters + ``'n'``) of software. Unicode solves
these problems by defining one code page for all scripts.

Creating Unicode strings in Python is just as simple as creating normal
strings::

   >>> u'Hello World !'
   u'Hello World !'

The small ``'u'`` in front of the quote indicates that a Unicode string is
supposed to be created. If you want to include special characters in the string,
you can do so by using the Python *Unicode-Escape* encoding. The following
example shows how::

   >>> u'Hello\u0020World !'
   u'Hello World !'

The escape sequence ``\u0020`` indicates to insert the Unicode character with
the ordinal value 0x0020 (the space character) at the given position.

Other characters are interpreted by using their respective ordinal values
directly as Unicode ordinals. If you have literal strings in the standard
Latin-1 encoding that is used in many Western countries, you will find it
convenient that the lower 256 characters of Unicode are the same as the 256
characters of Latin-1.

For experts, there is also a raw mode just like the one for normal strings. You
have to prefix the opening quote with ``ur`` to have Python use the
*Raw-Unicode-Escape* encoding. It will only apply the above ``\uXXXX``
conversion if there is an odd number of backslashes in front of the small
``'u'``. ::

   >>> ur'Hello\u0020World !'
   u'Hello World !'
   >>> ur'Hello\\u0020World !'
   u'Hello\\\\u0020World !'

The raw mode is most useful when you have to enter lots of backslashes, as can
be necessary in regular expressions.

Apart from these standard encodings, Python provides a whole set of other ways
of creating Unicode strings on the basis of a known encoding.

.. index:: builtin: unicode

The built-in function :func:`unicode` provides access to all registered Unicode
codecs (COders and DECoders). Some of the better-known encodings which these
codecs can convert are *Latin-1*, *ASCII*, *UTF-8*, and *UTF-16*. The latter two
are variable-length encodings that store each Unicode character in one or more
bytes. The default encoding is normally set to ASCII, which passes through
characters in the range 0 to 127 and rejects any other characters with an error.
When a Unicode string is printed, written to a file, or converted with
:func:`str`, conversion takes place using this default encoding. ::

   >>> u"abc"
   u'abc'
   >>> str(u"abc")
   'abc'
   >>> u"äöü"
   u'\xe4\xf6\xfc'
   >>> str(u"äöü")
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

To convert a Unicode string into an 8-bit string using a specific encoding,
Unicode objects provide an :meth:`encode` method that takes one argument, the
name of the encoding. Lowercase names for encodings are preferred. ::

   >>> u"äöü".encode('utf-8')
   '\xc3\xa4\xc3\xb6\xc3\xbc'

If you have data in a specific encoding and want to produce a corresponding
Unicode string from it, you can use the :func:`unicode` function with the
encoding name as the second argument. ::

   >>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
   u'\xe4\xf6\xfc'
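
Encoding and decoding are inverse operations when the same codec is used on both
sides, while the number of bytes depends on the codec. The sketch below is an
addition to the tutorial. It calls the ``decode`` method of the encoded string
instead of the :func:`unicode` built-in, so that it runs under Python 2.6 or
later as well as Python 3.3 or later::

   text = u'\xe4\xf6\xfc'                # the three characters shown above

   encoded = text.encode('utf-8')        # each of these characters needs 2 bytes
   print(len(encoded))                   # 6
   decoded = encoded.decode('utf-8')     # decode with the same codec
   print(decoded == text)                # True

   # Latin-1 stores each of these characters in a single byte.
   print(len(text.encode('latin-1')))    # 3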


.. _tut-lists:

Lists
-----

Python knows a number of *compound* data types, used to group together other
values. The most versatile is the *list*, which can be written as a list of
comma-separated values (items) between square brackets. List items need not all
have the same type. ::

   >>> a = ['spam', 'eggs', 100, 1234]
   >>> a
   ['spam', 'eggs', 100, 1234]

Like string indices, list indices start at 0, and lists can be sliced,
concatenated and so on::

   >>> a[0]
   'spam'
   >>> a[3]
   1234
   >>> a[-2]
   100
   >>> a[1:-1]
   ['eggs', 100]
   >>> a[:2] + ['bacon', 2*2]
   ['spam', 'eggs', 'bacon', 4]
   >>> 3*a[:3] + ['Boo!']
   ['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']

Unlike strings, which are *immutable*, it is possible to change individual
elements of a list::

   >>> a
   ['spam', 'eggs', 100, 1234]
   >>> a[2] = a[2] + 23
   >>> a
   ['spam', 'eggs', 123, 1234]

Assignment to slices is also possible, and this can even change the size of the
list or clear it entirely::

   >>> # Replace some items:
   ... a[0:2] = [1, 12]
   >>> a
   [1, 12, 123, 1234]
   >>> # Remove some:
   ... a[0:2] = []
   >>> a
   [123, 1234]
   >>> # Insert some:
   ... a[1:1] = ['bletch', 'xyzzy']
   >>> a
   [123, 'bletch', 'xyzzy', 1234]
   >>> # Insert (a copy of) itself at the beginning
   ... a[:0] = a
   >>> a
   [123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
   >>> # Clear the list: replace all items with an empty list
   ... a[:] = []
   >>> a
   []
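
Slice assignment can replace any run of items with a sequence of any length, so
inserting and deleting items in place can both be expressed through it. The
helpers ``insert_items`` and ``delete_items`` below are hypothetical names for
illustration, not list methods. The sketch assumes Python 2.6 or later (it also
runs under Python 3)::

   def insert_items(lst, i, items):
       # The empty slice lst[i:i] marks a position; filling it inserts items there.
       lst[i:i] = items

   def delete_items(lst, i, j):
       # Replacing lst[i:j] with nothing removes those items in place.
       lst[i:j] = []

   a = [123, 1234]
   insert_items(a, 1, ['bletch', 'xyzzy'])
   print(a)                # [123, 'bletch', 'xyzzy', 1234]
   delete_items(a, 0, 2)
   print(a)                # ['xyzzy', 1234]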

The built-in function :func:`len` also applies to lists::

   >>> a = ['a', 'b', 'c', 'd']
   >>> len(a)
   4

It is possible to nest lists (create lists containing other lists), for
example::

   >>> q = [2, 3]
   >>> p = [1, q, 4]
   >>> len(p)
   3
   >>> p[1]
   [2, 3]
   >>> p[1][0]
   2
   >>> p[1].append('xtra')     # See section 5.1
   >>> p
   [1, [2, 3, 'xtra'], 4]
   >>> q
   [2, 3, 'xtra']

Note that in the last example, ``p[1]`` and ``q`` really refer to the same
object! We'll come back to *object semantics* later.
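
The shared object can be made visible with the ``is`` operator, which tests
object identity rather than equality. If an independent inner list is wanted
instead, a slice copy such as ``q[:]`` (a shallow copy) breaks the link. This
sketch is an addition to the tutorial and assumes Python 2.6 or later (it also
runs under Python 3)::

   q = [2, 3]
   p = [1, q, 4]          # p[1] refers to the same list object as q
   r = [1, q[:], 4]       # r[1] is a new list holding the same items

   q.append('xtra')
   print(p[1] is q)       # True: one object, two names
   print(r[1] is q)       # False: the copy is a separate object
   print(r)               # [1, [2, 3], 4]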


.. _tut-firststeps:

First Steps Towards Programming
===============================

Of course, we can use Python for more complicated tasks than adding two and two
together. For instance, we can write an initial sub-sequence of the *Fibonacci*
series as follows::

   >>> # Fibonacci series:
   ... # the sum of two elements defines the next
   ... a, b = 0, 1
   >>> while b < 10:
   ...     print b
   ...     a, b = b, a+b
   ...
   1
   1
   2
   3
   5
   8
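
The same loop can be wrapped in a function that collects the terms instead of
printing them. This is a sketch that goes beyond this section (functions are
introduced later), and ``fib_below`` is a hypothetical name. It relies on the
same multiple assignment and assumes Python 2.6 or later (it also runs under
Python 3)::

   def fib_below(limit):
       # Collect the Fibonacci numbers that are smaller than limit.
       result = []
       a, b = 0, 1
       while b < limit:
           result.append(b)
           # Both right-hand expressions are evaluated before either name changes.
           a, b = b, a+b
       return result

   print(fib_below(10))    # [1, 1, 2, 3, 5, 8]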

This example introduces several new features.

* The first line contains a *multiple assignment*: the variables ``a`` and ``b``
  simultaneously get the new values 0 and 1. On the last line this is used again,
  demonstrating that the expressions on the right-hand side are all evaluated
  before any of the assignments take place. The right-hand side expressions are
  evaluated from left to right.

* The :keyword:`while` loop executes as long as the condition (here: ``b < 10``)
  remains true. In Python, like in C, any non-zero integer value is true; zero is
  false. The condition may also be a string or list value, in fact any sequence;
  anything with a non-zero length is true, empty sequences are false. The test
  used in the example is a simple comparison. The standard comparison operators
  are written the same as in C: ``<`` (less than), ``>`` (greater than), ``==``
  (equal to), ``<=`` (less than or equal to), ``>=`` (greater than or equal to)
  and ``!=`` (not equal to).

* The *body* of the loop is *indented*: indentation is Python's way of grouping
  statements. Python does not (yet!) provide an intelligent input line editing
  facility, so you have to type a tab or space(s) for each indented line. In
  practice you will prepare more complicated input for Python with a text editor;
  most text editors have an auto-indent facility. When a compound statement is
  entered interactively, it must be followed by a blank line to indicate
  completion (since the parser cannot guess when you have typed the last line).
  Note that each line within a basic block must be indented by the same amount.

* The :keyword:`print` statement writes the value of the expression(s) it is
  given. It differs from just writing the expression you want to write (as we did
  earlier in the calculator examples) in the way it handles multiple expressions
  and strings. Strings are printed without quotes, and a space is inserted
  between items, so you can format things nicely, like this::

     >>> i = 256*256
     >>> print 'The value of i is', i
     The value of i is 65536

  A trailing comma avoids the newline after the output::

     >>> a, b = 0, 1
     >>> while b < 1000:
     ...     print b,
     ...     a, b = b, a+b
     ...
     1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

  Note that the interpreter inserts a newline before it prints the next prompt if
  the last line was not completed.
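
The truth-value rules for :keyword:`while` conditions described above can be
checked with the built-in :func:`bool`, which reports whether a value counts as
true in a condition. This small check is an addition to the tutorial and assumes
Python 2.6 or later (it also runs under Python 3)::

   # Each pair is (value, truth value a while condition would see).
   cases = [
       (0, False), (7, True), (-1, True),    # integers: only zero is false
       ('', False), ('spam', True),          # strings: empty is false
       ([], False), ([0], True),             # lists: a list holding 0 is non-empty
   ]
   for value, expected in cases:
       assert bool(value) == expected
   print('all truth values as described')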