.. Georg Brandl, 116aa62, 2007-08-15 14:28:22 +0000

.. _tut-brieftourtwo:

*********************************************
Brief Tour of the Standard Library -- Part II
*********************************************

This second tour covers more advanced modules that support professional
programming needs. These modules rarely occur in small scripts.


.. _tut-output-formatting:

Output Formatting
=================

The :mod:`repr` module provides a version of :func:`repr` customized for
abbreviated displays of large or deeply nested containers::

   >>> import repr
   >>> repr.repr(set('supercalifragilisticexpialidocious'))
   "set(['a', 'c', 'd', 'e', 'f', 'g', ...])"

The :mod:`pprint` module offers more sophisticated control over printing both
built-in and user-defined objects in a way that is readable by the interpreter.
When the result is longer than one line, the "pretty printer" adds line breaks
and indentation to more clearly reveal data structure::

   >>> import pprint
   >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
   ...     'yellow'], 'blue']]]
   ...
   >>> pprint.pprint(t, width=30)
   [[[['black', 'cyan'],
      'white',
      ['green', 'red']],
     [['magenta', 'yellow'],
      'blue']]]
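
As a minimal sketch of the "readable by the interpreter" claim, the snippet
below uses :func:`pprint.pformat`, which returns the formatted text instead of
printing it, and then parses that text back into an equal object. The nested
list is the one from the example above; the variable names are illustrative
only.

```python
import ast
import pprint

t = [[[['black', 'cyan'], 'white', ['green', 'red']],
      [['magenta', 'yellow'], 'blue']]]

# pformat() returns the pretty-printed text as a string.
text = pprint.pformat(t, width=30)
lines = text.splitlines()          # one entry per output line

# The formatted text is still a valid Python literal.
round_tripped = ast.literal_eval(text)
```

Capturing the text this way can be convenient when the output is destined for
a log file rather than the console.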

The :mod:`textwrap` module formats paragraphs of text to fit a given screen
width::

   >>> import textwrap
   >>> doc = """The wrap() method is just like fill() except that it returns
   ... a list of strings instead of one big string with newlines to separate
   ... the wrapped lines."""
   ...
   >>> print textwrap.fill(doc, width=40)
   The wrap() method is just like fill()
   except that it returns a list of strings
   instead of one big string with newlines
   to separate the wrapped lines.
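
The sample text itself describes :func:`textwrap.wrap`. A short sketch of that
relationship follows. It avoids ``print`` so that it should run unchanged on
Python 2 or Python 3, and the variable names are illustrative only.

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to "
       "separate the wrapped lines.")

# wrap() returns the wrapped lines as a list, without newline characters.
lines = textwrap.wrap(doc, width=40)

# Joining those lines with newlines gives the same text that fill() returns.
joined = "\n".join(lines)
same_as_fill = (joined == textwrap.fill(doc, width=40))
```

Working with the list form is handy when each line needs further treatment,
such as adding a prefix or counting lines.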

The :mod:`locale` module accesses a database of culture-specific data formats.
The ``grouping`` argument of locale's formatting functions provides a direct
way of formatting numbers with group separators::

   >>> import locale
   >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
   'English_United States.1252'
   >>> conv = locale.localeconv()          # get a mapping of conventions
   >>> x = 1234567.8
   >>> locale.format("%d", x, grouping=True)
   '1,234,567'
   >>> locale.format_string("%s%.*f", (conv['currency_symbol'],
   ...                      conv['frac_digits'], x), grouping=True)
   '$1,234,567.80'


.. _tut-templating:

Templating
==========

The :mod:`string` module includes a versatile :class:`Template` class with a
simplified syntax suitable for editing by end-users. This allows users to
customize their applications without having to alter the application.

The format uses placeholder names formed by ``$`` followed by a valid Python
identifier (alphanumeric characters and underscores). Surrounding the
placeholder with braces allows it to be followed by more alphanumeric
characters with no intervening spaces. Writing ``$$`` creates a single escaped
``$``::

   >>> from string import Template
   >>> t = Template('${village}folk send $$10 to $cause.')
   >>> t.substitute(village='Nottingham', cause='the ditch fund')
   'Nottinghamfolk send $10 to the ditch fund.'

The :meth:`substitute` method raises a :exc:`KeyError` when a placeholder is not
supplied in a dictionary or a keyword argument. For mail-merge style
applications, user-supplied data may be incomplete and the
:meth:`safe_substitute` method may be more appropriate --- it will leave
placeholders unchanged if data is missing::

   >>> t = Template('Return the $item to $owner.')
   >>> d = dict(item='unladen swallow')
   >>> t.substitute(d)
   Traceback (most recent call last):
     . . .
   KeyError: 'owner'
   >>> t.safe_substitute(d)
   'Return the unladen swallow to $owner.'

Template subclasses can specify a custom delimiter. For example, a batch
renaming utility for a photo browser may elect to use percent signs for
placeholders such as the current date, image sequence number, or file format::

   >>> import time, os.path, sys
   >>> def raw_input(prompt):
   ...     sys.stdout.write(prompt)
   ...     sys.stdout.flush()
   ...     return sys.stdin.readline().rstrip('\n')   # drop the trailing newline
   ...
   >>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
   >>> class BatchRename(Template):
   ...     delimiter = '%'
   ...
   >>> fmt = raw_input('Enter rename style (%d-date %n-seqnum %f-format): ')
   Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f

   >>> t = BatchRename(fmt)
   >>> date = time.strftime('%d%b%y')
   >>> for i, filename in enumerate(photofiles):
   ...     base, ext = os.path.splitext(filename)
   ...     newname = t.substitute(d=date, n=i, f=ext)
   ...     print '%s --> %s' % (filename, newname)
   ...
   img_1074.jpg --> Ashley_0.jpg
   img_1076.jpg --> Ashley_1.jpg
   img_1077.jpg --> Ashley_2.jpg

Another application for templating is separating program logic from the details
of multiple output formats. This makes it possible to substitute custom
templates for XML files, plain text reports, and HTML web reports.
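
One way this separation might look is sketched below. The record, the template
strings, and the ``render`` helper are all hypothetical and exist only for
illustration. The point is that the program logic never mentions a concrete
output format.

```python
from string import Template

# Hypothetical report data.
record = {'name': 'widget', 'count': 12}

# One template per output format; the program logic stays the same.
templates = {
    'text': Template('$name: $count in stock'),
    'html': Template('<li><b>$name</b>: $count</li>'),
    'xml': Template('<item name="$name" count="$count"/>'),
}

def render(fmt, data):
    """Render data with the template registered for the given format."""
    return templates[fmt].substitute(data)

text_line = render('text', record)
html_line = render('html', record)
xml_line = render('xml', record)
```

Supporting a further format then only requires another entry in the
``templates`` mapping.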


.. _tut-binary-formats:

Working with Binary Data Record Layouts
=======================================

The :mod:`struct` module provides :func:`pack` and :func:`unpack` functions for
working with variable length binary record formats. The following example shows
how to loop through header information in a ZIP file. Pack codes ``"H"`` and
``"I"`` represent two and four byte unsigned numbers respectively. The ``"<"``
indicates that they are standard size and in little-endian byte order::

   import struct

   with open('myfile.zip', 'rb') as f:
       data = f.read()

   start = 0
   for i in range(3):                      # show the first 3 file headers
       start += 14
       fields = struct.unpack('<IIIHH', data[start:start+16])
       crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

       start += 16
       filename = data[start:start+filenamesize]
       start += filenamesize
       extra = data[start:start+extra_size]
       print filename, hex(crc32), comp_size, uncomp_size

       start += extra_size + comp_size     # skip to the next header
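
The 16-byte slice in the loop above relies on three four-byte fields followed
by two two-byte fields. The sketch below shows how an explicit ``<`` prefix
pins those sizes down regardless of platform, and round-trips a record through
:func:`pack` and :func:`unpack`. The field values are made up for illustration.

```python
import struct

# '<' selects little-endian byte order with standard sizes and no padding,
# so 'IIIHH' always occupies 4 + 4 + 4 + 2 + 2 = 16 bytes.
fmt = '<IIIHH'
size = struct.calcsize(fmt)

# Hypothetical field values, chosen only for illustration.
packed = struct.pack(fmt, 0xDEADBEEF, 1024, 4096, 8, 0)

# Unpacking returns a tuple with one item per format code.
fields = struct.unpack(fmt, packed)
crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
```

Without a byte-order prefix, :mod:`struct` uses the native sizes and alignment
of the C compiler, so the same format string can describe records of different
lengths on different machines.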


.. _tut-multi-threading:

Multi-threading
===============

Threading is a technique for decoupling tasks which are not sequentially
dependent. Threads can be used to improve the responsiveness of applications
that accept user input while other tasks run in the background. A related use
case is running I/O in parallel with computations in another thread.

The following code shows how the high-level :mod:`threading` module can run
tasks in the background while the main program continues to run::

   import threading, zipfile

   class AsyncZip(threading.Thread):
       def __init__(self, infile, outfile):
           threading.Thread.__init__(self)
           self.infile = infile
           self.outfile = outfile
       def run(self):
           f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
           f.write(self.infile)
           f.close()
           print 'Finished background zip of: ', self.infile

   background = AsyncZip('mydata.txt', 'myarchive.zip')
   background.start()
   print 'The main program continues to run in foreground.'

   background.join()    # Wait for the background task to finish
   print 'Main program waited until background was done.'

The principal challenge of multi-threaded applications is coordinating threads
that share data or other resources. To that end, the threading module provides
a number of synchronization primitives including locks, events, condition
variables, and semaphores.

While those tools are powerful, minor design errors can result in problems that
are difficult to reproduce. So, the preferred approach to task coordination is
to concentrate all access to a resource in a single thread and then use the
:mod:`Queue` module to feed that thread with requests from other threads.
Applications using :class:`Queue` objects for inter-thread communication and
coordination are easier to design, more readable, and more reliable.
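
A minimal sketch of that single-owner pattern is shown below. The worker
function, the ``None`` sentinel, and the squaring "work" are assumptions made
for illustration. The import fallback is there only because the module is
named ``Queue`` on Python 2 and ``queue`` on Python 3.

```python
import threading

try:
    import queue               # Python 3 name
except ImportError:
    import Queue as queue      # Python 2 name

requests = queue.Queue()
results = []                   # touched only by the worker thread

def worker():
    """Own the shared list; handle requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        results.append(item * item)

t = threading.Thread(target=worker)
t.start()

for n in range(5):
    requests.put(n)            # other threads only ever talk to the queue
requests.put(None)             # sentinel: tell the worker to stop
t.join()
```

Because only the worker modifies ``results``, no explicit lock is needed. The
queue does all the synchronization.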


.. _tut-logging:

Logging
=======

The :mod:`logging` module offers a full-featured and flexible logging system.
At its simplest, log messages are sent to a file or to ``sys.stderr``::

   import logging
   logging.debug('Debugging information')
   logging.info('Informational message')
   logging.warning('Warning:config file %s not found', 'server.conf')
   logging.error('Error occurred')
   logging.critical('Critical error -- shutting down')

This produces the following output::

   WARNING:root:Warning:config file server.conf not found
   ERROR:root:Error occurred
   CRITICAL:root:Critical error -- shutting down

By default, informational and debugging messages are suppressed and the output
is sent to standard error. Other output options include routing messages
through email, datagrams, sockets, or to an HTTP server. New filters can select
different routing based on message priority: :const:`DEBUG`, :const:`INFO`,
:const:`WARNING`, :const:`ERROR`, and :const:`CRITICAL`.

The logging system can be configured directly from Python or can be loaded from
a user-editable configuration file for customized logging without altering the
application.
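
As a sketch of configuring logging directly from Python, the snippet below
attaches a custom handler to a named logger and lowers its threshold so that
debugging messages are no longer suppressed. The logger name ``tour.example``
and the ``ListHandler`` class are hypothetical. A real application would more
likely use one of the handlers shipped in :mod:`logging.handlers`.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

log = logging.getLogger('tour.example')
log.setLevel(logging.DEBUG)      # let DEBUG and INFO messages through
log.propagate = False            # keep these records away from the root logger

handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))
log.addHandler(handler)

log.debug('Debugging information')
log.warning('config file %s not found', 'server.conf')
```

After these calls, ``handler.lines`` holds one formatted string per message,
including the debug message that the default configuration would have dropped.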


.. _tut-weak-references:

Weak References
===============

Python does automatic memory management (reference counting for most objects and
garbage collection to eliminate cycles). Memory is freed shortly after the
last reference to an object has been eliminated.

This approach works fine for most applications but occasionally there is a need
to track objects only as long as they are being used by something else.
Unfortunately, just tracking them creates a reference that makes them permanent.
The :mod:`weakref` module provides tools for tracking objects without creating a
reference. When the object is no longer needed, it is automatically removed
from a weakref table and a callback is triggered for weakref objects. Typical
applications include caching objects that are expensive to create::

   >>> import weakref, gc
   >>> class A:
   ...     def __init__(self, value):
   ...         self.value = value
   ...     def __repr__(self):
   ...         return str(self.value)
   ...
   >>> a = A(10)                   # create a reference
   >>> d = weakref.WeakValueDictionary()
   >>> d['primary'] = a            # does not create a reference
   >>> d['primary']                # fetch the object if it is still alive
   10
   >>> del a                       # remove the one reference
   >>> gc.collect()                # run garbage collection right away
   0
   >>> d['primary']                # entry was automatically removed
   Traceback (most recent call last):
     File "<pyshell#108>", line 1, in -toplevel-
       d['primary']                # entry was automatically removed
     File "C:/python30/lib/weakref.py", line 46, in __getitem__
       o = self.data[key]()
   KeyError: 'primary'
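
The callback mentioned above can be sketched with a plain :func:`weakref.ref`
object. The class and function names here are hypothetical. The explicit
``gc.collect()`` call is a precaution for interpreters that do not free
objects as soon as the last reference disappears.

```python
import gc
import weakref

class Expensive(object):
    """Hypothetical stand-in for an object that is costly to create."""
    def __init__(self, value):
        self.value = value

collected = []

def on_finalize(ref):
    # Called with the (now dead) weak reference once the referent is gone.
    collected.append('gone')

obj = Expensive(10)
r = weakref.ref(obj, on_finalize)
alive_before = r() is obj        # calling the weakref returns the referent

del obj                          # drop the only strong reference
gc.collect()                     # precaution for non-refcounting interpreters
alive_after = r() is not None    # a dead weakref returns None when called
```

A cache built this way can use the callback to discard bookkeeping tied to the
object that has just disappeared.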


.. _tut-list-tools:

Tools for Working with Lists
============================

Many data structure needs can be met with the built-in list type. However,
sometimes there is a need for alternative implementations with different
performance trade-offs.

The :mod:`array` module provides an :class:`array()` object that is like a list
that stores only homogeneous data and stores it more compactly. The following
example shows an array of numbers stored as two byte unsigned binary numbers
(typecode ``"H"``) rather than the usual 16 bytes per entry for regular lists of
Python int objects::

   >>> from array import array
   >>> a = array('H', [4000, 10, 700, 22222])
   >>> sum(a)
   26932
   >>> a[1:3]
   array('H', [10, 700])
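
The compact storage can be inspected through the ``itemsize`` attribute, and
it comes with a trade-off: values that do not fit the typecode are rejected.
The following is a small sketch under the assumption that ``"H"`` maps to a
two byte unsigned integer, as it does on common platforms.

```python
from array import array

a = array('H', [4000, 10, 700, 22222])

bytes_per_item = a.itemsize             # storage used by each element
payload_bytes = a.itemsize * len(a)     # size of the packed data itself

# Values outside the typecode's range are rejected rather than truncated.
try:
    a.append(70000)                     # too large for an unsigned 16-bit slot
    overflowed = False
except OverflowError:
    overflowed = True
```

The failed ``append`` leaves the array unchanged, so it still holds the four
original values.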

The :mod:`collections` module provides a :class:`deque()` object that is like a
list with faster appends and pops from the left side but slower lookups in the
middle. These objects are well suited for implementing queues and breadth-first
tree searches::

   >>> from collections import deque
   >>> d = deque(["task1", "task2", "task3"])
   >>> d.append("task4")
   >>> print "Handling", d.popleft()
   Handling task1

   unsearched = deque([starting_node])
   def breadth_first_search(unsearched):
       node = unsearched.popleft()
       for m in gen_moves(node):
           if is_goal(m):
               return m
           unsearched.append(m)
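
The search fragment above leaves ``starting_node``, ``gen_moves`` and
``is_goal`` undefined and handles a single node per call. One possible
self-contained variant is sketched below. The small graph, the ``seen`` set,
and the decision to return the whole path are assumptions added for
illustration, not part of the original fragment.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes reachable in one move.
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def breadth_first_search(start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    unsearched = deque([[start]])      # queue of partial paths
    seen = set([start])
    while unsearched:
        path = unsearched.popleft()    # cheap removal from the left end
        node = path[-1]
        if node == goal:
            return path
        for m in graph[node]:
            if m not in seen:
                seen.add(m)
                unsearched.append(path + [m])
    return None

path = breadth_first_search('A', 'F')
```

With a plain list, ``pop(0)`` would shift every remaining element on each
step. The deque avoids that cost.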

In addition to alternative list implementations, the library also offers other
tools such as the :mod:`bisect` module with functions for manipulating sorted
lists::

   >>> import bisect
   >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
   >>> bisect.insort(scores, (300, 'ruby'))
   >>> scores
   [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
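
Besides inserting, :mod:`bisect` can locate where a value would fall in a
sorted list, which makes simple table lookups possible. The grade boundaries
and the ``grade`` helper below are hypothetical and serve only as an
illustration.

```python
import bisect

# Hypothetical grade boundaries (sorted) and the matching letters.
breakpoints = [60, 70, 80, 90]
grades = 'FDCBA'

def grade(score):
    """Map a numeric score to a letter using binary search."""
    return grades[bisect.bisect(breakpoints, score)]

results = [grade(s) for s in [33, 99, 77, 70, 89, 90, 100]]
```

Because :func:`bisect.bisect` returns the position to the right of any equal
entry, a score of exactly 70 lands in the ``'C'`` band rather than ``'D'``.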

The :mod:`heapq` module provides functions for implementing heaps based on
regular lists. The lowest valued entry is always kept at position zero. This
is useful for applications which repeatedly access the smallest element but do
not want to run a full list sort::

   >>> from heapq import heapify, heappop, heappush
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> heapify(data)                      # rearrange the list into heap order
   >>> heappush(data, -5)                 # add a new entry
   >>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
   [-5, 0, 1]
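
A common use of this property is a small priority queue built from
``(priority, item)`` tuples. The task names below are made up. The
:func:`heapq.nsmallest` and :func:`heapq.nlargest` helpers are shown as an
alternative when only a few extreme values are needed.

```python
import heapq

data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

# Pick a few extreme values without sorting the whole list.
smallest = heapq.nsmallest(3, data)
largest = heapq.nlargest(2, data)

# Hypothetical task list: tuples compare by their first element, so the
# entry with the lowest priority number is popped first.
tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'refactor'))
first = heapq.heappop(tasks)
```

Here ``first`` is the ``'fix bug'`` entry, even though it was pushed second.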


.. _tut-decimal-fp:

Decimal Floating Point Arithmetic
=================================

The :mod:`decimal` module offers a :class:`Decimal` datatype for decimal
floating point arithmetic. Compared to the built-in :class:`float`
implementation of binary floating point, the new class is especially helpful for
financial applications and other uses which require exact decimal
representation, control over precision, control over rounding to meet legal or
regulatory requirements, tracking of significant decimal places, or for
applications where the user expects the results to match calculations done by
hand.

For example, calculating a 5% tax on a 70 cent phone charge gives different
results in decimal floating point and binary floating point. The difference
becomes significant if the results are rounded to the nearest cent::

   >>> from decimal import *
   >>> Decimal('0.70') * Decimal('1.05')
   Decimal("0.7350")
   >>> .70 * 1.05
   0.73499999999999999

The :class:`Decimal` result keeps a trailing zero, automatically inferring four
place significance from multiplicands with two place significance. Decimal
reproduces mathematics as done by hand and avoids issues that can arise when
binary floating point cannot exactly represent decimal quantities.
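
The effect of rounding to the nearest cent can be made explicit with
:meth:`Decimal.quantize`. This is a minimal sketch. The choice of
``ROUND_HALF_UP`` is an assumption about the rounding rule a billing
application would want.

```python
from decimal import Decimal, ROUND_HALF_UP

cent = Decimal('0.01')

# Exact decimal product, then rounded to two places.
exact = Decimal('0.70') * Decimal('1.05')
rounded_decimal = exact.quantize(cent, rounding=ROUND_HALF_UP)

# The binary float product is slightly below 0.735, so it rounds down.
rounded_float = round(0.70 * 1.05, 2)
```

The decimal calculation charges 0.74 while the binary floating point
calculation charges 0.73, a one cent difference on a single item.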

Exact representation enables the :class:`Decimal` class to perform modulo
calculations and equality tests that are unsuitable for binary floating point::

   >>> Decimal('1.00') % Decimal('.10')
   Decimal("0.00")
   >>> 1.00 % 0.10
   0.09999999999999995

   >>> sum([Decimal('0.1')]*10) == Decimal('1.0')
   True
   >>> sum([0.1]*10) == 1.0
   False

The :mod:`decimal` module provides arithmetic with as much precision as needed::

   >>> getcontext().prec = 36
   >>> Decimal(1) / Decimal(7)
   Decimal("0.142857142857142857142857142857142857")
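
Setting ``getcontext().prec`` changes the precision for all later
calculations in the current thread. When the higher precision is only needed
briefly, :func:`decimal.localcontext` offers a temporary alternative, as
sketched below. The variable names are illustrative only.

```python
from decimal import Decimal, getcontext, localcontext

default_prec = getcontext().prec

# Raise the precision only inside the with-block.
with localcontext() as ctx:
    ctx.prec = 36
    seventh = Decimal(1) / Decimal(7)

# The surrounding context is restored on exit.
restored_prec = getcontext().prec
```

The result computed inside the block keeps all 36 digits after the block
ends. Only subsequent operations fall back to the earlier precision.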
394