.. _tut-brieftourtwo:

*********************************************
Brief Tour of the Standard Library -- Part II
*********************************************

This second tour covers more advanced modules that support professional
programming needs. These modules rarely occur in small scripts.


.. _tut-output-formatting:

Output Formatting
=================

The :mod:`reprlib` module provides a version of :func:`repr` customized for
abbreviated displays of large or deeply nested containers::

   >>> import reprlib
   >>> reprlib.repr(set('supercalifragilisticexpialidocious'))
   "{'a', 'c', 'd', 'e', 'f', 'g', ...}"

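The abbreviation limits can be tuned. The following is a minimal sketch that
assumes a separate :class:`reprlib.Repr` instance is acceptable; the limit
values are arbitrary and chosen only for illustration:

```python
import reprlib

# A Repr instance carries its own size limits; the values here are arbitrary.
short = reprlib.Repr()
short.maxlist = 3       # show at most three list items
short.maxstring = 12    # cap the length of string representations

numbers = short.repr(list(range(100)))
text = short.repr('supercalifragilisticexpialidocious')
print(numbers)
print(text)
```

Using a dedicated instance leaves the module-level defaults used by
``reprlib.repr()`` untouched.
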
The :mod:`pprint` module offers more sophisticated control over printing both
built-in and user-defined objects in a way that is readable by the interpreter.
When the result is longer than one line, the "pretty printer" adds line breaks
and indentation to more clearly reveal data structure::

   >>> import pprint
   >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
   ...     'yellow'], 'blue']]]
   ...
   >>> pprint.pprint(t, width=30)
   [[[['black', 'cyan'],
      'white',
      ['green', 'red']],
     [['magenta', 'yellow'],
      'blue']]]

The :mod:`textwrap` module formats paragraphs of text to fit a given screen
width::

   >>> import textwrap
   >>> doc = """The wrap() method is just like fill() except that it returns
   ... a list of strings instead of one big string with newlines to separate
   ... the wrapped lines."""
   ...
   >>> print(textwrap.fill(doc, width=40))
   The wrap() method is just like fill()
   except that it returns a list of strings
   instead of one big string with newlines
   to separate the wrapped lines.

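The sample text itself describes ``wrap()``. As a small sketch of that
relationship, the snippet below wraps the same sentence into a list of lines
and joins them by hand; the variable names are illustrative only:

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to "
       "separate the wrapped lines.")

lines = textwrap.wrap(doc, width=40)    # list of lines, without newlines
joined = "\n".join(lines)               # the same text that fill() builds
for line in lines:
    print(line)
```

A list of lines is convenient when each line needs further processing, such
as adding a prefix, before output.
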
The :mod:`locale` module accesses a database of culture-specific data formats.
The ``grouping`` argument of :func:`locale.format_string` provides a direct way
of formatting numbers with group separators::

   >>> import locale
   >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
   'English_United States.1252'
   >>> conv = locale.localeconv()          # get a mapping of conventions
   >>> x = 1234567.8
   >>> locale.format_string("%d", x, grouping=True)
   '1,234,567'
   >>> locale.format_string("%s%.*f", (conv['currency_symbol'],
   ...                      conv['frac_digits'], x), grouping=True)
   '$1,234,567.80'


.. _tut-templating:

Templating
==========

The :mod:`string` module includes a versatile :class:`Template` class with a
simplified syntax suitable for editing by end-users. This allows users to
customize their applications without having to alter the application.

The format uses placeholder names formed by ``$`` with valid Python identifiers
(alphanumeric characters and underscores). Surrounding the placeholder with
braces allows it to be followed by more alphanumeric letters with no intervening
spaces. Writing ``$$`` creates a single escaped ``$``::

   >>> from string import Template
   >>> t = Template('${village}folk send $$10 to $cause.')
   >>> t.substitute(village='Nottingham', cause='the ditch fund')
   'Nottinghamfolk send $10 to the ditch fund.'

The :meth:`substitute` method raises a :exc:`KeyError` when a placeholder is not
supplied in a dictionary or a keyword argument. For mail-merge style
applications, user-supplied data may be incomplete and the
:meth:`safe_substitute` method may be more appropriate; it leaves
placeholders unchanged if data is missing::

   >>> t = Template('Return the $item to $owner.')
   >>> d = dict(item='unladen swallow')
   >>> t.substitute(d)
   Traceback (most recent call last):
     . . .
   KeyError: 'owner'
   >>> t.safe_substitute(d)
   'Return the unladen swallow to $owner.'

Template subclasses can specify a custom delimiter. For example, a batch
renaming utility for a photo browser may elect to use percent signs for
placeholders such as the current date, image sequence number, or file format::

   >>> import time, os.path
   >>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
   >>> class BatchRename(Template):
   ...     delimiter = '%'
   >>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ')
   Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f

   >>> t = BatchRename(fmt)
   >>> date = time.strftime('%d%b%y')
   >>> for i, filename in enumerate(photofiles):
   ...     base, ext = os.path.splitext(filename)
   ...     newname = t.substitute(d=date, n=i, f=ext)
   ...     print('{0} --> {1}'.format(filename, newname))

   img_1074.jpg --> Ashley_0.jpg
   img_1076.jpg --> Ashley_1.jpg
   img_1077.jpg --> Ashley_2.jpg

Another application for templating is separating program logic from the details
of multiple output formats. This makes it possible to substitute custom
templates for XML files, plain text reports, and HTML web reports.

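One way this separation might look is sketched below. The record fields and
the three template strings are hypothetical, and the values are inserted
without any HTML or XML escaping, which a real report generator would need:

```python
from string import Template

# Hypothetical report data; the field names are illustrative only.
record = {'name': 'disk usage', 'value': '87%'}

# One template per output format; the program logic stays the same.
formats = {
    'text': Template('$name: $value'),
    'html': Template('<tr><td>$name</td><td>$value</td></tr>'),
    'xml': Template('<metric name="$name">$value</metric>'),
}

rendered = {kind: tmpl.substitute(record) for kind, tmpl in formats.items()}
for kind, output in rendered.items():
    print(kind, '->', output)
```

Adding another output format then only requires a new template string, not a
change to the code that gathers the data.
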

.. _tut-binary-formats:

Working with Binary Data Record Layouts
=======================================

The :mod:`struct` module provides :func:`pack` and :func:`unpack` functions for
working with variable length binary record formats. The following example shows
how to loop through header information in a ZIP file without using the
:mod:`zipfile` module. Pack codes ``"H"`` and ``"I"`` represent two and four
byte unsigned numbers respectively. The ``"<"`` indicates that they are
standard size and in little-endian byte order::

   import struct

   with open('myfile.zip', 'rb') as f:
       data = f.read()

   start = 0
   for i in range(3):                      # show the first 3 file headers
       start += 14
       fields = struct.unpack('<IIIHH', data[start:start+16])
       crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

       start += 16
       filename = data[start:start+filenamesize]
       start += filenamesize
       extra = data[start:start+extra_size]
       print(filename, hex(crc32), comp_size, uncomp_size)

       start += extra_size + comp_size     # skip to the next header

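The ZIP example needs an archive on disk. As a self-contained sketch of the
same pack codes, the snippet below packs and unpacks a made-up record; the
layout ``'<IHH'`` and the field values are assumptions for illustration, not
part of any real file format:

```python
import struct

# Hypothetical record layout: a 4-byte unsigned id followed by two
# 2-byte unsigned fields, little-endian with standard sizes.
RECORD = '<IHH'

packed = struct.pack(RECORD, 0xCAFEF00D, 640, 480)
record_id, width, height = struct.unpack(RECORD, packed)

print(struct.calcsize(RECORD))    # number of bytes the layout occupies
print(packed.hex())               # least significant byte of each field first
print(hex(record_id), width, height)
```

:func:`struct.calcsize` is a convenient check that a format string matches
the record size given in a file format specification.
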

.. _tut-multi-threading:

Multi-threading
===============

Threading is a technique for decoupling tasks which are not sequentially
dependent. Threads can be used to improve the responsiveness of applications
that accept user input while other tasks run in the background. A related use
case is running I/O in parallel with computations in another thread.

The following code shows how the high level :mod:`threading` module can run
tasks in the background while the main program continues to run::

   import threading, zipfile

   class AsyncZip(threading.Thread):
       def __init__(self, infile, outfile):
           threading.Thread.__init__(self)
           self.infile = infile
           self.outfile = outfile
       def run(self):
           f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
           f.write(self.infile)
           f.close()
           print('Finished background zip of:', self.infile)

   background = AsyncZip('mydata.txt', 'myarchive.zip')
   background.start()
   print('The main program continues to run in foreground.')

   background.join()    # Wait for the background task to finish
   print('Main program waited until background was done.')

The principal challenge of multi-threaded applications is coordinating threads
that share data or other resources. To that end, the threading module provides
a number of synchronization primitives including locks, events, condition
variables, and semaphores.

While those tools are powerful, minor design errors can result in problems that
are difficult to reproduce. So, the preferred approach to task coordination is
to concentrate all access to a resource in a single thread and then use the
:mod:`queue` module to feed that thread with requests from other threads.
Applications using :class:`Queue` objects for inter-thread communication and
coordination are easier to design, more readable, and more reliable.

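The single-owner pattern described above can be sketched roughly as follows.
The ``results`` list stands in for the shared resource, and using ``None`` as
a stop signal is a convention chosen for this example, not something the
:mod:`queue` module requires:

```python
import queue
import threading

requests = queue.Queue()
results = []                      # touched only by the worker thread

def worker():
    # The single thread that owns the shared resource (here, ``results``).
    while True:
        item = requests.get()
        if item is None:          # sentinel chosen here to mean "no more work"
            requests.task_done()
            break
        results.append(item * item)
        requests.task_done()

owner = threading.Thread(target=worker)
owner.start()

# Other threads never touch ``results``; they only put requests on the queue.
producers = [threading.Thread(target=requests.put, args=(n,)) for n in range(5)]
for p in producers:
    p.start()
for p in producers:
    p.join()

requests.put(None)                # ask the worker to stop
requests.join()                   # block until every request is processed
owner.join()
print(sorted(results))
```

Because only one thread ever modifies ``results``, no explicit lock is
needed; the queue does the synchronization.
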

.. _tut-logging:

Logging
=======

The :mod:`logging` module offers a full featured and flexible logging system.
At its simplest, log messages are sent to a file or to ``sys.stderr``::

   import logging
   logging.debug('Debugging information')
   logging.info('Informational message')
   logging.warning('Warning:config file %s not found', 'server.conf')
   logging.error('Error occurred')
   logging.critical('Critical error -- shutting down')

This produces the following output::

   WARNING:root:Warning:config file server.conf not found
   ERROR:root:Error occurred
   CRITICAL:root:Critical error -- shutting down

By default, informational and debugging messages are suppressed and the output
is sent to standard error. Other output options include routing messages
through email, datagrams, sockets, or to an HTTP Server. New filters can select
different routing based on message priority: :const:`DEBUG`, :const:`INFO`,
:const:`WARNING`, :const:`ERROR`, and :const:`CRITICAL`.

The logging system can be configured directly from Python or can be loaded from
a user-editable configuration file for customized logging without altering the
application.

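Configuring logging directly from Python might look like the sketch below.
The logger name ``'demo'`` and the in-memory stream are assumptions made so
the result is easy to inspect; a real program would more likely log to a file
or to standard error:

```python
import io
import logging

# Collect records in memory instead of writing to sys.stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

log = logging.getLogger('demo')   # 'demo' is an arbitrary logger name
log.setLevel(logging.INFO)        # let INFO and above through
log.addHandler(handler)
log.propagate = False             # keep records away from the root logger

log.debug('not shown: below the INFO threshold')
log.info('service started')
log.warning('config file %s not found', 'server.conf')

captured = stream.getvalue()
print(captured, end='')
```

Lowering the level to ``logging.DEBUG`` would make the first message appear
as well, without touching the calls that emit the messages.
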

.. _tut-weak-references:

Weak References
===============

Python does automatic memory management (reference counting for most objects and
:term:`garbage collection` to eliminate cycles). The memory is freed shortly
after the last reference to it has been eliminated.

This approach works fine for most applications but occasionally there is a need
to track objects only as long as they are being used by something else.
Unfortunately, just tracking them creates a reference that makes them permanent.
The :mod:`weakref` module provides tools for tracking objects without creating a
reference. When the object is no longer needed, it is automatically removed
from a weakref table and a callback is triggered for weakref objects. Typical
applications include caching objects that are expensive to create::

   >>> import weakref, gc
   >>> class A:
   ...     def __init__(self, value):
   ...         self.value = value
   ...     def __repr__(self):
   ...         return str(self.value)
   ...
   >>> a = A(10)                   # create a reference
   >>> d = weakref.WeakValueDictionary()
   >>> d['primary'] = a            # does not create a reference
   >>> d['primary']                # fetch the object if it is still alive
   10
   >>> del a                       # remove the one reference
   >>> gc.collect()                # run garbage collection right away
   0
   >>> d['primary']                # entry was automatically removed
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
       d['primary']                # entry was automatically removed
     File "C:/python33/lib/weakref.py", line 46, in __getitem__
       o = self.data[key]()
   KeyError: 'primary'

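The callback mentioned above can be seen with a plain :func:`weakref.ref`.
This is a minimal sketch: the ``Texture`` class is a hypothetical stand-in
for an expensive object, and the exact moment of collection depends on the
Python implementation (CPython frees the object as soon as the last strong
reference goes away):

```python
import gc
import weakref

class Texture:
    """Stand-in for an object that is expensive to create (hypothetical)."""
    def __init__(self, name):
        self.name = name

events = []
tex = Texture('grass')
# The callback runs when the referent is about to be finalized.
ref = weakref.ref(tex, lambda r: events.append('collected'))

alive_before = ref() is tex      # calling the weakref returns the object
del tex                          # drop the only strong reference
gc.collect()                     # not needed on CPython, may help elsewhere
alive_after = ref() is not None  # a dead weakref returns None
print(alive_before, alive_after, events)
```
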

.. _tut-list-tools:

Tools for Working with Lists
============================

Many data structure needs can be met with the built-in list type. However,
sometimes there is a need for alternative implementations with different
performance trade-offs.

The :mod:`array` module provides an :class:`array()` object that is like a list
that stores only homogeneous data and stores it more compactly. The following
example shows an array of numbers stored as two byte unsigned binary numbers
(typecode ``"H"``) rather than the usual 16 bytes per entry for regular lists of
Python int objects::

   >>> from array import array
   >>> a = array('H', [4000, 10, 700, 22222])
   >>> sum(a)
   26932
   >>> a[1:3]
   array('H', [10, 700])

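The compact storage comes with a restriction: every element must fit the
typecode. A short sketch, assuming a platform where typecode ``"H"`` occupies
two bytes (the usual case):

```python
from array import array

a = array('H', [4000, 10, 700, 22222])
print(a.itemsize)                 # bytes per element for this typecode
print(a.itemsize * len(a))        # size of the raw data buffer in bytes

try:
    a.append(70000)               # does not fit in an unsigned 16-bit slot
except OverflowError as exc:
    rejected = str(exc)
else:
    rejected = None
print(rejected)
```
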
The :mod:`collections` module provides a :class:`deque()` object that is like a
list with faster appends and pops from the left side but slower lookups in the
middle. These objects are well suited for implementing queues and breadth first
tree searches::

   >>> from collections import deque
   >>> d = deque(["task1", "task2", "task3"])
   >>> d.append("task4")
   >>> print("Handling", d.popleft())
   Handling task1

   # starting_node, gen_moves() and is_goal() are supplied by the application
   unsearched = deque([starting_node])
   def breadth_first_search(unsearched):
       while unsearched:                   # keep going until no nodes remain
           node = unsearched.popleft()
           for m in gen_moves(node):
               if is_goal(m):
                   return m
               unsearched.append(m)

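For a version that runs as-is, here is a small sketch of a breadth-first
traversal. The ``tree`` mapping and the function name are hypothetical and
exist only for this illustration:

```python
from collections import deque

# Hypothetical tree, stored as a mapping from each node to its children.
tree = {
    'root': ['a', 'b'],
    'a': ['a1', 'a2'],
    'b': ['b1'],
    'a1': [], 'a2': [], 'b1': [],
}

def breadth_first_order(start):
    """Return the nodes of ``tree`` in breadth-first order from ``start``."""
    order = []
    unsearched = deque([start])
    while unsearched:
        node = unsearched.popleft()      # O(1) pop from the left end
        order.append(node)
        unsearched.extend(tree[node])    # children wait behind their siblings
    return order

print(breadth_first_order('root'))
```

The same loop with a plain list and ``pop(0)`` would work, but each pop would
shift every remaining element.
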
In addition to alternative list implementations, the library also offers other
tools such as the :mod:`bisect` module with functions for manipulating sorted
lists::

   >>> import bisect
   >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
   >>> bisect.insort(scores, (300, 'ruby'))
   >>> scores
   [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]

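Besides insertion, :mod:`bisect` can look up where a value falls in a sorted
list. A brief sketch using made-up grade boundaries (the breakpoints and
letters are assumptions for the example):

```python
import bisect

# Hypothetical grade boundaries; scores below 60 map to 'F'.
breakpoints = [60, 70, 80, 90]
grades = 'FDCBA'

def grade(score):
    # bisect() returns how many breakpoints are <= score
    return grades[bisect.bisect(breakpoints, score)]

letters = [grade(s) for s in [33, 99, 77, 70, 89, 90, 100]]
print(letters)
```
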
The :mod:`heapq` module provides functions for implementing heaps based on
regular lists. The lowest valued entry is always kept at position zero. This
is useful for applications which repeatedly access the smallest element but do
not want to run a full list sort::

   >>> from heapq import heapify, heappop, heappush
   >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
   >>> heapify(data)                      # rearrange the list into heap order
   >>> heappush(data, -5)                 # add a new entry
   >>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
   [-5, 0, 1]


.. _tut-decimal-fp:

Decimal Floating Point Arithmetic
=================================

The :mod:`decimal` module offers a :class:`Decimal` datatype for decimal
floating point arithmetic. Compared to the built-in :class:`float`
implementation of binary floating point, the class is especially helpful for

* financial applications and other uses which require exact decimal
  representation,
* control over precision,
* control over rounding to meet legal or regulatory requirements,
* tracking of significant decimal places, or
* applications where the user expects the results to match calculations done by
  hand.

For example, calculating a 5% tax on a 70 cent phone charge gives different
results in decimal floating point and binary floating point. The difference
becomes significant if the results are rounded to the nearest cent::

   >>> from decimal import *
   >>> round(Decimal('0.70') * Decimal('1.05'), 2)
   Decimal('0.74')
   >>> round(.70 * 1.05, 2)
   0.73

Before rounding, the :class:`Decimal` product is ``Decimal('0.7350')``. It
keeps a trailing zero, automatically inferring four place significance from
multiplicands with two place significance. Decimal reproduces mathematics as
done by hand and avoids issues that can arise when binary floating point cannot
exactly represent decimal quantities.

Exact representation enables the :class:`Decimal` class to perform modulo
calculations and equality tests that are unsuitable for binary floating point::

   >>> Decimal('1.00') % Decimal('.10')
   Decimal('0.00')
   >>> 1.00 % 0.10
   0.09999999999999995

   >>> sum([Decimal('0.1')]*10) == Decimal('1.0')
   True
   >>> sum([0.1]*10) == 1.0
   False

The :mod:`decimal` module provides arithmetic with as much precision as needed::

   >>> getcontext().prec = 36
   >>> Decimal(1) / Decimal(7)
   Decimal('0.142857142857142857142857142857142857')

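The control over rounding listed earlier is available through
:meth:`Decimal.quantize`. The sketch below compares two rounding modes on a
value that lies exactly halfway between two cents; the amounts are made up
for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

cent = Decimal('0.01')
amount = Decimal('0.70') * Decimal('1.05')      # exactly Decimal('0.7350')
tie = Decimal('2.665')                          # halfway between two cents

half_up = tie.quantize(cent, rounding=ROUND_HALF_UP)      # ties go away from zero
half_even = tie.quantize(cent, rounding=ROUND_HALF_EVEN)  # ties go to the even digit
tax = amount.quantize(cent, rounding=ROUND_HALF_UP)
print(amount, tax, half_up, half_even)
```

Passing the rounding mode explicitly keeps the result independent of the
current context, which may matter when a specific rule is mandated.
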