Georg Brandl | 6728c5a | 2009-10-11 18:31:23 +0000 | [diff] [blame] | 1 | :tocdepth: 2 |

=========================
Library and Extension FAQ
=========================

.. contents::

General Library Questions
=========================

How do I find a module or application to perform task X?
--------------------------------------------------------

Check :ref:`the Library Reference <library-index>` to see if there's a relevant
standard library module. (Eventually you'll learn what's in the standard
library and will be able to skip this step.)

For third-party packages, search the `Python Package Index
<http://pypi.python.org/pypi>`_ or try `Google <http://www.google.com>`_ or
another Web search engine. Searching for "Python" plus a keyword or two for
your topic of interest will usually find something helpful.


Where is the math.py (socket.py, regex.py, etc.) source file?
-------------------------------------------------------------

If you can't find a source file for a module it may be a builtin or dynamically
loaded module implemented in C, C++ or other compiled language. In this case
you may not have the source file or it may be something like ``mathmodule.c``,
somewhere in a C source directory (not on the Python Path).

There are (at least) three kinds of modules in Python:

1) modules written in Python (.py);
2) modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc);
3) modules written in C and linked with the interpreter; to get a list of these,
   type::

      import sys
      print sys.builtin_module_names
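
To find out which of the three kinds a particular module is, you can combine
``sys.builtin_module_names`` with the module's ``__file__`` attribute. The
following is only a sketch: ``module_kind`` is a hypothetical helper, and the
list of file suffixes is an assumption taken from the list above rather than an
exhaustive table.

```python
import importlib
import sys

def module_kind(name):
    """Classify a module (hypothetical helper, for illustration only)."""
    if name in sys.builtin_module_names:
        return "builtin"                 # linked with the interpreter
    module = importlib.import_module(name)
    filename = getattr(module, "__file__", None)
    if filename is None:
        return "unknown"                 # no file on disk to inspect
    if filename.endswith((".so", ".pyd", ".dll", ".sl")):
        return "extension"               # written in C, dynamically loaded
    return "python"                      # ordinary .py / .pyc module

print(module_kind("sys"))
print(module_kind("textwrap"))
```

For a module reported as ``python``, ``module.__file__`` gives the path of the
source (or compiled) file.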


How do I make a Python script executable on Unix?
-------------------------------------------------

You need to do two things: the script file's mode must be executable and the
first line must begin with ``#!`` followed by the path of the Python
interpreter.

The first is done by executing ``chmod +x scriptfile`` or perhaps ``chmod 755
scriptfile``.

The second can be done in a number of ways. The most straightforward way is to
write ::

   #!/usr/local/bin/python

as the very first line of your file, using the pathname for where the Python
interpreter is installed on your platform.

If you would like the script to be independent of where the Python interpreter
lives, you can use the "env" program. Almost all Unix variants support the
following, assuming the python interpreter is in a directory on the user's
$PATH::

   #!/usr/bin/env python

*Don't* do this for CGI scripts. The $PATH variable for CGI scripts is often
very minimal, so you need to use the actual absolute pathname of the
interpreter.

Occasionally, a user's environment is so full that the /usr/bin/env program
fails; or there's no env program at all. In that case, you can try the
following hack (due to Alex Rezinsky)::

   #! /bin/sh
   """:"
   exec python $0 ${1+"$@"}
   """

The minor disadvantage is that this defines the script's ``__doc__`` string.
However, you can fix that by adding ::

   __doc__ = """...Whatever..."""



Is there a curses/termcap package for Python?
---------------------------------------------

.. XXX curses *is* built by default, isn't it?

For Unix variants: The standard Python source distribution comes with a curses
module in the ``Modules/`` subdirectory, though it's not compiled by default
(note that this is not available in the Windows distribution -- there is no
curses module for Windows).

The curses module supports basic curses features as well as many additional
functions from ncurses and SYSV curses such as colour, alternative character set
support, pads, and mouse support. This means the module isn't compatible with
operating systems that only have BSD curses, but there don't seem to be any
currently maintained OSes that fall into this category.

For Windows: use `the consolelib module
<http://effbot.org/zone/console-index.htm>`_.


Is there an equivalent to C's onexit() in Python?
-------------------------------------------------

The :mod:`atexit` module provides a register function that is similar to C's
onexit.
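
As a minimal sketch of how registration looks: the ``goodbye`` function and its
argument are made up for this example; ``atexit.register`` accepts the callable
followed by any arguments to pass to it, and returns the callable.

```python
import atexit

def goodbye(name):
    # Runs when the interpreter exits normally.
    print("Goodbye, %s" % name)

# Extra positional arguments are handed to the function at exit time.
registered = atexit.register(goodbye, "world")
```

Registered functions are not called when the process is killed by a signal
that Python does not handle, or when ``os._exit()`` is used.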


Why don't my signal handlers work?
----------------------------------

The most common problem is that the signal handler is declared with the wrong
argument list. It is called as ::

   handler(signum, frame)

so it should be declared with two arguments::

   def handler(signum, frame):
       ...
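
A complete, if contrived, sketch follows. It assumes a Unix platform (it uses
``SIGUSR1`` and sends the signal to its own process with ``os.kill``), and it
must run in the main thread, because that is the only thread allowed to install
handlers. The ``received`` list is just a convenient place to record the call.

```python
import os
import signal
import time

received = []

def handler(signum, frame):
    # Two arguments: the signal number and the interrupted stack frame.
    received.append(signum)

signal.signal(signal.SIGUSR1, handler)      # install the handler
os.kill(os.getpid(), signal.SIGUSR1)        # send ourselves the signal

# The Python-level handler runs between bytecode instructions, so allow
# the interpreter a moment to call it.
deadline = time.time() + 1.0
while not received and time.time() < deadline:
    time.sleep(0.01)

print(received == [signal.SIGUSR1])
```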


Common tasks
============

How do I test a Python program or component?
--------------------------------------------

Python comes with two testing frameworks. The :mod:`doctest` module finds
examples in the docstrings for a module and runs them, comparing the output with
the expected output given in the docstring.
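
For instance, a docstring with embedded examples might look like the sketch
below. The ``average`` function is invented for illustration; the lines
starting with ``>>>`` are what :mod:`doctest` executes, and the lines after
them are the output it expects.

```python
import doctest

def average(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> average([1, 2, 3])
    2.0
    >>> average([10])
    10.0
    """
    return sum(values) / float(len(values))

if __name__ == "__main__":
    # Runs every example found in this module's docstrings; it prints
    # nothing unless an example fails.
    doctest.testmod()
```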

The :mod:`unittest` module is a fancier testing framework modelled on Java and
Smalltalk testing frameworks.

To make testing easier, use good modular design in your program. Your program
should have almost all functionality encapsulated in either functions or class
methods -- and this sometimes has the surprising and delightful effect of making
the program run faster (because local variable accesses are faster than global
accesses). Furthermore the program should avoid depending on mutating global
variables, since this makes testing much more difficult to do.

The "global main logic" of your program may be as simple as ::

   if __name__ == "__main__":
       main_logic()

at the bottom of the main module of your program.

Once your program is organized as a tractable collection of functions and class
behaviours you should write test functions that exercise the behaviours. A test
suite that automates a sequence of tests can be associated with each module.
This sounds like a lot of work, but since Python is so terse and flexible it's
surprisingly easy. You can make coding much more pleasant and fun by writing
your test functions in parallel with the "production code", since this makes it
easy to find bugs and even design flaws earlier.

"Support modules" that are not intended to be the main module of a program may
include a self-test of the module. ::

   if __name__ == "__main__":
       self_test()

Even programs that interact with complex external interfaces may be tested when
the external interfaces are unavailable by using "fake" interfaces implemented
in Python.
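
The sketch below shows one way a "fake" interface can be combined with
:mod:`unittest`. All of the names (``FakeMailer``, ``notify_all``,
``NotifyAllTest``) are hypothetical; the point is only that the code under test
receives the interface object as an argument, so a test can pass in a stand-in
that records calls instead of contacting a real mail server.

```python
import unittest

class FakeMailer(object):
    """Stand-in for a real mail gateway; it only records what was sent."""
    def __init__(self):
        self.sent = []

    def send(self, address, text):
        self.sent.append((address, text))

def notify_all(mailer, addresses, text):
    """Code under test: it only needs an object with a send() method."""
    for address in addresses:
        mailer.send(address, text)
    return len(addresses)

class NotifyAllTest(unittest.TestCase):
    def test_sends_one_message_per_address(self):
        fake = FakeMailer()
        count = notify_all(fake, ["a@example.com", "b@example.com"], "hi")
        self.assertEqual(count, 2)
        self.assertEqual(fake.sent, [("a@example.com", "hi"),
                                     ("b@example.com", "hi")])

# Build and run the suite explicitly so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NotifyAllTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module, the last two lines are usually replaced by a call to
``unittest.main()`` under ``if __name__ == "__main__":``.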


How do I create documentation from doc strings?
-----------------------------------------------

The :mod:`pydoc` module can create HTML from the doc strings in your Python
source code. An alternative for creating API documentation purely from
docstrings is `epydoc <http://epydoc.sf.net/>`_. `Sphinx
<http://sphinx.pocoo.org>`_ can also include docstring content.


How do I get a single keypress at a time?
-----------------------------------------

For Unix variants: There are several solutions. It's straightforward to do this
using curses, but curses is a fairly large module to learn. Here's a solution
without curses::

   import termios, fcntl, sys, os
   fd = sys.stdin.fileno()

   # Save the terminal settings, then turn off canonical mode and echo
   oldterm = termios.tcgetattr(fd)
   newattr = termios.tcgetattr(fd)
   newattr[3] = newattr[3] & ~termios.ICANON & ~termios.ECHO
   termios.tcsetattr(fd, termios.TCSANOW, newattr)

   # Switch stdin to non-blocking mode
   oldflags = fcntl.fcntl(fd, fcntl.F_GETFL)
   fcntl.fcntl(fd, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)

   try:
       while True:
           try:
               c = sys.stdin.read(1)
               print "Got character", repr(c)
           except IOError:
               pass
   finally:
       # Always restore the original terminal state
       termios.tcsetattr(fd, termios.TCSAFLUSH, oldterm)
       fcntl.fcntl(fd, fcntl.F_SETFL, oldflags)

You need the :mod:`termios` and the :mod:`fcntl` module for any of this to work,
and I've only tried it on Linux, though it should work elsewhere. In this code,
characters are read and printed one at a time.

:func:`termios.tcsetattr` turns off stdin's echoing and disables canonical mode.
:func:`fcntl.fcntl` is used to obtain stdin's file descriptor flags and modify
them for non-blocking mode. Since reading stdin when it is empty results in an
:exc:`IOError`, this error is caught and ignored.


Threads
=======

How do I program using threads?
-------------------------------

.. XXX it's _thread in py3k

Be sure to use the :mod:`threading` module and not the :mod:`thread` module.
The :mod:`threading` module builds convenient abstractions on top of the
low-level primitives provided by the :mod:`thread` module.

Aahz has a set of slides from his threading tutorial that are helpful; see
http://www.pythoncraft.com/OSCON2001/.


None of my threads seem to run: why?
------------------------------------

As soon as the main thread exits, all threads are killed. Your main thread is
running too quickly, giving the threads no time to do any work.

A simple fix is to add a sleep to the end of the program that's long enough for
all the threads to finish::

   import threading, time

   def thread_task(name, n):
       for i in range(n):
           print name, i

   for i in range(10):
       T = threading.Thread(target=thread_task, args=(str(i), i))
       T.start()

   time.sleep(10) # <----------------------------!

But now (on many platforms) the threads don't run in parallel, but appear to run
sequentially, one at a time! The reason is that the OS thread scheduler doesn't
start a new thread until the previous thread is blocked.

A simple fix is to add a tiny sleep to the start of the run function::

   def thread_task(name, n):
       time.sleep(0.001) # <---------------------!
       for i in range(n):
           print name, i

   for i in range(10):
       T = threading.Thread(target=thread_task, args=(str(i), i))
       T.start()

   time.sleep(10)

Instead of trying to guess a good delay value for :func:`time.sleep`,
it's better to use some kind of semaphore mechanism. One idea is to use the
:mod:`Queue` module to create a queue object, let each thread append a token to
the queue when it finishes, and let the main thread read as many tokens from the
queue as there are threads.
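
A sketch of that idea is shown below. The ``done`` queue and the ``(name,
total)`` token format are assumptions made for the example, and the ``try``
/ ``except`` import is only there so the same code also runs on Python 3,
where the module is named ``queue``.

```python
import threading
try:
    import queue                # Python 3 name
except ImportError:
    import Queue as queue       # Python 2 name

def thread_task(name, n, done):
    total = sum(range(n))       # stand-in for real work
    done.put((name, total))     # token: tells the main thread we finished

done = queue.Queue()
threads = []
for i in range(10):
    t = threading.Thread(target=thread_task, args=(str(i), i, done))
    t.start()
    threads.append(t)

# Block until one token per thread has arrived; no sleep() guessing needed.
results = [done.get() for _ in threads]
print(sorted(results))
```

If the threads have nothing to report back, calling ``join()`` on each thread
object is a simpler way to wait for them.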


How do I parcel out work among a bunch of worker threads?
---------------------------------------------------------

Use the :mod:`Queue` module to create a queue containing a list of jobs. The
:class:`~Queue.Queue` class maintains a list of objects with ``.put(obj)`` to
add an item to the queue and ``.get()`` to return an item. The class will take
care of the locking necessary to ensure that each job is handed out exactly
once.

Here's a trivial example::

   import threading, Queue, time

   # The worker thread gets jobs off the queue. When the queue is empty, it
   # assumes there will be no more work and exits.
   # (Realistically workers will run until terminated.)
   def worker():
       print 'Running worker'
       time.sleep(0.1)
       while True:
           try:
               arg = q.get(block=False)
           except Queue.Empty:
               print 'Worker', threading.currentThread(),
               print 'queue empty'
               break
           else:
               print 'Worker', threading.currentThread(),
               print 'running with argument', arg
               time.sleep(0.5)

   # Create queue
   q = Queue.Queue()

   # Start a pool of 5 workers
   for i in range(5):
       t = threading.Thread(target=worker, name='worker %i' % (i+1))
       t.start()

   # Begin adding work to the queue
   for i in range(50):
       q.put(i)

   # Give threads time to run
   print 'Main thread sleeping'
   time.sleep(5)

When run, this will produce the following output::

   Running worker
   Running worker
   Running worker
   Running worker
   Running worker
   Main thread sleeping
   Worker <Thread(worker 1, started)> running with argument 0
   Worker <Thread(worker 2, started)> running with argument 1
   Worker <Thread(worker 3, started)> running with argument 2
   Worker <Thread(worker 4, started)> running with argument 3
   Worker <Thread(worker 5, started)> running with argument 4
   Worker <Thread(worker 1, started)> running with argument 5
   ...

Consult the module's documentation for more details; the ``Queue`` class
provides a featureful interface.


What kinds of global value mutation are thread-safe?
----------------------------------------------------

A global interpreter lock (GIL) is used internally to ensure that only one
thread runs in the Python VM at a time. In general, Python offers to switch
among threads only between bytecode instructions; how frequently it switches can
be set via :func:`sys.setcheckinterval`. Each bytecode instruction, and
therefore all the C implementation code reached from it, is atomic from the
point of view of a Python program.

In theory, this means an exact accounting requires an exact understanding of the
PVM bytecode implementation. In practice, it means that operations on shared
variables of builtin data types (ints, lists, dicts, etc) that "look atomic"
really are.

For example, the following operations are all atomic (L, L1, L2 are lists, D,
D1, D2 are dicts, x, y are objects, i, j are ints)::

   L.append(x)
   L1.extend(L2)
   x = L[i]
   x = L.pop()
   L1[i:j] = L2
   L.sort()
   x = y
   x.field = y
   D[x] = y
   D1.update(D2)
   D.keys()

These aren't::

   i = i+1
   L.append(L[-1])
   L[i] = L[j]
   D[x] = D[x] + 1

Operations that replace other objects may invoke those other objects'
:meth:`__del__` method when their reference count reaches zero, and that can
affect things. This is especially true for the mass updates to dictionaries and
lists. When in doubt, use a mutex!
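
As a minimal sketch of the "use a mutex" advice, the non-atomic
``D[x] = D[x] + 1`` update can be guarded with a :class:`threading.Lock`. The
names ``counts``, ``counts_lock`` and ``record_hits`` are invented for this
example.

```python
import threading

counts = {"hits": 0}
counts_lock = threading.Lock()

def record_hits(n):
    for _ in range(n):
        # The update is a read followed by a write, so hold the lock
        # around the pair to keep other threads from interleaving.
        with counts_lock:
            counts["hits"] = counts["hits"] + 1

workers = [threading.Thread(target=record_hits, args=(10000,))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counts["hits"])
```

With the lock in place the final count is always the number of threads times
the number of updates per thread; without it, updates can be lost.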


Can't we get rid of the Global Interpreter Lock?
------------------------------------------------

.. XXX mention multiprocessing
.. XXX link to dbeazley's talk about GIL?

The Global Interpreter Lock (GIL) is often seen as a hindrance to Python's
deployment on high-end multiprocessor server machines, because a multi-threaded
Python program effectively only uses one CPU, due to the insistence that
(almost) all Python code can only run while the GIL is held.

Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive
patch set (the "free threading" patches) that removed the GIL and replaced it
with fine-grained locking. Unfortunately, even on Windows (where locks are very
efficient) this ran ordinary Python code about twice as slow as the interpreter
using the GIL. On Linux the performance loss was even worse because pthread
locks aren't as efficient.

Since then, the idea of getting rid of the GIL has occasionally come up but
nobody has found a way to deal with the expected slowdown, and users who don't
use threads would not be happy if their code ran at half the speed. Greg's
free threading patch set has not been kept up-to-date for later Python versions.

This doesn't mean that you can't make good use of Python on multi-CPU machines!
You just have to be creative with dividing the work up between multiple
*processes* rather than multiple *threads*. Judicious use of C extensions will
also help; if you use a C extension to perform a time-consuming task, the
extension can release the GIL while the thread of execution is in the C code and
allow other threads to get some work done.

It has been suggested that the GIL should be a per-interpreter-state lock rather
than truly global; interpreters then wouldn't be able to share objects.
Unfortunately, this isn't likely to happen either. It would be a tremendous
amount of work, because many object implementations currently have global state.
For example, small integers and short strings are cached; these caches would
have to be moved to the interpreter state. Other object types have their own
free list; these free lists would have to be moved to the interpreter state.
And so on.

And I doubt that it can even be done in finite time, because the same problem
exists for 3rd party extensions. It is likely that 3rd party extensions are
being written at a faster rate than you can convert them to store all their
global state in the interpreter state.

And finally, once you have multiple interpreters not sharing any state, what
have you gained over running each interpreter in a separate process?


Input and Output
================

How do I delete a file? (And other file questions...)
-----------------------------------------------------

Use ``os.remove(filename)`` or ``os.unlink(filename)``; for documentation, see
the :mod:`os` module. The two functions are identical; :func:`unlink` is simply
the name of the Unix system call for this function.

To remove a directory, use :func:`os.rmdir`; use :func:`os.mkdir` to create one.
``os.makedirs(path)`` will create any intermediate directories in ``path`` that
don't exist. ``os.removedirs(path)`` will remove intermediate directories as
long as they're empty; if you want to delete an entire directory tree and its
contents, use :func:`shutil.rmtree`.

To rename a file, use ``os.rename(old_path, new_path)``.

To truncate a file, open it using ``f = open(filename, "r+")``, and use
``f.truncate(offset)``; offset defaults to the current seek position. There's
also ``os.ftruncate(fd, offset)`` for files opened with :func:`os.open`, where
``fd`` is the file descriptor (a small integer).

The :mod:`shutil` module also contains a number of functions to work on files
including :func:`~shutil.copyfile`, :func:`~shutil.copytree`, and
:func:`~shutil.rmtree`.
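
The calls above can be tried out safely inside a scratch directory, as in the
following sketch. The directory and file names are arbitrary, and
``tempfile.mkdtemp()`` is used only so the example cleans up after itself.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # scratch directory for the demo
nested = os.path.join(base, "a", "b", "c")
os.makedirs(nested)                          # creates a, a/b and a/b/c

old_path = os.path.join(nested, "old.txt")
new_path = os.path.join(nested, "new.txt")
with open(old_path, "w") as f:
    f.write("0123456789")
os.rename(old_path, new_path)                # rename the file

with open(new_path, "r+") as f:
    f.truncate(4)                            # keep only the first 4 bytes
size_after = os.path.getsize(new_path)

os.remove(new_path)                          # delete the file
os.rmdir(nested)                             # only works on an empty directory
shutil.rmtree(base)                          # removes whatever is left
```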


How do I copy a file?
---------------------

The :mod:`shutil` module contains a :func:`~shutil.copyfile` function. Note
that on MacOS 9 it doesn't copy the resource fork and Finder info.
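
A short sketch of its use, with made-up file names inside a temporary
directory:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "source.txt")
dst = os.path.join(workdir, "copy.txt")

with open(src, "w") as f:
    f.write("spam and eggs\n")

shutil.copyfile(src, dst)        # copies the file contents only

with open(dst) as f:
    copied = f.read()
shutil.rmtree(workdir)           # clean up the scratch directory
```

``copyfile`` copies only the data; :func:`shutil.copy2` additionally tries to
preserve metadata such as modification times.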


How do I read (or write) binary data?
-------------------------------------

To read or write complex binary data formats, it's best to use the :mod:`struct`
module. It allows you to take a string containing binary data (usually numbers)
and convert it to Python objects; and vice versa.

For example, the following code reads two 2-byte integers and one 4-byte integer
in big-endian format from a file::

   import struct

   f = open(filename, "rb") # Open in binary mode for portability
   s = f.read(8)
   x, y, z = struct.unpack(">hhl", s)

The '>' in the format string forces big-endian data; the letter 'h' reads one
"short integer" (2 bytes), and 'l' reads one "long integer" (4 bytes) from the
string.

For data that is more regular (e.g. a homogeneous list of ints or floats),
you can also use the :mod:`array` module.
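
The reverse direction uses ``struct.pack`` with the same format string. The
following sketch packs three made-up values, unpacks them again, and builds a
small :mod:`array` of short integers for comparison.

```python
import struct
from array import array

# Pack two 2-byte integers and one 4-byte integer, big-endian.
packed = struct.pack(">hhl", 1, -2, 300000)
x, y, z = struct.unpack(">hhl", packed)

# With an explicit byte order, struct uses standard sizes: 2 + 2 + 4 bytes.
size = struct.calcsize(">hhl")

# A homogeneous sequence of numbers fits the array module better.
samples = array("h", [1, -2, 3])        # 'h' = signed short
print((x, y, z), size, list(samples))
```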


I can't seem to use os.read() on a pipe created with os.popen(); why?
---------------------------------------------------------------------

:func:`os.read` is a low-level function which takes a file descriptor, a small
integer representing the opened file. :func:`os.popen` creates a high-level
file object, the same type returned by the builtin :func:`open` function. Thus,
to read n bytes from a pipe p created with :func:`os.popen`, you need to use
``p.read(n)``.
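
For example, assuming a shell that provides an ``echo`` command:

```python
import os

p = os.popen("echo hello")      # p is a file object, not a file descriptor
first = p.read(3)               # read three characters through the file object
rest = p.read()                 # read whatever remains
p.close()

print(repr(first))
```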


How do I run a subprocess with pipes connected to both input and output?
------------------------------------------------------------------------

.. XXX update to use subprocess

Use the :mod:`popen2` module. For example::

   import popen2
   fromchild, tochild = popen2.popen2("command")
   tochild.write("input\n")
   tochild.flush()
   output = fromchild.readline()

Warning: in general it is unwise to do this because you can easily cause a
deadlock where your process is blocked waiting for output from the child while
the child is blocked waiting for input from you. This can be caused because the
parent expects the child to output more text than it does, or it can be caused
by data being stuck in stdio buffers due to lack of flushing. The Python parent
can of course explicitly flush the data it sends to the child before it reads
any output, but if the child is a naive C program it may have been written to
never explicitly flush its output, even if it is interactive, since flushing is
normally automatic.

Note that a deadlock is also possible if you use :func:`popen3` to read stdout
and stderr. If one of the two is too large for the internal buffer (increasing
the buffer size does not help) and you ``read()`` the other one first, there is
a deadlock, too.
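
The :mod:`subprocess` module (available since Python 2.4) offers one way around
both deadlocks: ``Popen.communicate()`` sends all the input, then reads stdout
and stderr together until the child exits. The sketch below is not a drop-in
replacement for the ``popen2`` example; the child is a stand-in Python
one-liner that upper-cases its input, chosen only so the example is
self-contained.

```python
import subprocess
import sys

# Stand-in child process: upper-cases whatever arrives on standard input.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

proc = subprocess.Popen(child,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

# communicate() writes the input, closes the child's stdin, and collects
# both output streams without blocking on either one.
out, err = proc.communicate(b"input\n")
print(proc.returncode)
```

``communicate()`` buffers all output in memory, so it suits exchanges of
moderate size rather than an ongoing interactive dialogue with the child.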

Note on a bug in popen2: unless your program calls ``wait()`` or ``waitpid()``,
finished child processes are never removed, and eventually calls to popen2 will
fail because of a limit on the number of child processes. Calling
:func:`os.waitpid` with the :data:`os.WNOHANG` option can prevent this; a good
place to insert such a call would be before calling ``popen2`` again.

In many cases, all you really need is to run some data through a command and get
the result back. Unless the amount of data is very large, the easiest way to do
this is to write it to a temporary file and run the command with that temporary
file as input. The standard module :mod:`tempfile` exports a ``mktemp()``
function to generate unique temporary file names. ::

   import tempfile
   import os

   class Popen3:
       """
       This is a deadlock-safe version of popen that returns
       an object with errorlevel, out (a string) and err (a string).
       (capturestderr may not work under windows.)
       Example: print Popen3('grep spam','\n\nhere spam\n\n').out
       """
       def __init__(self, command, input=None, capturestderr=None):
           outfile = tempfile.mktemp()
           command = "( %s ) > %s" % (command, outfile)
           if input:
               infile = tempfile.mktemp()
               open(infile, "w").write(input)
               command = command + " <" + infile
           if capturestderr:
               errfile = tempfile.mktemp()
               command = command + " 2>" + errfile
           self.errorlevel = os.system(command) >> 8
           self.out = open(outfile, "r").read()
           os.remove(outfile)
           if input:
               os.remove(infile)
           if capturestderr:
               self.err = open(errfile, "r").read()
               os.remove(errfile)

Note that many interactive programs (e.g. vi) don't work well with pipes
substituted for standard input and output. You will have to use pseudo ttys
("ptys") instead of pipes. Or you can use a Python interface to Don Libes'
"expect" library. A Python extension that interfaces to expect is called "expy"
and available from http://expectpy.sourceforge.net. A pure Python solution that
works like expect is `pexpect <http://pypi.python.org/pypi/pexpect/>`_.


How do I access the serial (RS232) port?
----------------------------------------

For Win32, POSIX (Linux, BSD, etc.), Jython:

   http://pyserial.sourceforge.net

For Unix, see a Usenet post by Mitch Chapman:

   http://groups.google.com/groups?selm=34A04430.CF9@ohioee.com


Why doesn't closing sys.stdout (stdin, stderr) really close it?
---------------------------------------------------------------

Python file objects are a high-level layer of abstraction on top of C streams,
which in turn are a medium-level layer of abstraction on top of (among other
things) low-level C file descriptors.

For most file objects you create in Python via the builtin ``file`` constructor,
``f.close()`` marks the Python file object as being closed from Python's point
of view, and also arranges to close the underlying C stream. This also happens
automatically in f's destructor, when f becomes garbage.

But stdin, stdout and stderr are treated specially by Python, because of the
special status also given to them by C. Running ``sys.stdout.close()`` marks
the Python-level file object as being closed, but does *not* close the
associated C stream.

To close the underlying C stream for one of these three, you should first be
sure that's what you really want to do (e.g., you may confuse extension modules
trying to do I/O). If it is, use ``os.close``::

   os.close(0) # close C's stdin stream
   os.close(1) # close C's stdout stream
   os.close(2) # close C's stderr stream


Network/Internet Programming
============================

What WWW tools are there for Python?
------------------------------------

See the chapters titled :ref:`internet` and :ref:`netdata` in the Library
Reference Manual. Python has many modules that will help you build server-side
and client-side web systems.

.. XXX check if wiki page is still up to date

A summary of available frameworks is maintained by Paul Boddie at
http://wiki.python.org/moin/WebProgramming .

Cameron Laird maintains a useful set of pages about Python web technologies at
http://phaseit.net/claird/comp.lang.python/web_python.


How can I mimic CGI form submission (METHOD=POST)?
--------------------------------------------------

I would like to retrieve web pages that are the result of POSTing a form. Is
there existing code that would let me do this easily?

Yes. Here's a simple example that uses httplib::

   #!/usr/local/bin/python

   import httplib, sys, time

   ### build the query string
   qs = "First=Josephine&MI=Q&Last=Public"

   ### connect and send the server a path
   httpobj = httplib.HTTP('www.some-server.out-there', 80)
   httpobj.putrequest('POST', '/cgi-bin/some-cgi-script')
   ### now generate the rest of the HTTP headers...
   httpobj.putheader('Accept', '*/*')
   httpobj.putheader('Connection', 'Keep-Alive')
   httpobj.putheader('Content-type', 'application/x-www-form-urlencoded')
   httpobj.putheader('Content-length', '%d' % len(qs))
   httpobj.endheaders()
   httpobj.send(qs)
   ### find out what the server said in response...
   reply, msg, hdrs = httpobj.getreply()
   if reply != 200:
       sys.stdout.write(httpobj.getfile().read())

Note that in general for URL-encoded POST operations, query strings must be
quoted by using :func:`urllib.quote`. For example, to send name="Guy Steele,
Jr."::

   >>> from urllib import quote
   >>> x = quote("Guy Steele, Jr.")
   >>> x
   'Guy%20Steele%2C%20Jr.'
   >>> query_string = "name="+x
   >>> query_string
   'name=Guy%20Steele%2C%20Jr.'
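
As a rough Python 3 counterpart to the quoting step, the sketch below builds a
form body with ``urllib.parse.urlencode``, which quotes every field for you
(spaces become ``+``), and wraps it in a ``urllib.request.Request``. The helper
name ``build_post_request`` is made up for this illustration, the URL is the
placeholder used above, and nothing is sent over the network until the request
is passed to ``urllib.request.urlopen``.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, fields):
    """Return (request, body) for a URL-encoded form POST (hypothetical helper)."""
    # urlencode() quotes each key and value, so no manual quote() call is needed.
    body = urlencode(fields)
    request = Request(
        url,
        data=body.encode("ascii"),  # supplying data makes this a POST request
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    return request, body


request, body = build_post_request(
    "http://www.some-server.out-there/cgi-bin/some-cgi-script",  # placeholder URL
    {"name": "Guy Steele, Jr."},
)
print(body)
print(request.get_method())
```

Keeping the request construction separate from the actual ``urlopen`` call makes
it easy to inspect the encoded body before anything is transmitted.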


What module should I use to help with generating HTML?
------------------------------------------------------

.. XXX add modern template languages

There are many different modules available:

* HTMLgen is a class library of objects corresponding to all the HTML 3.2 markup
  tags. It's used when you are writing in Python and wish to synthesize HTML
  pages, for example to generate a website or CGI forms.

* DocumentTemplate and Zope Page Templates are two different systems that are
  part of Zope.

* Quixote's PTL uses Python syntax to assemble strings of text.

Consult the `Web Programming wiki pages
<http://wiki.python.org/moin/WebProgramming>`_ for more links.


How do I send mail from a Python script?
----------------------------------------

Use the standard library module :mod:`smtplib`.

Here's a very simple interactive mail sender that uses it. This method will
work on any host that supports an SMTP listener. ::

   import sys, smtplib

   fromaddr = raw_input("From: ")
   toaddrs = raw_input("To: ").split(',')
   print "Enter message, end with ^D:"
   msg = ''
   while True:
       line = sys.stdin.readline()
       if not line:
           break
       msg += line

   # The actual mail send
   server = smtplib.SMTP('localhost')
   server.sendmail(fromaddr, toaddrs, msg)
   server.quit()
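
The interactive sender passes the raw text straight to ``sendmail()``, so any
headers have to be typed in by hand. A minimal Python 3 sketch of a
non-interactive variant is shown below; it assumes the standard
``email.message.EmailMessage`` class is acceptable for building the headers, and
the helper names ``build_message`` and ``send`` as well as the addresses are
invented for the example. ``send`` is defined but not called, because it needs a
real SMTP listener.

```python
import smtplib
from email.message import EmailMessage


def build_message(fromaddr, toaddrs, subject, body):
    """Assemble a plain-text message with the usual headers (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = fromaddr
    msg["To"] = ", ".join(toaddrs)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host="localhost"):
    """Hand the message to an SMTP listener; assumes one is running on `host`."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


msg = build_message(
    "sender@example.com", ["receiver@example.com"], "test", "Some text\n"
)
print(msg)
```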

A Unix-only alternative uses sendmail. The location of the sendmail program
varies between systems; sometimes it is ``/usr/lib/sendmail``, sometimes
``/usr/sbin/sendmail``. The sendmail manual page will help you out. Here's
some sample code::

   import os

   SENDMAIL = "/usr/sbin/sendmail"  # sendmail location
   p = os.popen("%s -t -i" % SENDMAIL, "w")
   p.write("To: receiver@example.com\n")
   p.write("Subject: test\n")
   p.write("\n")  # blank line separating headers from body
   p.write("Some text\n")
   p.write("some more text\n")
   sts = p.close()  # None if sendmail exited with status 0
   if sts is not None:
       print "Sendmail exit status", sts


How do I avoid blocking in the connect() method of a socket?
------------------------------------------------------------

The :mod:`select` module is commonly used to help with asynchronous I/O on
sockets.

To prevent the TCP connect from blocking, you can set the socket to non-blocking
mode. Then when you do the ``connect()``, you will either connect immediately
(unlikely) or get an exception that contains the error number as ``.errno``.
``errno.EINPROGRESS`` indicates that the connection is in progress, but hasn't
finished yet. Different OSes will return different values, so you're going to
have to check what's returned on your system.

You can use the ``connect_ex()`` method to avoid creating an exception. It will
just return the errno value. To poll, you can call ``connect_ex()`` again later
-- 0 or ``errno.EISCONN`` indicate that you're connected -- or you can pass this
socket to :func:`select.select` to check if it's writable.
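
The approach described above might be sketched as follows in Python 3. This is
an illustration under assumptions rather than a portable recipe: it targets
POSIX-like systems, treats ``EINPROGRESS`` and ``EWOULDBLOCK`` as "still
connecting", and reads ``SO_ERROR`` once the socket becomes writable to learn
whether the connection succeeded. The function name ``connect_nonblocking`` and
the local listener used for the demonstration are invented for the example.

```python
import errno
import os
import select
import socket


def connect_nonblocking(address, timeout=5.0):
    """Connect without blocking in connect(); wait in select() instead."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)  # returns an errno value instead of raising
    if err in (0, errno.EISCONN):
        return sock  # connected immediately (unlikely)
    if err not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    # The connection attempt is in progress: wait until the socket is writable.
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        sock.close()
        raise TimeoutError("connect() did not complete in time")
    # Writable only means the attempt finished; SO_ERROR says how it ended.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


# Demonstration against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
client = connect_nonblocking(listener.getsockname())
peer = client.getpeername()
print("connected to", peer)
client.close()
listener.close()
```

In a real program the ``select()`` call would normally be part of the main event
loop, so other sockets can be serviced while the connection is being
established.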


Databases
=========

Are there any interfaces to database packages in Python?
--------------------------------------------------------

Yes.

.. XXX remove bsddb in py3k, fix other module names

Python 2.3 includes the :mod:`bsddb` package which provides an interface to the
BerkeleyDB library. Interfaces to disk-based hashes such as :mod:`DBM <dbm>`
and :mod:`GDBM <gdbm>` are also included with standard Python.

Support for most relational databases is available. See the
`DatabaseProgramming wiki page
<http://wiki.python.org/moin/DatabaseProgramming>`_ for details.


How do you implement persistent objects in Python?
--------------------------------------------------

The :mod:`pickle` library module solves this in a very general way (though you
still can't store things like open files, sockets or windows), and the
:mod:`shelve` library module uses pickle and (g)dbm to create persistent
mappings containing arbitrary Python objects. For better performance, you can
use the :mod:`cPickle` module.

A more awkward way of doing things is to use pickle's little sister, marshal.
The :mod:`marshal` module provides very fast ways to store noncircular basic
Python types to files and strings, and back again. Although marshal does not do
fancy things like store instances or handle shared references properly, it does
run extremely fast. For example, loading a half megabyte of data may take less
than a third of a second. This often beats doing something more complex and
general such as using gdbm with pickle/shelve.
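
The three modules can be compared side by side with a small Python 3 sketch.
The record contents and the shelf file name are made up for the illustration,
and only built-in types are stored so that the same data can also be passed to
``marshal``.

```python
import marshal
import os
import pickle
import shelve
import tempfile

record = {"owner": "alice", "balance": 10, "history": [1, 2, 3]}

# pickle: serialize to a byte string and back again.
restored = pickle.loads(pickle.dumps(record))

# shelve: a persistent, dictionary-like mapping whose values are pickled.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "accounts")  # hypothetical shelf name
    with shelve.open(path) as shelf:
        shelf["alice"] = record
    with shelve.open(path) as shelf:  # reopen to show the data persisted
        reloaded = shelf["alice"]

# marshal: fast, but limited to basic built-in types.
basic = marshal.loads(marshal.dumps(record))

print(restored == record, reloaded == record, basic == record)
```

Unlike ``marshal``, ``pickle`` and ``shelve`` would also accept instances of
user-defined classes, provided the class can be imported again when the data is
loaded.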


Why is cPickle so slow?
-----------------------

.. XXX update this, default protocol is 2/3

The default format used by the pickle module is a slow one that results in
readable pickles. The binary protocols are more compact and faster. Making one
of them the default would be nice, but it would break backward compatibility,
so you have to request it explicitly::

   import cPickle

   largeString = 'z' * (100 * 1024)
   myPickle = cPickle.dumps(largeString, protocol=1)
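
The effect of the protocol choice can be checked with a short Python 3 sketch,
where the :mod:`pickle` module plays the role of cPickle and the default
protocol is already a binary one. The sample data is arbitrary, and the sizes
printed will vary between Python versions.

```python
import pickle

data = list(range(1000))  # arbitrary sample data

# Protocol 0 is the original human-readable format.
text_pickle = pickle.dumps(data, protocol=0)

# HIGHEST_PROTOCOL selects the newest binary format this Python supports.
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_pickle), len(binary_pickle))
```

Both byte strings load back to the same list; the binary one is simply smaller
and cheaper to produce and parse.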


If my program crashes with a bsddb (or anydbm) database open, it gets corrupted. How come?
------------------------------------------------------------------------------------------

Databases opened for write access with the bsddb module (and often by the anydbm
module, since it will preferentially use bsddb) must explicitly be closed using
the ``.close()`` method of the database. The underlying library caches database
contents which need to be converted to on-disk form and written.

If you have initialized a new bsddb database but not written anything to it
before the program crashes, you will often wind up with a zero-length file and
encounter an exception the next time the file is opened.
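
One way to make sure ``.close()`` is always called is a ``try``/``finally``
block. The sketch below uses Python 3, where anydbm was renamed to :mod:`dbm`
and bsddb is no longer in the standard library, so the backend actually chosen
is an assumption that depends on the installation. The helper name ``store``
and the file name are invented. Note that this protects against exceptions, not
against the process being killed or the machine losing power.

```python
import dbm
import os
import tempfile


def store(path, items):
    """Write items to a dbm database and always close it (hypothetical helper)."""
    db = dbm.open(path, "c")  # "c": open for read/write, create if missing
    try:
        for key, value in items.items():
            db[key] = value
    finally:
        db.close()  # flushes cached contents to disk, even after an error


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example")  # hypothetical database name
    store(path, {b"spam": b"eggs"})
    db = dbm.open(path, "r")
    try:
        value = db[b"spam"]
    finally:
        db.close()

print(value)
```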


I tried to open Berkeley DB file, but bsddb produces bsddb.error: (22, 'Invalid argument'). Help! How can I restore my data?
----------------------------------------------------------------------------------------------------------------------------

Don't panic! Your data is probably intact. The most frequent cause for the error
is that you tried to open a file created by an earlier version of Berkeley DB
with a later version of the Berkeley DB library.

Many Linux systems now have all three versions of Berkeley DB available. If you
are migrating from version 1 to a newer version, use ``db_dump185`` to dump a
plain text version of the database. If you are migrating from version 2 to
version 3, use ``db2_dump`` to create a plain text version of the database. In
either case, use ``db_load`` to create a new native database for the latest
version installed on your computer. If you have version 3 of Berkeley DB
installed, you should be able to use ``db2_load`` to create a native version 2
database.

You should move away from Berkeley DB version 1 files because the hash file code
contains known bugs that can corrupt your data.


Mathematics and Numerics
========================

How do I generate random numbers in Python?
-------------------------------------------

The standard module :mod:`random` implements a random number generator. Usage
is simple::

   import random
   random.random()

This returns a random floating point number in the range [0, 1).

There are also many other specialized generators in this module, such as:

* ``randrange(a, b)`` chooses an integer in the range [a, b).
* ``uniform(a, b)`` chooses a floating point number in the range [a, b).
* ``normalvariate(mean, sdev)`` samples the normal (Gaussian) distribution.

Some higher-level functions operate on sequences directly, such as:

* ``choice(S)`` chooses a random element from a given sequence.
* ``shuffle(L)`` shuffles a list in-place, i.e. permutes it randomly.

There's also a ``Random`` class you can instantiate to create multiple
independent random number generators.
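
The functions listed above can be tried out on a private ``Random`` instance, as
in the following Python 3 sketch. The seed value 42 and the variable names are
arbitrary; seeding is only used here so that two generators can be shown to
produce the same sequence.

```python
import random

rng = random.Random(42)    # independent generator with its own state
twin = random.Random(42)   # same seed, so it yields the same sequence

x = rng.random()                  # float in [0, 1)
n = rng.randrange(1, 7)           # integer in [1, 7), e.g. a die roll
u = rng.uniform(2.0, 5.0)         # float between 2.0 and 5.0
g = rng.normalvariate(0.0, 1.0)   # sample with mean 0 and standard deviation 1

deck = list(range(10))
card = rng.choice(deck)           # one random element of the sequence
rng.shuffle(deck)                 # permutes the list in place

print(x, n, u, g, card, deck)
print(twin.random() == x)
```

Using separate ``Random`` instances keeps the streams apart, so drawing numbers
in one part of a program does not change the sequence seen by another.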