:mod:`urllib` --- Open arbitrary resources by URL
=================================================

.. module:: urllib
   :synopsis: Open an arbitrary network resource by URL (requires sockets).

.. note::
   The :mod:`urllib` module has been split into parts and renamed in
   Python 3 to :mod:`urllib.request`, :mod:`urllib.parse`,
   and :mod:`urllib.error`. The :term:`2to3` tool will automatically adapt
   imports when converting your sources to Python 3.
   Also note that the :func:`urllib.request.urlopen` function in Python 3 is
   equivalent to :func:`urllib2.urlopen` and that :func:`urllib.urlopen` has
   been removed.

.. index::
   single: WWW
   single: World Wide Web
   single: URL

This module provides a high-level interface for fetching data across the World
Wide Web. In particular, the :func:`urlopen` function is similar to the
built-in function :func:`open`, but accepts Universal Resource Locators (URLs)
instead of filenames. Some restrictions apply: it can only open URLs for
reading, and no seek operations are available.

.. seealso::

   The `Requests package <http://requests.readthedocs.org/>`_
   is recommended for a higher-level HTTP client interface.

.. warning:: When opening HTTPS URLs, this module does not attempt to validate
   the server certificate. Use at your own risk!
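
One way to address the warning above is to pass an explicit :class:`ssl.SSLContext` through the *context* parameter of :func:`urlopen` (available from 2.7.9, as described below). The following is a minimal sketch, not a definitive recipe: the helper names ``make_verifying_context`` and ``open_verified`` are hypothetical, and the import falls back to the Python 3 location named in the note at the top of this page.

```python
import ssl

try:
    from urllib import urlopen            # Python 2 (2.7.9 or later for *context*)
except ImportError:
    from urllib.request import urlopen    # Python 3 location of urlopen


def make_verifying_context():
    """Return a context that checks certificates and host names."""
    # create_default_context() loads the system CA certificates and
    # enables certificate and host name verification.
    return ssl.create_default_context()


def open_verified(url):
    """Open an HTTPS URL with certificate validation (hypothetical helper)."""
    return urlopen(url, context=make_verifying_context())
```

Calling ``open_verified`` with an HTTPS URL should then fail with an error if the server certificate cannot be validated, instead of silently continuing.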


High-level interface
--------------------

.. function:: urlopen(url[, data[, proxies[, context]]])

   Open a network object denoted by a URL for reading. If the URL does not
   have a scheme identifier, or if it has :file:`file:` as its scheme
   identifier, this opens a local file (without :term:`universal newlines`);
   otherwise it opens a socket to a server somewhere on the network. If the
   connection cannot be made, the :exc:`IOError` exception is raised. If all
   went well, a file-like object is returned. This supports the following
   methods: :meth:`read`, :meth:`readline`, :meth:`readlines`, :meth:`fileno`,
   :meth:`close`, :meth:`info`, :meth:`getcode` and :meth:`geturl`. It also
   has proper support for the :term:`iterator` protocol. One caveat: the
   :meth:`read` method, if the size argument is omitted or negative, may not
   read until the end of the data stream; there is no good way to determine
   that the entire stream from a socket has been read in the general case.

   Except for the :meth:`info`, :meth:`getcode` and :meth:`geturl` methods,
   these methods have the same interface as for file objects; see section
   :ref:`bltin-file-objects` in this manual. (It is not a built-in file object,
   however, so it can't be used at those few places where a true built-in file
   object is required.)

   .. index:: module: mimetools

   The :meth:`info` method returns an instance of the class
   :class:`mimetools.Message` containing meta-information associated with the
   URL. When the method is HTTP, these headers are those returned by the server
   at the head of the retrieved HTML page (including Content-Length and
   Content-Type). When the method is FTP, a Content-Length header will be
   present if (as is now usual) the server passed back a file length in response
   to the FTP retrieval request. A Content-Type header will be present if the
   MIME type can be guessed. When the method is local-file, returned headers
   will include a Date representing the file's last-modified time, a
   Content-Length giving file size, and a Content-Type containing a guess at the
   file's type. See also the description of the :mod:`mimetools` module.

   The :meth:`geturl` method returns the real URL of the page. In some cases, the
   HTTP server redirects a client to another URL. The :func:`urlopen` function
   handles this transparently, but in some cases the caller needs to know which URL
   the client was redirected to. The :meth:`geturl` method can be used to get at
   this redirected URL.

   The :meth:`getcode` method returns the HTTP status code that was sent with the
   response, or ``None`` if the URL is not an HTTP URL.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
   function below.

   The :func:`urlopen` function works transparently with proxies which do not
   require authentication. In a Unix or Windows environment, set the
   :envvar:`http_proxy` or :envvar:`ftp_proxy` environment variables to a URL that
   identifies the proxy server before starting the Python interpreter. For example
   (the ``'%'`` is the command prompt)::

      % http_proxy="http://www.someproxy.com:3128"
      % export http_proxy
      % python
      ...

   The :envvar:`no_proxy` environment variable can be used to specify hosts which
   shouldn't be reached via proxy; if set, it should be a comma-separated list
   of hostname suffixes, optionally with ``:port`` appended, for example
   ``cern.ch,ncsa.uiuc.edu,some.host:8080``.

   In a Windows environment, if no proxy environment variables are set, proxy
   settings are obtained from the registry's Internet Settings section.

   .. index:: single: Internet Config

   In a Mac OS X environment, :func:`urlopen` will retrieve proxy information
   from the OS X System Configuration Framework, which can be managed with
   the Network System Preferences panel.


   Alternatively, the optional *proxies* argument may be used to explicitly specify
   proxies. It must be a dictionary mapping scheme names to proxy URLs, where an
   empty dictionary causes no proxies to be used, and ``None`` (the default value)
   causes environmental proxy settings to be used as discussed above. For
   example::

      # Use http://www.someproxy.com:3128 for http proxying
      proxies = {'http': 'http://www.someproxy.com:3128'}
      filehandle = urllib.urlopen(some_url, proxies=proxies)
      # Don't use any proxies
      filehandle = urllib.urlopen(some_url, proxies={})
      # Use proxies from environment - both versions are equivalent
      filehandle = urllib.urlopen(some_url, proxies=None)
      filehandle = urllib.urlopen(some_url)

   Proxies which require authentication for use are not currently supported;
   this is considered an implementation limitation.

   The *context* parameter may be set to a :class:`ssl.SSLContext` instance to
   configure the SSL settings that are used if :func:`urlopen` makes a HTTPS
   connection.

   .. versionchanged:: 2.3
      Added the *proxies* support.

   .. versionchanged:: 2.6
      Added :meth:`getcode` to returned object and support for the
      :envvar:`no_proxy` environment variable.

   .. versionchanged:: 2.7.9
      The *context* parameter was added.

   .. deprecated:: 2.6
      The :func:`urlopen` function has been removed in Python 3 in favor
      of :func:`urllib2.urlopen`.
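
The extra methods of the object returned by :func:`urlopen` can be tried without network access by opening a local file through a :file:`file:` URL. This is a minimal sketch under stated assumptions: the temporary file, its contents and the variable names are illustrative only, and the import falls back to the Python 3 location named in the note at the top of this page.

```python
import os
import tempfile

try:
    from urllib import urlopen, pathname2url            # Python 2
except ImportError:
    from urllib.request import urlopen, pathname2url    # Python 3 location

# Create a small local file to stand in for a remote resource.
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'hello\n')
os.close(fd)

try:
    response = urlopen('file:' + pathname2url(path))
    try:
        body = response.read()                      # raw bytes of the file
        headers = response.info()                   # message-style header object
        content_length = headers['Content-Length']  # file size, as a string
        final_url = response.geturl()               # URL that was actually opened
        status = response.getcode()                 # None: this is not an HTTP URL
    finally:
        response.close()
finally:
    os.remove(path)

print(content_length)
print(status)
```

For an HTTP URL, ``getcode()`` would return the numeric status code instead of ``None``, and ``geturl()`` would reflect any redirect that was followed.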


.. function:: urlretrieve(url[, filename[, reporthook[, data]]])

   Copy a network object denoted by a URL to a local file, if necessary. If the URL
   points to a local file, or a valid cached copy of the object exists, the object
   is not copied. Return a tuple ``(filename, headers)`` where *filename* is the
   local file name under which the object can be found, and *headers* is whatever
   the :meth:`info` method of the object returned by :func:`urlopen` returned (for
   a remote object, possibly cached). Exceptions are the same as for
   :func:`urlopen`.

   The second argument, if present, specifies the file location to copy to (if
   absent, the location will be a tempfile with a generated name). The third
   argument, if present, is a hook function that will be called once on
   establishment of the network connection and once after each block read
   thereafter. The hook will be passed three arguments: a count of blocks
   transferred so far, a block size in bytes, and the total size of the file. The
   third argument may be ``-1`` on older FTP servers which do not return a file
   size in response to a retrieval request.

   If the *url* uses the :file:`http:` scheme identifier, the optional *data*
   argument may be given to specify a ``POST`` request (normally the request type
   is ``GET``). The *data* argument must be in standard
   :mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
   function below.

   .. versionchanged:: 2.5
      :func:`urlretrieve` will raise :exc:`ContentTooShortError` when it detects that
      the amount of data available was less than the expected amount (which is the
      size reported by a *Content-Length* header). This can occur, for example, when
      the download is interrupted.

      The *Content-Length* is treated as a lower bound: if there's more data to read,
      :func:`urlretrieve` reads more data, but if less data is available, it raises
      the exception.

      You can still retrieve the downloaded data in this case; it is stored in the
      :attr:`content` attribute of the exception instance.

      If no *Content-Length* header was supplied, :func:`urlretrieve` cannot check
      the size of the data it has downloaded, and just returns it. In this case you
      just have to assume that the download was successful.
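
The *reporthook* callback and the :exc:`ContentTooShortError` case can be sketched as follows. This is an illustration under stated assumptions, not a definitive download routine: ``report``, ``fetch`` and the temporary files are hypothetical names, a local :file:`file:` URL stands in for a remote one so the snippet runs offline, and the imports fall back to the Python 3 locations named in the note at the top of this page.

```python
import os
import tempfile

try:
    from urllib import urlretrieve, pathname2url, ContentTooShortError  # Python 2
except ImportError:
    from urllib.request import urlretrieve, pathname2url                # Python 3
    from urllib.error import ContentTooShortError

progress = []


def report(block_count, block_size, total_size):
    """Record each hook call; total_size may be -1 if the size is unknown."""
    progress.append((block_count, block_size, total_size))


def fetch(url, target):
    """Return True if the whole object arrived, False if it was cut short."""
    try:
        urlretrieve(url, target, report)
    except ContentTooShortError:
        return False
    return True


# Build a source file and an empty destination to copy into.
src_fd, src = tempfile.mkstemp()
os.write(src_fd, b'x' * 20000)
os.close(src_fd)
dst_fd, dst = tempfile.mkstemp()
os.close(dst_fd)

try:
    complete = fetch('file:' + pathname2url(src), dst)
    copied = os.path.getsize(dst)
finally:
    os.remove(src)
    os.remove(dst)

print(complete)
print(len(progress))
```

The first hook call arrives with a block count of ``0``, before any data has been read, so a progress display can use it to learn the total size.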


.. data:: _urlopener

   The public functions :func:`urlopen` and :func:`urlretrieve` create an instance
   of the :class:`FancyURLopener` class and use it to perform their requested
   actions. To override this functionality, programmers can create a subclass of
   :class:`URLopener` or :class:`FancyURLopener`, then assign an instance of that
   class to the ``urllib._urlopener`` variable before calling the desired function.
   For example, applications may want to specify a different
   :mailheader:`User-Agent` header than :class:`URLopener` defines. This can be
   accomplished with the following code::

      import urllib

      class AppURLopener(urllib.FancyURLopener):
          version = "App/1.7"

      urllib._urlopener = AppURLopener()


.. function:: urlcleanup()

   Clear the cache that may have been built up by previous calls to
   :func:`urlretrieve`.


Utility functions
-----------------

.. function:: quote(string[, safe])

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of the URL. The optional
   *safe* parameter specifies additional characters that should not be quoted;
   its default value is ``'/'``.

   Example: ``quote('/~connolly/')`` yields ``'/%7Econnolly/'``.
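
The effect of the *safe* parameter is easiest to see side by side. The following is a small sketch with made-up input strings; the import falls back to the Python 3 location named in the note at the top of this page.

```python
try:
    from urllib import quote          # Python 2, the module documented here
except ImportError:
    from urllib.parse import quote    # Python 3 location

# With the default safe='/', path separators survive and spaces become %20.
default_quoted = quote('/docs/my file.txt')

# With safe='', the slash is escaped too, which suits a single path segment.
segment_quoted = quote('a/b c', safe='')

# Characters listed in *safe* are left alone.
custom_quoted = quote('key=value&x', safe='=&')

print(default_quoted)    # /docs/my%20file.txt
print(segment_quoted)    # a%2Fb%20c
print(custom_quoted)     # key=value&x
```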


.. function:: quote_plus(string[, safe])

   Like :func:`quote`, but also replaces spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.
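
A short comparison of :func:`quote` and :func:`quote_plus` on the same input shows the three differences described above: spaces, plus signs, and the default for *safe*. The input string is an arbitrary example, and the import falls back to the Python 3 location named in the note at the top of this page.

```python
try:
    from urllib import quote, quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote, quote_plus    # Python 3 location

value = 'a b+c/d'

as_path = quote(value)                      # space -> %20, '/' kept
as_form = quote_plus(value)                 # space -> '+', '+' and '/' escaped
kept = quote_plus(value, safe='/+')         # '/' and '+' explicitly kept

print(as_path)    # a%20b%2Bc/d
print(as_form)    # a+b%2Bc%2Fd
print(kept)       # a+b+c/d
```

Note that in the last result a literal plus sign and an encoded space can no longer be told apart, which is why plus signs are escaped unless *safe* says otherwise.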


.. function:: unquote(string)

   Replace ``%xx`` escapes by their single-character equivalent.

   Example: ``unquote('/%7Econnolly/')`` yields ``'/~connolly/'``.


.. function:: unquote_plus(string)

   Like :func:`unquote`, but also replaces plus signs by spaces, as required for
   unquoting HTML form values.
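
The difference between the two decoding functions only shows up when the input contains a plus sign. A brief sketch with made-up inputs follows; the import falls back to the Python 3 location named in the note at the top of this page.

```python
try:
    from urllib import unquote, unquote_plus          # Python 2
except ImportError:
    from urllib.parse import unquote, unquote_plus    # Python 3 location

encoded = 'a%20b+c'

plain = unquote(encoded)            # '+' is left untouched
form = unquote_plus(encoded)        # '+' is turned into a space
literal_plus = unquote_plus('1%2B1')  # an escaped plus survives as '+'

print(plain)          # a b+c
print(form)           # a b c
print(literal_plus)   # 1+1
```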


.. function:: urlencode(query[, doseq])

   Convert a mapping object or a sequence of two-element tuples to a
   "percent-encoded" string, suitable to pass to :func:`urlopen` above as the
   optional *data* argument. This is useful to pass a dictionary of form
   fields to a ``POST`` request. The resulting string is a series of
   ``key=value`` pairs separated by ``'&'`` characters, where both *key* and
   *value* are quoted using :func:`quote_plus` above. When a sequence of
   two-element tuples is used as the *query* argument, the first element of
   each tuple is a key and the second is a value. The value element itself
   can be a sequence; in that case, if the optional parameter *doseq*
   evaluates to ``True``, individual ``key=value`` pairs separated by ``'&'`` are
   generated for each element of the value sequence for the key. The order of
   parameters in the encoded string will match the order of parameter tuples in
   the sequence. The :mod:`urlparse` module provides the functions
   :func:`parse_qs` and :func:`parse_qsl` which are used to parse query strings
   into Python data structures.
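
The effect of *doseq* on a sequence value, and the round trip through :func:`parse_qs`, can be sketched as follows. The field names and values are invented for illustration, and the imports fall back to the Python 3 locations named in the note at the top of this page.

```python
try:
    from urllib import urlencode                      # Python 2
    from urlparse import parse_qs
except ImportError:
    from urllib.parse import urlencode, parse_qs      # Python 3 location

# A sequence of pairs keeps its order; the second value is itself a sequence.
pairs = [('name', 'Guido van Rossum'), ('lang', ['py', 'c'])]

# Without doseq, the list is converted with str() and quoted as one value.
flat = urlencode(pairs)

# With doseq, each element of the list gets its own key=value pair.
expanded = urlencode(pairs, doseq=True)

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(expanded)

print(expanded)    # name=Guido+van+Rossum&lang=py&lang=c
print(decoded['lang'])
```

Passing a sequence of tuples rather than a dictionary is a deliberate choice here, since it makes the order of the parameters in the output predictable.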


.. function:: pathname2url(path)

   Convert the pathname *path* from the local syntax for a path to the form used in
   the path component of a URL. This does not produce a complete URL. The return
   value will already be quoted using the :func:`quote` function.


.. function:: url2pathname(path)

   Convert the path component *path* from a percent-encoded URL to the local syntax for a
   path. This does not accept a complete URL. This function uses :func:`unquote`
   to decode *path*.
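
The two conversion functions are inverses of each other for ordinary paths. The sketch below uses a made-up relative path built with :func:`os.path.join` so that it does not depend on the platform's separator; the import falls back to the Python 3 location named in the note at the top of this page.

```python
import os

try:
    from urllib import pathname2url, url2pathname            # Python 2
except ImportError:
    from urllib.request import pathname2url, url2pathname    # Python 3 location

# A relative path in the local syntax, containing a space.
local_path = os.path.join('reports', 'q3 summary.txt')

# The URL form always uses '/' and percent-encodes the space.
url_path = pathname2url(local_path)

# Converting back restores the local separator and the space.
round_trip = url2pathname(url_path)

print(url_path)    # reports/q3%20summary.txt
print(round_trip == local_path)
```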


.. function:: getproxies()

   This helper function returns a dictionary of scheme to proxy server URL
   mappings. It first scans the environment for variables named
   ``<scheme>_proxy``, in a case insensitive way, on all operating systems.
   If it finds none, it looks for proxy information in the Mac OS X System
   Configuration on Mac OS X and in the Windows Systems Registry on Windows.
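
The environment scan can be observed by setting a ``<scheme>_proxy`` variable before calling :func:`getproxies`. In this sketch the scheme name ``demo`` and the proxy URL are invented so that the example does not disturb any real proxy settings; the import falls back to the Python 3 location named in the note at the top of this page.

```python
import os

try:
    from urllib import getproxies            # Python 2
except ImportError:
    from urllib.request import getproxies    # Python 3 location

# 'demo' is a hypothetical scheme, used only to show the naming rule.
saved = os.environ.get('demo_proxy')
os.environ['demo_proxy'] = 'http://proxy.example.com:3128'

try:
    proxies = getproxies()
finally:
    # Restore the environment to what it was before the example ran.
    if saved is None:
        del os.environ['demo_proxy']
    else:
        os.environ['demo_proxy'] = saved

print(proxies['demo'])    # http://proxy.example.com:3128
```

The returned dictionary has the same shape as the *proxies* argument accepted by :func:`urlopen`.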

.. note::
   urllib also exposes certain utility functions like splittype, splithost and
   others for parsing a URL into various components. It is recommended to use
   :mod:`urlparse` for parsing URLs rather than using these functions directly.
   Python 3 does not expose these helper functions from the :mod:`urllib.parse`
   module.


URL Opener objects
------------------

.. class:: URLopener([proxies[, context[, **x509]]])

   Base class for opening and reading URLs. Unless you need to support opening
   objects using schemes other than :file:`http:`, :file:`ftp:`, or :file:`file:`,
   you probably want to use :class:`FancyURLopener`.

   By default, the :class:`URLopener` class sends a :mailheader:`User-Agent` header
   of ``urllib/VVV``, where *VVV* is the :mod:`urllib` version number.
   Applications can define their own :mailheader:`User-Agent` header by subclassing
   :class:`URLopener` or :class:`FancyURLopener` and setting the class attribute
   :attr:`version` to an appropriate string value in the subclass definition.

   The optional *proxies* parameter should be a dictionary mapping scheme names to
   proxy URLs, where an empty dictionary turns proxies off completely. Its default
   value is ``None``, in which case environmental proxy settings will be used if
   present, as discussed in the definition of :func:`urlopen`, above.

   The *context* parameter may be a :class:`ssl.SSLContext` instance. If given,
   it defines the SSL settings the opener uses to make HTTPS connections.

   Additional keyword parameters, collected in *x509*, may be used for
   authentication of the client when using the :file:`https:` scheme. The keywords
   *key_file* and *cert_file* are supported to provide an SSL key and certificate;
   both are needed to support client authentication.

   :class:`URLopener` objects will raise an :exc:`IOError` exception if the server
   returns an error code.

   .. method:: open(fullurl[, data])

      Open *fullurl* using the appropriate protocol. This method sets up cache and
      proxy information, then calls the appropriate open method with its input
      arguments. If the scheme is not recognized, :meth:`open_unknown` is called.
      The *data* argument has the same meaning as the *data* argument of
      :func:`urlopen`.


   .. method:: open_unknown(fullurl[, data])

      Overridable interface to open unknown URL types.


   .. method:: retrieve(url[, filename[, reporthook[, data]]])

      Retrieves the contents of *url* and places it in *filename*. The return value
      is a tuple consisting of a local filename and either a
      :class:`mimetools.Message` object containing the response headers (for remote
      URLs) or ``None`` (for local URLs). The caller must then open and read the
      contents of *filename*. If *filename* is not given and the URL refers to a
      local file, the input filename is returned. If the URL is non-local and
      *filename* is not given, the filename is the output of :func:`tempfile.mktemp`
      with a suffix that matches the suffix of the last path component of the input
      URL. If *reporthook* is given, it must be a function accepting three numeric
      parameters. It will be called after each chunk of data is read from the
      network. *reporthook* is ignored for local URLs.

      If the *url* uses the :file:`http:` scheme identifier, the optional *data*
      argument may be given to specify a ``POST`` request (normally the request type
      is ``GET``). The *data* argument must be in standard
      :mimetype:`application/x-www-form-urlencoded` format; see the :func:`urlencode`
      function above.


   .. attribute:: version

      Variable that specifies the user agent of the opener object. To get
      :mod:`urllib` to tell servers that it is a particular user agent, set this in a
      subclass as a class variable or in the constructor before calling the base
      constructor.


.. class:: FancyURLopener(...)

   :class:`FancyURLopener` subclasses :class:`URLopener` providing default handling
   for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
   response codes listed above, the :mailheader:`Location` header is used to fetch
   the actual URL. For 401 response codes (authentication required), basic HTTP
   authentication is performed. For the 30x response codes, recursion is bounded
   by the value of the *maxtries* attribute, which defaults to 10.

   For all other response codes, the method :meth:`http_error_default` is called
   which you can override in subclasses to handle the error appropriately.

   .. note::

      According to the letter of :rfc:`2616`, 301 and 302 responses to POST requests
      must not be automatically redirected without confirmation by the user. In
      reality, browsers do allow automatic redirection of these responses, changing
      the POST to a GET, and :mod:`urllib` reproduces this behaviour.

   The parameters to the constructor are the same as those for :class:`URLopener`.

   .. note::

      When performing basic authentication, a :class:`FancyURLopener` instance calls
      its :meth:`prompt_user_passwd` method. The default implementation asks the
      users for the required information on the controlling terminal. A subclass may
      override this method to support more appropriate behavior if needed.

   The :class:`FancyURLopener` class offers one additional method that should be
   overloaded to provide the appropriate behavior:

   .. method:: prompt_user_passwd(host, realm)

      Return information needed to authenticate the user at the given host in the
      specified security realm. The return value should be a tuple, ``(user,
      password)``, which can be used for basic authentication.

      The implementation prompts for this information on the terminal; an application
      should override this method to use an appropriate interaction model in the local
      environment.

.. exception:: ContentTooShortError(msg[, content])

   This exception is raised when the :func:`urlretrieve` function detects that the
   amount of the downloaded data is less than the expected amount (given by the
   *Content-Length* header). The :attr:`content` attribute stores the downloaded
   (and supposedly truncated) data.

   .. versionadded:: 2.5


:mod:`urllib` Restrictions
--------------------------

  .. index::
     pair: HTTP; protocol
     pair: FTP; protocol

* Currently, only the following protocols are supported: HTTP (versions 0.9 and
  1.0), FTP, and local files.

* The caching feature of :func:`urlretrieve` has been disabled until I find the
  time to hack proper processing of Expiration time headers.

* There should be a function to query whether a particular URL is in the cache.

* For backward compatibility, if a URL appears to point to a local file but the
  file can't be opened, the URL is re-interpreted using the FTP protocol. This
  can sometimes cause confusing error messages.

* The :func:`urlopen` and :func:`urlretrieve` functions can cause arbitrarily
  long delays while waiting for a network connection to be set up. This means
  that it is difficult to build an interactive Web client using these functions
  without using threads.

  .. index::
     single: HTML
     pair: HTTP; protocol
     module: htmllib

* The data returned by :func:`urlopen` or :func:`urlretrieve` is the raw data
  returned by the server. This may be binary data (such as an image), plain text
  or (for example) HTML. The HTTP protocol provides type information in the reply
  header, which can be inspected by looking at the :mailheader:`Content-Type`
  header. If the returned data is HTML, you can use the module :mod:`htmllib` to
  parse it.

  .. index:: single: FTP

* The code handling the FTP protocol cannot differentiate between a file and a
  directory. This can lead to unexpected behavior when attempting to read a URL
  that points to a file that is not accessible. If the URL ends in a ``/``, it is
  assumed to refer to a directory and will be handled accordingly. But if an
  attempt to read a file leads to a 550 error (meaning the URL cannot be found or
  is not accessible, often for permission reasons), then the path is treated as a
  directory in order to handle the case when a directory is specified by a URL but
  the trailing ``/`` has been left off. This can cause misleading results when
  you try to fetch a file whose read permissions make it inaccessible; the FTP
  code will try to read it, fail with a 550 error, and then perform a directory
  listing for the unreadable file. If fine-grained control is needed, consider
  using the :mod:`ftplib` module, subclassing :class:`FancyURLopener`, or changing
  *_urlopener* to meet your needs.

* This module does not support the use of proxies which require authentication.
  This may be implemented in the future.

  .. index:: module: urlparse

* Although the :mod:`urllib` module contains (undocumented) routines to parse
  and unparse URL strings, the recommended interface for URL manipulation is in
  module :mod:`urlparse`.


.. _urllib-examples:

Examples
--------

Here is an example session that uses the ``GET`` method to retrieve a URL
containing parameters::

   >>> import urllib
   >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
   >>> print f.read()

The following example uses the ``POST`` method instead::

   >>> import urllib
   >>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
   >>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
   >>> print f.read()

The following example uses an explicitly specified HTTP proxy, overriding
environment settings::

   >>> import urllib
   >>> proxies = {'http': 'http://proxy.example.com:8080/'}
   >>> opener = urllib.FancyURLopener(proxies)
   >>> f = opener.open("http://www.python.org")
   >>> f.read()

The following example uses no proxies at all, overriding environment settings::

   >>> import urllib
   >>> opener = urllib.FancyURLopener({})
   >>> f = opener.open("http://www.python.org/")
   >>> f.read()