
************************************************
  HOWTO Fetch Internet Resources Using urllib2
************************************************

:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_

.. note::

    There is a French translation of an earlier revision of this
    HOWTO, available at `urllib2 - Le Manuel manquant
    <http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.


Introduction
============

.. sidebar:: Related Articles

    You may also find the following article on fetching web resources
    with Python useful:

    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_

        A tutorial on Basic Authentication, with examples in Python.

urllib2 is a `Python <http://www.python.org>`_ module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the ``urlopen`` function, which is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations, like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.

urllib2 supports fetching URLs for many "URL schemes" (identified by the string
before the ":" in the URL; for example "ftp" is the URL scheme of
"ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.

For straightforward situations ``urlopen`` is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
not intended to be easy to read. This HOWTO aims to illustrate using urllib2,
with enough detail about HTTP to help you through. It is not intended to replace
the :mod:`urllib2` docs, but is supplementary to them.


Fetching URLs
=============

The simplest way to use urllib2 is as follows::

    import urllib2
    response = urllib2.urlopen('http://python.org/')
    html = response.read()

Many uses of urllib2 will be that simple (note that instead of an 'http:' URL we
could have used a URL starting with 'ftp:', 'file:', etc.). However, it's the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.

HTTP is based on requests and responses: the client makes requests and servers
send responses. urllib2 mirrors this with a ``Request`` object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling ``urlopen`` with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can, for example, call ``.read()`` on the
response::

    import urllib2

    req = urllib2.Request('http://www.voidspace.org.uk')
    response = urllib2.urlopen(req)
    the_page = response.read()

Note that urllib2 makes use of the same Request interface to handle all URL
schemes. For example, you can make an FTP request like so::

    req = urllib2.Request('ftp://example.com/')
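
To see the same interface cover another scheme, here is a minimal sketch that fetches a ``file:`` URL. Because it reads a local file, it runs without any network access. It assumes a writable temporary directory, and the file contents are arbitrary. The ``try``/``except ImportError`` block is an addition, not part of urllib2: it lets the sketch also run on Python 3, where urllib2 was split into ``urllib.request`` and ``urllib.error``.

```python
import os
import tempfile

try:  # Python 2, which this HOWTO targets
    from urllib2 import Request, urlopen
    from urllib import pathname2url
except ImportError:  # assumed fallback for Python 3
    from urllib.request import Request, urlopen, pathname2url

# Write a small temporary file to fetch (the contents are arbitrary).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'Hello from a local file')

# Turn the filesystem path into a file: URL and fetch it through a
# Request object, exactly as we would an http: URL.
url = 'file://' + pathname2url(path)
response = urlopen(Request(url))
try:
    contents = response.read()
finally:
    response.close()
    os.remove(path)

print(contents)
```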

In the case of HTTP, there are two extra things that Request objects allow you
to do. First, you can pass data to be sent to the server. Second, you can pass
extra information ("metadata") about the data or about the request itself to
the server; this information is sent as HTTP "headers". Let's look at each of
these in turn.

Data
----

Sometimes you want to send data to a URL (often the URL will refer to a CGI
(Common Gateway Interface) script [#]_ or other web application). With HTTP,
this is often done using what's known as a POST request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
argument. The encoding is done using a function from the ``urllib`` library,
not from ``urllib2``. ::

    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name' : 'Michael Foord',
              'location' : 'Northampton',
              'language' : 'Python' }

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    the_page = response.read()

Note that other encodings are sometimes required (e.g. for file upload from HTML
forms; see `HTML Specification, Form Submission
<http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).

If you do not pass the ``data`` argument, urllib2 uses a GET request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
delivered to your door). Though the HTTP standard makes it clear that POSTs are
intended to always cause side-effects, and GET requests never to cause
side-effects, nothing prevents a GET request from having side-effects, nor a
POST request from having no side-effects. Data can also be passed in an HTTP
GET request by encoding it in the URL itself.

This is done as follows::

    >>> import urllib2
    >>> import urllib
    >>> data = {}
    >>> data['name'] = 'Somebody Here'
    >>> data['location'] = 'Northampton'
    >>> data['language'] = 'Python'
    >>> url_values = urllib.urlencode(data)
    >>> print url_values
    name=Somebody+Here&language=Python&location=Northampton
    >>> url = 'http://www.example.com/example.cgi'
    >>> full_url = url + '?' + url_values
    >>> response = urllib2.urlopen(full_url)

Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.
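
A quick way to check which method urllib2 will use is the Request object's ``get_method()``. The sketch below uses a hypothetical URL and fetches nothing. It builds one GET request that carries the values in the query string and one POST request that carries the same values in the body. The Python 3 import fallback is again an assumption added for portability. Python 3 also requires the POST body to be bytes, hence the ``encode`` call.

```python
try:  # Python 2, which this HOWTO targets
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # assumed fallback for Python 3
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://www.example.com/example.cgi'  # hypothetical endpoint
values = {'name': 'Somebody Here', 'language': 'Python'}
encoded = urlencode(values)

# GET: the encoded values travel in the URL, after a '?'.
get_req = Request(url + '?' + encoded)

# POST: the same values travel in the request body instead.
post_req = Request(url, encoded.encode('ascii'))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```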

Headers
-------

We'll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.

Some websites [#]_ dislike being browsed by programs, or send different versions
to different browsers [#]_. By default urllib2 identifies itself as
``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
numbers of the Python release, e.g. ``Python-urllib/2.5``), which may confuse
the site, or just plain not work. The way a browser identifies itself is through
the ``User-Agent`` header [#]_. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [#]_. ::

    import urllib
    import urllib2

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'name' : 'Michael Foord',
              'location' : 'Northampton',
              'language' : 'Python' }
    headers = { 'User-Agent' : user_agent }

    data = urllib.urlencode(values)
    req = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(req)
    the_page = response.read()
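
Headers do not have to be supplied all at once. In this minimal sketch, the URL and the ``Referer`` value are hypothetical and nothing is fetched. ``add_header`` adds one header to an existing Request, and ``get_header``/``has_header`` let you inspect what will be sent. One thing to watch: Request normalises header names with ``str.capitalize()``, so ``'User-Agent'`` is stored (and must be looked up) as ``'User-agent'``.

```python
try:  # Python 2, which this HOWTO targets
    from urllib2 import Request
except ImportError:  # assumed fallback for Python 3
    from urllib.request import Request

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Headers can be passed to the constructor as a dictionary...
req = Request('http://www.example.com/', headers={'User-Agent': user_agent})

# ...or added afterwards, one at a time.
req.add_header('Referer', 'http://www.python.org/')

# Names are stored capitalize()d, so this lookup must use 'User-agent';
# get_header('User-Agent') would return None.
print(req.get_header('User-agent'))
print(req.has_header('Referer'))  # True
```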

The response also has two useful methods. See the section on `info and geturl`_,
which comes after we have a look at what happens when things go wrong.


Handling Exceptions
===================

``urlopen`` raises :exc:`URLError` when it cannot handle a response (though as usual
with Python APIs, built-in exceptions such as
:exc:`ValueError`, :exc:`TypeError` etc. may also
be raised).

:exc:`HTTPError` is the subclass of :exc:`URLError` raised in the specific case of
HTTP URLs.

URLError
--------

Often, URLError is raised because there is no network connection (no route to
the specified server), or the specified server doesn't exist. In this case, the
exception raised will have a 'reason' attribute, which is a tuple containing an
error code and a text error message.

e.g. ::

    >>> req = urllib2.Request('http://www.pretend_server.org')
    >>> try: urllib2.urlopen(req)
    ... except urllib2.URLError, e:
    ...     print e.reason
    ...
    (4, 'getaddrinfo failed')
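
You can trigger a :exc:`URLError` without any network at all by asking for a URL scheme that no handler understands. In this minimal sketch the ``foo:`` scheme is made up for the purpose. Here ``reason`` is a plain string rather than a tuple, so it is safer to print it than to unpack it.

```python
try:  # Python 2, which this HOWTO targets
    from urllib2 import urlopen, URLError
except ImportError:  # assumed fallback for Python 3
    from urllib.request import urlopen
    from urllib.error import URLError

reason = None
try:
    # No handler is registered for the made-up 'foo' scheme, so the
    # opener gives up before touching the network.
    urlopen('foo://example.com/')
except URLError as e:
    reason = e.reason

print(reason)  # unknown url type: foo
```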


HTTPError
---------

Every HTTP response from the server contains a numeric "status code". Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
a different URL, urllib2 will handle that for you). For those it can't handle,
``urlopen`` will raise an :exc:`HTTPError`. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).

See section 10 of :rfc:`2616` for a reference on all the HTTP error codes.

The :exc:`HTTPError` instance raised will have an integer 'code' attribute, which
corresponds to the error sent by the server.

Error Codes
~~~~~~~~~~~

Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100-299 range indicate success, you will usually only see error
codes in the 400-599 range.

``BaseHTTPServer.BaseHTTPRequestHandler.responses`` is a useful dictionary of
response codes that shows all the response codes used by :rfc:`2616`. The
dictionary is reproduced here for convenience ::

    # Table mapping response codes to messages; entries have the
    # form {code: (shortmessage, longmessage)}.
    responses = {
        100: ('Continue', 'Request received, please continue'),
        101: ('Switching Protocols',
              'Switching to new protocol; obey Upgrade header'),

        200: ('OK', 'Request fulfilled, document follows'),
        201: ('Created', 'Document created, URL follows'),
        202: ('Accepted',
              'Request accepted, processing continues off-line'),
        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
        204: ('No Content', 'Request fulfilled, nothing follows'),
        205: ('Reset Content', 'Clear input form for further input.'),
        206: ('Partial Content', 'Partial content follows.'),

        300: ('Multiple Choices',
              'Object has several resources -- see URI list'),
        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
        302: ('Found', 'Object moved temporarily -- see URI list'),
        303: ('See Other', 'Object moved -- see Method and URL list'),
        304: ('Not Modified',
              'Document has not changed since given time'),
        305: ('Use Proxy',
              'You must use proxy specified in Location to access this '
              'resource.'),
        307: ('Temporary Redirect',
              'Object moved temporarily -- see URI list'),

        400: ('Bad Request',
              'Bad request syntax or unsupported method'),
        401: ('Unauthorized',
              'No permission -- see authorization schemes'),
        402: ('Payment Required',
              'No payment -- see charging schemes'),
        403: ('Forbidden',
              'Request forbidden -- authorization will not help'),
        404: ('Not Found', 'Nothing matches the given URI'),
        405: ('Method Not Allowed',
              'Specified method is invalid for this server.'),
        406: ('Not Acceptable', 'URI not available in preferred format.'),
        407: ('Proxy Authentication Required', 'You must authenticate with '
              'this proxy before proceeding.'),
        408: ('Request Timeout', 'Request timed out; try again later.'),
        409: ('Conflict', 'Request conflict.'),
        410: ('Gone',
              'URI no longer exists and has been permanently removed.'),
        411: ('Length Required', 'Client must specify Content-Length.'),
        412: ('Precondition Failed', 'Precondition in headers is false.'),
        413: ('Request Entity Too Large', 'Entity is too large.'),
        414: ('Request-URI Too Long', 'URI is too long.'),
        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
        416: ('Requested Range Not Satisfiable',
              'Cannot satisfy request range.'),
        417: ('Expectation Failed',
              'Expect condition could not be satisfied.'),

        500: ('Internal Server Error', 'Server got itself in trouble'),
        501: ('Not Implemented',
              'Server does not support this operation'),
        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
        503: ('Service Unavailable',
              'The server cannot process the request due to a high load'),
        504: ('Gateway Timeout',
              'The gateway server did not receive a timely response'),
        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
        }
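
Rather than copying the table, you can look codes up in it directly. The ``describe`` helper below is hypothetical: a sketch that assumes you only want the short message. On Python 3 the same dictionary lives on ``http.server.BaseHTTPRequestHandler``, which the import fallback uses.

```python
try:  # Python 2, which this HOWTO targets
    from BaseHTTPServer import BaseHTTPRequestHandler
except ImportError:  # assumed fallback for Python 3
    from http.server import BaseHTTPRequestHandler


def describe(code):
    """Return 'code short-message' for an HTTP status code (hypothetical helper)."""
    entry = BaseHTTPRequestHandler.responses.get(code)
    short = entry[0] if entry else 'Unknown'
    return '%d %s' % (code, short)


print(describe(404))  # 404 Not Found
print(describe(503))  # 503 Service Unavailable
```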

When the server returns an HTTP error code, it usually sends an error page along
with it. You can use the :exc:`HTTPError` instance as a response object for the
page returned. This means that as well as the ``code`` attribute, it also has
``read``, ``geturl``, and ``info`` methods. ::

    >>> req = urllib2.Request('http://www.python.org/fish.html')
    >>> try:
    ...     urllib2.urlopen(req)
    ... except urllib2.HTTPError, e:
    ...     print e.code
    ...     print e.read()
    ...
    404
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
    <?xml-stylesheet href="./css/ht2html.css"
        type="text/css"?>
    <html><head><title>Error 404: File Not Found</title>
    ...... etc...
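
The transcript above depends on python.org returning a 404. Here is an offline sketch, assuming you can bind a port on ``127.0.0.1``. It runs a throwaway local server that answers every request with a 404 page; the ``NotFoundHandler`` class, the ``fish.html`` path and the page body are all hypothetical. It passes ``ProxyHandler({})`` to ``build_opener`` so that a proxy configured in the environment cannot intercept the local request.

```python
import threading

try:  # Python 2, which this HOWTO targets
    import urllib2 as request
    from urllib2 import HTTPError
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:  # assumed fallback for Python 3
    import urllib.request as request
    from urllib.error import HTTPError
    from http.server import HTTPServer, BaseHTTPRequestHandler


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: every GET gets a small 404 page."""

    def do_GET(self):
        body = b'<html><head><title>Error 404: File Not Found</title></head></html>'
        self.send_response(404)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request log lines


# Port 0 asks the OS for any free port.
server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

opener = request.build_opener(request.ProxyHandler({}))
url = 'http://127.0.0.1:%d/fish.html' % server.server_address[1]

code, page = None, None
try:
    opener.open(url)
except HTTPError as e:
    # The exception doubles as a response object for the error page.
    code, page = e.code, e.read()
finally:
    server.shutdown()
    server.server_close()

print(code)  # 404
```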

Wrapping it Up
--------------

So if you want to be prepared for :exc:`HTTPError` or :exc:`URLError` there are two
basic approaches. I prefer the second approach.

Number 1
~~~~~~~~

::

    from urllib2 import Request, urlopen, URLError, HTTPError

    someurl = 'http://www.example.com/'  # the URL you want to fetch
    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError, e:
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
    except URLError, e:
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    else:
        # everything is fine
        the_page = response.read()


.. note::

    The ``except HTTPError`` must come first, otherwise ``except URLError``
    will also catch an :exc:`HTTPError`.

Number 2
~~~~~~~~

::

    from urllib2 import Request, urlopen, URLError

    someurl = 'http://www.example.com/'  # the URL you want to fetch
    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError, e:
        # Test for 'code' first: newer Pythons also give HTTPError a 'reason'.
        if hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        elif hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
    else:
        # everything is fine
        the_page = response.read()


info and geturl
===============

The response returned by urlopen (or the :exc:`HTTPError` instance) has two useful
methods, :meth:`info` and :meth:`geturl`.

geturl - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
redirect. The URL of the page fetched may not be the same as the URL requested.

info - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
``httplib.HTTPMessage`` instance.

Typical headers include 'Content-length', 'Content-type', and so on. See the
`Quick Reference to HTTP Headers <http://www.cs.tut.fi/~jkorpela/http.html>`_
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
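
To watch ``geturl`` report a redirect, here is a minimal sketch that runs a local server where ``/old`` answers with a 301 pointing at ``/new``. The ``MovedHandler`` class and both paths are hypothetical. The sketch assumes a free port on ``127.0.0.1`` and bypasses environment proxies with ``ProxyHandler({})``, as described in the Proxies section.

```python
import threading

try:  # Python 2, which this HOWTO targets
    import urllib2 as request
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
except ImportError:  # assumed fallback for Python 3
    import urllib.request as request
    from http.server import HTTPServer, BaseHTTPRequestHandler


class MovedHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects to /new, which serves text."""

    def do_GET(self):
        if self.path == '/old':
            new_url = 'http://127.0.0.1:%d/new' % self.server.server_address[1]
            self.send_response(301)
            self.send_header('Location', new_url)
            self.send_header('Content-Length', '0')
            self.end_headers()
            return
        body = b'You found the new page.'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request log lines


server = HTTPServer(('127.0.0.1', 0), MovedHandler)
thread = threading.Thread(target=server.serve_forever)
thread.daemon = True
thread.start()

opener = request.build_opener(request.ProxyHandler({}))
requested = 'http://127.0.0.1:%d/old' % server.server_address[1]
try:
    response = opener.open(requested)
    final_url = response.geturl()  # where the redirect actually took us
    content_type = response.info()['Content-Type']  # header lookup
    body = response.read()
finally:
    server.shutdown()
    server.server_close()

print(requested)
print(final_url)     # same host and port, but ending in /new
print(content_type)  # text/plain
```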


Openers and Handlers
====================

When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named :class:`urllib2.OpenerDirector`). So far we have been using
the default opener, via ``urlopen``, but you can create custom
openers. Openers use handlers. All the "heavy lifting" is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
redirections or HTTP cookies.

You will want to create openers if you want to fetch URLs with specific handlers
installed, for example to get an opener that handles cookies, or to get an
opener that does not handle redirections.

To create an opener, instantiate an ``OpenerDirector``, and then call
``.add_handler(some_handler_instance)`` repeatedly.

Alternatively, you can use ``build_opener``, which is a convenience function for
creating opener objects with a single function call. ``build_opener`` adds
several handlers by default, but provides a quick way to add more and/or
override the default handlers.

Other sorts of handlers you might want can handle proxies, authentication,
and other common but slightly specialised situations.

``install_opener`` can be used to make an ``opener`` object the (global) default
opener. This means that calls to ``urlopen`` will use the opener you have
installed.

Opener objects have an ``open`` method, which can be called directly to fetch
URLs in the same way as the ``urlopen`` function: there's no need to call
``install_opener``, except as a convenience.
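
Here is a minimal sketch comparing the two ways of making an opener; nothing is fetched. The hand-built ``OpenerDirector`` gets only the two handlers we add, while ``build_opener()`` fills in the defaults. To make the difference visible, it lists the class names in the ``handlers`` list that ``OpenerDirector`` keeps. That list is an implementation detail rather than documented API, so treat relying on it as an assumption.

```python
try:  # Python 2, which this HOWTO targets
    import urllib2 as request
except ImportError:  # assumed fallback for Python 3
    import urllib.request as request

# Manual route: an empty OpenerDirector plus hand-picked handlers.
manual = request.OpenerDirector()
manual.add_handler(request.HTTPHandler())
manual.add_handler(request.HTTPDefaultErrorHandler())

# Convenience route: build_opener() adds the default handlers for you.
built = request.build_opener()


def handler_names(opener):
    """Sorted class names of the handlers registered on an opener."""
    return sorted(type(h).__name__ for h in opener.handlers)


print(handler_names(manual))  # ['HTTPDefaultErrorHandler', 'HTTPHandler']
print(handler_names(built))
```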


Basic Authentication
====================

To illustrate creating and installing a handler we will use the
``HTTPBasicAuthHandler``. For a more detailed discussion of this subject,
including an explanation of how Basic Authentication works, see the `Basic
Authentication Tutorial
<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.

When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication. This specifies the authentication scheme
and a 'realm'. The header looks like: ``Www-authenticate: SCHEME
realm="REALM"``.

e.g. ::

    Www-authenticate: Basic realm="cPanel Users"


The client should then retry the request with the appropriate name and password
for the realm included as a header in the request. This is 'basic
authentication'. In order to simplify this process we can create an instance of
``HTTPBasicAuthHandler`` and an opener to use this handler.

The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
the mapping of URLs and realms to passwords and usernames. If you know what the
realm is (from the authentication header sent by the server), then you can use an
``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
you to specify a default username and password for a URL. This will be supplied
in the absence of you providing an alternative combination for a specific
realm. We indicate this by providing ``None`` as the realm argument to the
``add_password`` method.

The top-level URL is the first URL that requires authentication. URLs "deeper"
than the URL you pass to .add_password() will also match. ::

    import urllib2

    # placeholder values: substitute your own credentials and URL
    username = 'user'
    password = 'secret'
    a_url = 'http://example.com/foo/bar.html'

    # create a password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib2.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib2.build_opener(handler)

    # use the opener to fetch a URL
    opener.open(a_url)

    # Install the opener.
    # Now all calls to urllib2.urlopen use our opener.
    urllib2.install_opener(opener)

.. note::

    In the above example we only supplied our ``HTTPBasicAuthHandler`` to
    ``build_opener``. By default openers have the handlers for normal situations:
    ``ProxyHandler``, ``UnknownHandler``, ``HTTPHandler``,
    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
    ``FileHandler``, ``HTTPErrorProcessor``.

``top_level_url`` is in fact either a full URL (including the 'http:' scheme
component and the hostname and optionally the port number),
e.g. "http://example.com/", or an "authority" (i.e. the hostname,
optionally including the port number), e.g. "example.com" or "example.com:8080"
(the latter example includes a port number). The authority, if present, must
NOT contain the "userinfo" component; for example "joe:password@example.com" is
not correct.
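
The "deeper URLs also match" rule is easy to check without a server, because the password manager can be queried directly with ``find_user_password(realm, uri)``. This minimal sketch uses hypothetical credentials. Because they were added for the ``None`` realm, they are returned whatever realm is asked for, but only for URLs under ``/foo/``.

```python
try:  # Python 2, which this HOWTO targets
    from urllib2 import HTTPPasswordMgrWithDefaultRealm
except ImportError:  # assumed fallback for Python 3
    from urllib.request import HTTPPasswordMgrWithDefaultRealm

password_mgr = HTTPPasswordMgrWithDefaultRealm()
# hypothetical credentials, registered for the default (None) realm
password_mgr.add_password(None, 'http://example.com/foo/', 'user', 'secret')

# A URL "deeper" than the top-level URL matches, whatever the realm.
deeper = password_mgr.find_user_password('Some Realm',
                                         'http://example.com/foo/bar.html')
# A URL outside /foo/ does not match.
elsewhere = password_mgr.find_user_password('Some Realm',
                                            'http://example.com/other/')

print(deeper)     # ('user', 'secret')
print(elsewhere)  # (None, None)
```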


Proxies
=======

urllib2 will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler``, which is part of the normal handler chain. Normally that's
a good thing, but there are occasions when it may not be helpful [#]_. One way
to stop urllib2 using your proxy is to set up our own ``ProxyHandler``, with no
proxies defined. This is done using similar steps to setting up a
`Basic Authentication`_ handler: ::

    >>> proxy_support = urllib2.ProxyHandler({})
    >>> opener = urllib2.build_opener(proxy_support)
    >>> urllib2.install_opener(opener)

.. note::

    Currently ``urllib2`` does not support fetching of ``https`` locations
    through a proxy. However, this can be enabled by extending urllib2 as
    shown in the recipe [#]_.


Sockets and Layers
==================

The Python support for fetching resources from the web is layered. urllib2 uses
the httplib library, which in turn uses the socket library.

As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has no timeout and can hang. Before
Python 2.6 the socket timeout was not exposed at the httplib or urllib2 levels;
from 2.6 onwards ``urlopen`` also accepts an optional ``timeout`` argument (in
seconds). Alternatively, you can set the default timeout globally for all
sockets using ::

    import socket
    import urllib2

    # timeout in seconds
    timeout = 10
    socket.setdefaulttimeout(timeout)

    # this call to urllib2.urlopen now uses the default timeout
    # we have set in the socket module
    req = urllib2.Request('http://www.voidspace.org.uk')
    response = urllib2.urlopen(req)
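
To see the global timeout actually fire, here is a minimal sketch, assuming you can bind a port on ``127.0.0.1``. A hypothetical stand-in for a hung server, a listening socket that accepts connections but never replies, takes the place of a real site. The half-second timeout and the ``ProxyHandler({})`` that bypasses environment proxies are choices made for the demonstration. The default is reset afterwards, because ``setdefaulttimeout`` affects every socket created later in the process.

```python
import socket

try:  # Python 2, which this HOWTO targets
    import urllib2 as request
    from urllib2 import URLError
except ImportError:  # assumed fallback for Python 3
    import urllib.request as request
    from urllib.error import URLError

# Hypothetical hung server: the kernel completes the TCP handshake for
# queued connections, but nothing ever reads the request or replies.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
port = listener.getsockname()[1]

# Ignore any proxy configured in the environment.
opener = request.build_opener(request.ProxyHandler({}))

timed_out = False
socket.setdefaulttimeout(0.5)  # seconds; applies to sockets created from now on
try:
    opener.open('http://127.0.0.1:%d/' % port)
except (socket.timeout, URLError):
    # Waiting for the response raises socket.timeout; depending on where
    # the timeout fires it may instead arrive wrapped in a URLError.
    timed_out = True
finally:
    socket.setdefaulttimeout(None)  # back to the default: no timeout
    listener.close()

print(timed_out)  # True
```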


-------


Footnotes
=========

This document was reviewed and revised by John Lee.

.. [#] For an introduction to the CGI protocol see
       `Writing Web Applications in Python <http://www.pyzine.com/Issue008/Section_Articles/article_CGIOne.html>`_.
.. [#] Like Google for example. The proper way to use Google from a program
       is to use `PyGoogle <http://pygoogle.sourceforge.net>`_ of course. See
       `Voidspace Google <http://www.voidspace.org.uk/python/recipebook.shtml#google>`_
       for some examples of using the Google API.
.. [#] Browser sniffing is a very bad practice for website design; building
       sites using web standards is much more sensible. Unfortunately a lot of
       sites still send different versions to different browsers.
.. [#] For details of more HTTP request headers, see
       `Quick Reference to HTTP Headers`_.
.. [#] The user agent for MSIE 6 is
       'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'
.. [#] In my case I have to use a proxy to access the internet at work. If you
       attempt to fetch localhost URLs through this proxy it blocks them. IE
       is set to use the proxy, which urllib2 picks up on. In order to test
       scripts with a localhost server, I have to prevent urllib2 from using
       the proxy.
.. [#] urllib2 opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
       <http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/456195>`_.