.. _urllib-howto:

***********************************************************
HOWTO Fetch Internet Resources Using The urllib Package
***********************************************************

:Author: `Michael Foord <http://www.voidspace.org.uk/python/index.shtml>`_

.. note::

    There is a French translation of an earlier revision of this
    HOWTO, available at `urllib2 - Le Manuel manquant
    <http://www.voidspace.org.uk/python/articles/urllib2_francais.shtml>`_.



Introduction
============

.. sidebar:: Related Articles

    You may also find useful the following article on fetching web resources
    with Python:

    * `Basic Authentication <http://www.voidspace.org.uk/python/articles/authentication.shtml>`_

        A tutorial on *Basic Authentication*, with examples in Python.

**urllib.request** is a Python module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the *urlopen* function. This is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations - like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.

urllib.request supports fetching URLs for many "URL schemes" (identified by the
string before the ``":"`` in the URL - for example ``"ftp"`` is the URL scheme of
``"ftp://python.org/"``) using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.

For straightforward situations *urlopen* is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is :rfc:`2616`. This is a technical document and
not intended to be easy to read. This HOWTO aims to illustrate using *urllib*,
with enough detail about HTTP to help you through. It is not intended to replace
the :mod:`urllib.request` docs, but is supplementary to them.


Fetching URLs
=============

The simplest way to use urllib.request is as follows::

    import urllib.request
    with urllib.request.urlopen('http://python.org/') as response:
        html = response.read()

If you wish to retrieve a resource via URL and store it in a temporary
location, you can do so via the :func:`shutil.copyfileobj` and
:func:`tempfile.NamedTemporaryFile` functions::

    import shutil
    import tempfile
    import urllib.request

    with urllib.request.urlopen('http://python.org/') as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            shutil.copyfileobj(response, tmp_file)

    with open(tmp_file.name) as html:
        pass

Many uses of urllib will be that simple (note that instead of an 'http:' URL we
could have used a URL starting with 'ftp:', 'file:', etc.). However, it's the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.

HTTP is based on requests and responses - the client makes requests and servers
send responses. urllib.request mirrors this with a ``Request`` object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling ``urlopen`` with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can for example call ``.read()`` on the
response::

    import urllib.request

    req = urllib.request.Request('http://www.voidspace.org.uk')
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

Note that urllib.request makes use of the same Request interface to handle all URL
schemes. For example, you can make an FTP request like so::

    req = urllib.request.Request('ftp://example.com/')

In the case of HTTP, there are two extra things that Request objects allow you
to do: First, you can pass data to be sent to the server. Second, you can pass
extra information ("metadata") *about* the data or about the request itself, to
the server - this information is sent as HTTP "headers". Let's look at each of
these in turn.

Data
----

Sometimes you want to send data to a URL (often the URL will refer to a CGI
(Common Gateway Interface) script or other web application). With HTTP,
this is often done using what's known as a **POST** request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the ``data``
argument. The encoding is done using a function from the :mod:`urllib.parse`
library. ::

    import urllib.parse
    import urllib.request

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}

    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')  # data should be bytes
    req = urllib.request.Request(url, data)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

Note that other encodings are sometimes required (e.g. for file upload from HTML
forms - see `HTML Specification, Form Submission
<https://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13>`_ for more
details).
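
Nothing requires the ``data`` argument to be form-encoded: you can pass any
bytes payload, as long as you set an appropriate ``Content-Type`` header
yourself. Here is a minimal sketch of posting JSON (the endpoint URL below is
just a placeholder)::

    import json
    import urllib.request

    url = 'http://www.someserver.com/api'  # hypothetical endpoint
    payload = json.dumps({'name': 'Michael Foord'}).encode('utf-8')
    headers = {'Content-Type': 'application/json'}

    req = urllib.request.Request(url, data=payload, headers=headers)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()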

If you do not pass the ``data`` argument, urllib uses a **GET** request. One
way in which GET and POST requests differ is that POST requests often have
"side-effects": they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
delivered to your door). Though the HTTP standard makes it clear that POSTs are
intended to *always* cause side-effects, and GET requests *never* to cause
side-effects, nothing prevents a GET request from having side-effects, nor a
POST request from having no side-effects. Data can also be passed in an HTTP
GET request by encoding it in the URL itself.

This is done as follows::

    >>> import urllib.request
    >>> import urllib.parse
    >>> data = {}
    >>> data['name'] = 'Somebody Here'
    >>> data['location'] = 'Northampton'
    >>> data['language'] = 'Python'
    >>> url_values = urllib.parse.urlencode(data)
    >>> print(url_values)  # The order may differ from below.  #doctest: +SKIP
    name=Somebody+Here&language=Python&location=Northampton
    >>> url = 'http://www.example.com/example.cgi'
    >>> full_url = url + '?' + url_values
    >>> data = urllib.request.urlopen(full_url)

Notice that the full URL is created by adding a ``?`` to the URL, followed by
the encoded values.

Headers
-------

We'll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.

Some websites [#]_ dislike being browsed by programs, or send different versions
to different browsers [#]_. By default urllib identifies itself as
``Python-urllib/x.y`` (where ``x`` and ``y`` are the major and minor version
numbers of the Python release,
e.g. ``Python-urllib/2.5``), which may confuse the site, or just plain
not work. The way a browser identifies itself is through the
``User-Agent`` header [#]_. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [#]_. ::

    import urllib.parse
    import urllib.request

    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    values = {'name': 'Michael Foord',
              'location': 'Northampton',
              'language': 'Python'}
    headers = {'User-Agent': user_agent}

    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')
    req = urllib.request.Request(url, data, headers)
    with urllib.request.urlopen(req) as response:
        the_page = response.read()

The response also has two useful methods. See the section on `info and geturl`_
which comes after we have a look at what happens when things go wrong.


Handling Exceptions
===================

*urlopen* raises :exc:`URLError` when it cannot handle a response (though as
usual with Python APIs, built-in exceptions such as :exc:`ValueError`,
:exc:`TypeError` etc. may also be raised).

:exc:`HTTPError` is the subclass of :exc:`URLError` raised in the specific case of
HTTP URLs.

The exception classes are exported from the :mod:`urllib.error` module.

URLError
--------

Often, URLError is raised because there is no network connection (no route to
the specified server), or the specified server doesn't exist. In this case, the
exception raised will have a 'reason' attribute, which is a tuple containing an
error code and a text error message.

e.g. ::

    >>> req = urllib.request.Request('http://www.pretend_server.org')
    >>> try: urllib.request.urlopen(req)
    ... except urllib.error.URLError as e:
    ...     print(e.reason)      #doctest: +SKIP
    ...
    (4, 'getaddrinfo failed')


HTTPError
---------

Every HTTP response from the server contains a numeric "status code". Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a "redirection" that requests the client fetch the document from
a different URL, urllib will handle that for you). For those it can't handle,
urlopen will raise an :exc:`HTTPError`. Typical errors include '404' (page not
found), '403' (request forbidden), and '401' (authentication required).

See section 10 of :rfc:`2616` for a reference on all the HTTP error codes.

The :exc:`HTTPError` instance raised will have an integer 'code' attribute, which
corresponds to the error sent by the server.

Error Codes
~~~~~~~~~~~

Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100--299 range indicate success, you will usually only see error
codes in the 400--599 range.

:attr:`http.server.BaseHTTPRequestHandler.responses` is a useful dictionary of
response codes that shows all the response codes used by :rfc:`2616`. The
dictionary is reproduced here for convenience ::

    # Table mapping response codes to messages; entries have the
    # form {code: (shortmessage, longmessage)}.
    responses = {
        100: ('Continue', 'Request received, please continue'),
        101: ('Switching Protocols',
              'Switching to new protocol; obey Upgrade header'),

        200: ('OK', 'Request fulfilled, document follows'),
        201: ('Created', 'Document created, URL follows'),
        202: ('Accepted',
              'Request accepted, processing continues off-line'),
        203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
        204: ('No Content', 'Request fulfilled, nothing follows'),
        205: ('Reset Content', 'Clear input form for further input.'),
        206: ('Partial Content', 'Partial content follows.'),

        300: ('Multiple Choices',
              'Object has several resources -- see URI list'),
        301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
        302: ('Found', 'Object moved temporarily -- see URI list'),
        303: ('See Other', 'Object moved -- see Method and URL list'),
        304: ('Not Modified',
              'Document has not changed since given time'),
        305: ('Use Proxy',
              'You must use proxy specified in Location to access this '
              'resource.'),
        307: ('Temporary Redirect',
              'Object moved temporarily -- see URI list'),

        400: ('Bad Request',
              'Bad request syntax or unsupported method'),
        401: ('Unauthorized',
              'No permission -- see authorization schemes'),
        402: ('Payment Required',
              'No payment -- see charging schemes'),
        403: ('Forbidden',
              'Request forbidden -- authorization will not help'),
        404: ('Not Found', 'Nothing matches the given URI'),
        405: ('Method Not Allowed',
              'Specified method is invalid for this server.'),
        406: ('Not Acceptable', 'URI not available in preferred format.'),
        407: ('Proxy Authentication Required', 'You must authenticate with '
              'this proxy before proceeding.'),
        408: ('Request Timeout', 'Request timed out; try again later.'),
        409: ('Conflict', 'Request conflict.'),
        410: ('Gone',
              'URI no longer exists and has been permanently removed.'),
        411: ('Length Required', 'Client must specify Content-Length.'),
        412: ('Precondition Failed', 'Precondition in headers is false.'),
        413: ('Request Entity Too Large', 'Entity is too large.'),
        414: ('Request-URI Too Long', 'URI is too long.'),
        415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
        416: ('Requested Range Not Satisfiable',
              'Cannot satisfy request range.'),
        417: ('Expectation Failed',
              'Expect condition could not be satisfied.'),

        500: ('Internal Server Error', 'Server got itself in trouble'),
        501: ('Not Implemented',
              'Server does not support this operation'),
        502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
        503: ('Service Unavailable',
              'The server cannot process the request due to a high load'),
        504: ('Gateway Timeout',
              'The gateway server did not receive a timely response'),
        505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
        }

When an error is raised the server responds by returning an HTTP error code
*and* an error page. You can use the :exc:`HTTPError` instance as a response for
the page returned. This means that as well as the code attribute, it also has
read, geturl, and info methods, as returned by the ``urllib.response`` module::

    >>> req = urllib.request.Request('http://www.python.org/fish.html')
    >>> try:
    ...     urllib.request.urlopen(req)
    ... except urllib.error.HTTPError as e:
    ...     print(e.code)
    ...     print(e.read())  #doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
    ...
    404
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
    ...
    <title>Page Not Found</title>\n
    ...

Wrapping it Up
--------------

So if you want to be prepared for :exc:`HTTPError` *or* :exc:`URLError` there are two
basic approaches. I prefer the second approach.

Number 1
~~~~~~~~

::

    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError
    req = Request(someurl)
    try:
        response = urlopen(req)
    except HTTPError as e:
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
    except URLError as e:
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    else:
        # everything is fine


.. note::

    The ``except HTTPError`` *must* come first, otherwise ``except URLError``
    will *also* catch an :exc:`HTTPError`.

Number 2
~~~~~~~~

::

    from urllib.request import Request, urlopen
    from urllib.error import URLError
    req = Request(someurl)
    try:
        response = urlopen(req)
    except URLError as e:
        if hasattr(e, 'reason'):
            print('We failed to reach a server.')
            print('Reason: ', e.reason)
        elif hasattr(e, 'code'):
            print('The server couldn\'t fulfill the request.')
            print('Error code: ', e.code)
    else:
        # everything is fine


info and geturl
===============

The response returned by urlopen (or the :exc:`HTTPError` instance) has two
useful methods, :meth:`info` and :meth:`geturl`, and is defined in the module
:mod:`urllib.response`.

**geturl** - this returns the real URL of the page fetched. This is useful
because ``urlopen`` (or the opener object used) may have followed a
redirect. The URL of the page fetched may not be the same as the URL requested.

**info** - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
:class:`http.client.HTTPMessage` instance.

Typical headers include 'Content-length', 'Content-type', and so on. See the
`Quick Reference to HTTP Headers <http://jkorpela.fi/http.html>`_
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
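
For example, a short sketch exercising both methods (any reachable URL will
do; ``python.org`` is used here purely for illustration)::

    import urllib.request

    with urllib.request.urlopen('http://www.python.org/') as response:
        print(response.geturl())                # final URL, after any redirects
        print(response.info()['Content-Type'])  # one header from the HTTPMessage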


Openers and Handlers
====================

When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named :class:`urllib.request.OpenerDirector`). Normally we have been using
the default opener - via ``urlopen`` - but you can create custom
openers. Openers use handlers. All the "heavy lifting" is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
redirections or HTTP cookies.

You will want to create openers if you want to fetch URLs with specific handlers
installed, for example to get an opener that handles cookies, or to get an
opener that does not handle redirections.

To create an opener, instantiate an ``OpenerDirector``, and then call
``.add_handler(some_handler_instance)`` repeatedly.

Alternatively, you can use ``build_opener``, which is a convenience function for
creating opener objects with a single function call. ``build_opener`` adds
several handlers by default, but provides a quick way to add more and/or
override the default handlers.

Other sorts of handlers you might want can handle proxies, authentication,
and other common but slightly specialised situations.

``install_opener`` can be used to make an ``opener`` object the (global) default
opener. This means that calls to ``urlopen`` will use the opener you have
installed.

Opener objects have an ``open`` method, which can be called directly to fetch
URLs in the same way as the ``urlopen`` function: there's no need to call
``install_opener``, except as a convenience.
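
Putting those pieces together, here is a minimal sketch of building an opener
with one extra handler (a cookie handler, as an example), using it directly,
and then installing it as the default (the URL is only illustrative)::

    import http.cookiejar
    import urllib.request

    # build_opener adds the default handlers for us; we only add the extra one
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar))

    # use the opener directly ...
    with opener.open('http://www.example.com/') as response:
        the_page = response.read()

    # ... or install it so that plain urlopen() calls use it too
    urllib.request.install_opener(opener)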


Basic Authentication
====================

To illustrate creating and installing a handler we will use the
``HTTPBasicAuthHandler``. For a more detailed discussion of this subject --
including an explanation of how Basic Authentication works -- see the `Basic
Authentication Tutorial
<http://www.voidspace.org.uk/python/articles/authentication.shtml>`_.

When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication. This specifies the authentication scheme
and a 'realm'. The header looks like: ``WWW-Authenticate: SCHEME
realm="REALM"``.

e.g.

.. code-block:: none

    WWW-Authenticate: Basic realm="cPanel Users"


The client should then retry the request with the appropriate name and password
for the realm included as a header in the request. This is 'basic
authentication'. In order to simplify this process we can create an instance of
``HTTPBasicAuthHandler`` and an opener to use this handler.

The ``HTTPBasicAuthHandler`` uses an object called a password manager to handle
the mapping of URLs and realms to passwords and usernames. If you know what the
realm is (from the authentication header sent by the server), then you can use a
``HTTPPasswordMgr``. Frequently one doesn't care what the realm is. In that
case, it is convenient to use ``HTTPPasswordMgrWithDefaultRealm``. This allows
you to specify a default username and password for a URL. This will be supplied
in the absence of you providing an alternative combination for a specific
realm. We indicate this by providing ``None`` as the realm argument to the
``add_password`` method.

The top-level URL is the first URL that requires authentication. URLs "deeper"
than the URL you pass to .add_password() will also match. ::

    # create a password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

    # Add the username and password.
    # If we knew the realm, we could use it instead of None.
    top_level_url = "http://example.com/foo/"
    password_mgr.add_password(None, top_level_url, username, password)

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # use the opener to fetch a URL
    opener.open(a_url)

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)

.. note::

    In the above example we only supplied our ``HTTPBasicAuthHandler`` to
    ``build_opener``. By default openers have the handlers for normal situations
    -- ``ProxyHandler`` (if a proxy setting such as an :envvar:`http_proxy`
    environment variable is set), ``UnknownHandler``, ``HTTPHandler``,
    ``HTTPDefaultErrorHandler``, ``HTTPRedirectHandler``, ``FTPHandler``,
    ``FileHandler``, ``DataHandler``, ``HTTPErrorProcessor``.

``top_level_url`` is in fact *either* a full URL (including the 'http:' scheme
component and the hostname and optionally the port number)
e.g. ``"http://example.com/"`` *or* an "authority" (i.e. the hostname,
optionally including the port number) e.g. ``"example.com"`` or ``"example.com:8080"``
(the latter example includes a port number). The authority, if present, must
NOT contain the "userinfo" component - for example ``"joe:password@example.com"`` is
not correct.
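
For instance, both of the following calls are acceptable (the host names and
credentials are purely illustrative)::

    # a full URL; URLs "deeper" than this will also match
    password_mgr.add_password(None, "http://example.com/foo/", "user", "secret")

    # just an authority, with an explicit port number
    password_mgr.add_password(None, "example.com:8080", "user", "secret")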


Proxies
=======

**urllib** will auto-detect your proxy settings and use those. This is through
the ``ProxyHandler``, which is part of the normal handler chain when a proxy
setting is detected. Normally that's a good thing, but there are occasions
when it may not be helpful [#]_. One way to turn off automatic proxy handling
is to set up our own ``ProxyHandler``, with no proxies defined. This is done
using similar steps to setting up a `Basic Authentication`_ handler: ::

    >>> proxy_support = urllib.request.ProxyHandler({})
    >>> opener = urllib.request.build_opener(proxy_support)
    >>> urllib.request.install_opener(opener)

.. note::

    Currently ``urllib.request`` *does not* support fetching of ``https`` locations
    through a proxy. However, this can be enabled by extending urllib.request as
    shown in the recipe [#]_.

.. note::

    ``HTTP_PROXY`` will be ignored if a variable ``REQUEST_METHOD`` is set; see
    the documentation on :func:`~urllib.request.getproxies`.


Sockets and Layers
==================

The Python support for fetching resources from the web is layered. urllib uses
the :mod:`http.client` library, which in turn uses the socket library.

You can specify how long a socket should wait for a response before timing
out. This can be useful in applications which have to fetch web pages. By
default the socket module has *no timeout* and can hang. A per-request timeout
can be passed to :func:`urllib.request.urlopen` via its *timeout* argument, and
you can also set the default timeout globally for all sockets using ::

    import socket
    import urllib.request

    # timeout in seconds
    timeout = 10
    socket.setdefaulttimeout(timeout)

    # this call to urllib.request.urlopen now uses the default timeout
    # we have set in the socket module
    req = urllib.request.Request('http://www.voidspace.org.uk')
    response = urllib.request.urlopen(req)
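
Alternatively, the timeout for a single request can be given directly to
``urlopen`` (a sketch; ten seconds is an arbitrary choice)::

    import urllib.request

    # only this request is limited to ten seconds; other sockets are unaffected
    response = urllib.request.urlopen('http://www.voidspace.org.uk', timeout=10)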


-------


Footnotes
=========

This document was reviewed and revised by John Lee.

.. [#] Google for example.
.. [#] Browser sniffing is a very bad practice for website design - building
       sites using web standards is much more sensible. Unfortunately a lot of
       sites still send different versions to different browsers.
.. [#] The user agent for MSIE 6 is
       *'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'*
.. [#] For details of more HTTP request headers, see
       `Quick Reference to HTTP Headers`_.
.. [#] In my case I have to use a proxy to access the internet at work. If you
       attempt to fetch *localhost* URLs through this proxy it blocks them. IE
       is set to use the proxy, which urllib picks up on. In order to test
       scripts with a localhost server, I have to prevent urllib from using
       the proxy.
.. [#] urllib opener for SSL proxy (CONNECT method): `ASPN Cookbook Recipe
       <https://code.activestate.com/recipes/456195/>`_.