"""An extensible library for opening URLs using a variety of protocols

The simplest way to use this module is to call the urlopen function,
which accepts a string containing a URL or a Request object (described
below). It opens the URL and returns the results as a file-like
object; the returned object has some extra methods described below.

The OpenerDirector manages a collection of Handler objects that do
all the actual work. Each Handler implements a particular protocol or
option. The OpenerDirector is a composite object that invokes the
Handlers needed to open the requested URL. For example, the
HTTPHandler performs HTTP GET and POST requests and deals with
non-error returns. The HTTPRedirectHandler automatically deals with
HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler
deals with digest authentication.

urlopen(url, data=None) -- Basic usage is the same as original
urllib. Pass the URL and optionally data to post to an HTTP URL, and
get a file-like object back. One difference is that you can also pass
a Request instance instead of a URL. Raises a URLError (subclass of
IOError); for HTTP errors, raises an HTTPError, which can also be
treated as a valid response.

build_opener -- Function that creates a new OpenerDirector instance.
Will install the default handlers. Accepts one or more Handlers as
arguments, either instances or Handler classes that it will
instantiate. If one of the arguments is a subclass of a default
handler, the argument will be installed instead of the default.

install_opener -- Installs a new opener as the default opener.

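A minimal sketch of the build_opener substitution rule described above.
QuietHTTPHandler is a hypothetical subclass invented for illustration,
and the try/except import is only an assumption so that the snippet also
runs on Python 3, where the same names live in urllib.request. No
network access takes place.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent


class QuietHTTPHandler(urllib2.HTTPHandler):
    # Hypothetical subclass; it behaves exactly like HTTPHandler.
    pass


# A class (rather than an instance) is accepted; build_opener instantiates it.
opener = urllib2.build_opener(QuietHTTPHandler)

# Because QuietHTTPHandler subclasses a default handler, the default
# HTTPHandler is left out and only the subclass is installed.
http_handlers = [h for h in opener.handlers
                 if isinstance(h, urllib2.HTTPHandler)]
```

After this runs, http_handlers should hold a single QuietHTTPHandler
instance, while the other default handlers remain in place.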
objects of interest:

OpenerDirector -- Sets up the User Agent as the Python-urllib client and manages
the Handler classes, while dealing with requests and responses.

Request -- An object that encapsulates the state of a request. The
state can be as simple as the URL. It can also include extra HTTP
headers, e.g. a User-Agent.

BaseHandler --

exceptions:
URLError -- A subclass of IOError; individual protocols have their own
specific subclass.

HTTPError -- Also a valid HTTP response, so you can treat an HTTP error
as an exceptional event or valid response.

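The dual nature of HTTPError can be illustrated without contacting a
server. This is a sketch under stated assumptions: the URL, status and
body are made up, the error is constructed by hand, and the import
fallback is only there so the snippet also runs on Python 3.

```python
import io

try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

# Made-up response body and URL, used purely for illustration.
body = io.BytesIO(b"page not found")
err = urllib2.HTTPError("http://example.invalid/missing", 404,
                        "Not Found", {}, body)

try:
    raise err
except urllib2.URLError as caught:      # HTTPError is a URLError subclass
    status = caught.code                # exception-style access
    text = caught.read()                # response-style access (file-like)
    summary = str(caught)
```

The same object is caught as an exception and then read like an
ordinary response.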
internals:
BaseHandler and parent
_call_chain conventions

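The _call_chain convention listed under internals can be sketched in
isolation: a handler returns None to pass, returns a result to answer,
or raises to stop the chain. The function and the three classes below
are hypothetical stand-ins, not part of this module.

```python
def call_chain(handlers, meth_name, *args):
    # Standalone restatement of the convention: the first non-None
    # result wins, None means "let the next handler try", and an
    # exception propagates and ends the search.
    for handler in handlers:
        result = getattr(handler, meth_name)(*args)
        if result is not None:
            return result
    return None


class Declines(object):
    def demo_open(self, req):
        return None                     # cannot handle it, but others might


class Answers(object):
    def demo_open(self, req):
        return "response for %s" % req


class Refuses(object):
    def demo_open(self, req):
        raise ValueError("nobody should handle %s" % req)


# Declines passes, Answers responds, Refuses is never reached.
first = call_chain([Declines(), Answers(), Refuses()], "demo_open", "req-1")

# Refuses comes first here, so the chain stops before Answers runs.
try:
    call_chain([Refuses(), Answers()], "demo_open", "req-2")
    stopped = False
except ValueError:
    stopped = True
```

Handler order therefore matters: whichever handler answers or raises
first decides the outcome.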
Example usage:

import urllib2

# set up authentication info
authinfo = urllib2.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')

proxy_support = urllib2.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib2.build_opener(proxy_support, authinfo, urllib2.CacheFTPHandler)

# install it
urllib2.install_opener(opener)

f = urllib2.urlopen('http://www.python.org/')


"""

# XXX issues:
# If an authentication error handler tries to perform
# authentication for some reason but fails, how should the error be
# signalled? The client needs to know the HTTP error code. But if
# the handler knows that the problem was, e.g., that it didn't know
# the hash algorithm requested in the challenge, it would be good to
# pass that information along to the client, too.
# ftp errors aren't handled cleanly
# check digest against correct (i.e. non-apache) implementation

# Possible extensions:
# complex proxies XXX not sure what exactly was meant by this
# abstract factory for opener

import base64
import hashlib
import httplib
import mimetools
import os
import posixpath
import random
import re
import socket
import sys
import time
import urlparse
import bisect
import warnings

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from urllib import (unwrap, unquote, splittype, splithost, quote,
     addinfourl, splitport, splittag, toBytes,
     splitattr, ftpwrapper, splituser, splitpasswd, splitvalue)

# support for FileHandler, proxies via environment variables
from urllib import localhost, url2pathname, getproxies, proxy_bypass

# used in User-Agent header sent
__version__ = sys.version[:3]

_opener = None
def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    global _opener
    if _opener is None:
        _opener = build_opener()
    return _opener.open(url, data, timeout)

def install_opener(opener):
    global _opener
    _opener = opener

# do these error classes make sense?
# make sure all of the IOError stuff is overridden. we just want to be
# subtypes.

class URLError(IOError):
    # URLError is a sub-type of IOError, but it doesn't share any of
    # the implementation. need to override __init__ and __str__.
    # It sets self.args for compatibility with other EnvironmentError
    # subclasses, but args doesn't have the typical format with errno in
    # slot 0 and strerror in slot 1. This may be better than nothing.
    def __init__(self, reason):
        self.args = reason,
        self.reason = reason

    def __str__(self):
        return '<urlopen error %s>' % self.reason

class HTTPError(URLError, addinfourl):
    """Raised when HTTP error occurs, but also acts like non-error return"""
    __super_init = addinfourl.__init__

    def __init__(self, url, code, msg, hdrs, fp):
        self.code = code
        self.msg = msg
        self.hdrs = hdrs
        self.fp = fp
        self.filename = url
        # The addinfourl classes depend on fp being a valid file
        # object. In some cases, the HTTPError may not have a valid
        # file object. If this happens, the simplest workaround is to
        # not initialize the base classes.
        if fp is not None:
            self.__super_init(fp, hdrs, url, code)

    def __str__(self):
        return 'HTTP Error %s: %s' % (self.code, self.msg)

    # since URLError specifies a .reason attribute, HTTPError should also
    # provide this attribute. See issue13211 for discussion.
    @property
    def reason(self):
        return self.msg

# copied from cookielib.py
_cut_port_re = re.compile(r":\d+$")
def request_host(request):
    """Return request-host, as defined by RFC 2965.

    Variation from RFC: returned value is lowercased, for convenient
    comparison.

    """
    url = request.get_full_url()
    host = urlparse.urlparse(url)[1]
    if host == "":
        host = request.get_header("Host", "")

    # remove port, if present
    host = _cut_port_re.sub("", host, 1)
    return host.lower()

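# The port-stripping and lowercasing step of request_host can be tried
# on its own. This is a small standalone sketch: strip_port_and_lower is
# a hypothetical helper name, and the host strings are made up.

```python
import re

# Same pattern as _cut_port_re: a colon followed by digits at the very end.
cut_port_re = re.compile(r":\d+$")


def strip_port_and_lower(host):
    # Drop a trailing ":port" if present, then lowercase for comparison.
    return cut_port_re.sub("", host, count=1).lower()


with_port = strip_port_and_lower("WWW.Example.COM:8080")
without_port = strip_port_and_lower("Example.com")
```

# Anchoring the pattern at the end of the string means only a trailing
# port is removed; colons elsewhere in the host are left alone.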
class Request:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self.__original = unwrap(url)
        self.__original, self.__fragment = splittag(self.__original)
        self.type = None
        # self.__r_type is what's left after doing the splittype
        self.host = None
        self.port = None
        self._tunnel_host = None
        self.data = data
        self.headers = {}
        for key, value in headers.items():
            self.add_header(key, value)
        self.unredirected_hdrs = {}
        if origin_req_host is None:
            origin_req_host = request_host(self)
        self.origin_req_host = origin_req_host
        self.unverifiable = unverifiable

    def __getattr__(self, attr):
        # XXX this is a fallback mechanism to guard against these
        # methods getting called in a non-standard order. this may be
        # too complicated and/or unnecessary.
        # XXX should the __r_XXX attributes be public?
        if attr[:12] == '_Request__r_':
            name = attr[12:]
            if hasattr(Request, 'get_' + name):
                getattr(self, 'get_' + name)()
                return getattr(self, attr)
        raise AttributeError, attr

    def get_method(self):
        if self.has_data():
            return "POST"
        else:
            return "GET"

    # XXX these helper methods are lame

    def add_data(self, data):
        self.data = data

    def has_data(self):
        return self.data is not None

    def get_data(self):
        return self.data

    def get_full_url(self):
        if self.__fragment:
            return '%s#%s' % (self.__original, self.__fragment)
        else:
            return self.__original

    def get_type(self):
        if self.type is None:
            self.type, self.__r_type = splittype(self.__original)
            if self.type is None:
                raise ValueError, "unknown url type: %s" % self.__original
        return self.type

    def get_host(self):
        if self.host is None:
            self.host, self.__r_host = splithost(self.__r_type)
            if self.host:
                self.host = unquote(self.host)
        return self.host

    def get_selector(self):
        return self.__r_host

    def set_proxy(self, host, type):
        if self.type == 'https' and not self._tunnel_host:
            self._tunnel_host = self.host
        else:
            self.type = type
            self.__r_host = self.__original

        self.host = host

    def has_proxy(self):
        return self.__r_host == self.__original

    def get_origin_req_host(self):
        return self.origin_req_host

    def is_unverifiable(self):
        return self.unverifiable

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val

    def has_header(self, header_name):
        return (header_name in self.headers or
                header_name in self.unredirected_hdrs)

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name,
            self.unredirected_hdrs.get(header_name, default))

    def header_items(self):
        hdrs = self.unredirected_hdrs.copy()
        hdrs.update(self.headers)
        return hdrs.items()

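# Two Request behaviors are easy to miss: add_header stores keys in
# capitalize() form while has_header does an exact lookup, and
# get_method switches from GET to POST as soon as data is attached.
# A hedged sketch follows; the URL and header name are made up, and the
# import fallback is only an assumption so it also runs on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 equivalent

plain = urllib2.Request("http://example.invalid/form")
plain.add_header("x-api-key", "secret")     # stored under "X-api-key"

posting = urllib2.Request("http://example.invalid/form", data=b"a=1")

stored = plain.has_header("X-api-key")      # capitalized form is found
missed = plain.has_header("x-api-key")      # original spelling is not
methods = (plain.get_method(), posting.get_method())
```

# Callers that check for a header should therefore use the capitalized
# spelling, e.g. "Content-type" rather than "Content-Type".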
class OpenerDirector:
    def __init__(self):
        client_version = "Python-urllib/%s" % __version__
        self.addheaders = [('User-agent', client_version)]
        # self.handlers is retained only for backward compatibility
        self.handlers = []
        # manage the individual handlers
        self.handle_open = {}
        self.handle_error = {}
        self.process_response = {}
        self.process_request = {}

    def add_handler(self, handler):
        if not hasattr(handler, "add_parent"):
            raise TypeError("expected BaseHandler instance, got %r" %
                            type(handler))

        added = False
        for meth in dir(handler):
            if meth in ["redirect_request", "do_open", "proxy_open"]:
                # oops, coincidental match
                continue

            i = meth.find("_")
            protocol = meth[:i]
            condition = meth[i+1:]

            if condition.startswith("error"):
                j = condition.find("_") + i + 1
                kind = meth[j+1:]
                try:
                    kind = int(kind)
                except ValueError:
                    pass
                lookup = self.handle_error.get(protocol, {})
                self.handle_error[protocol] = lookup
            elif condition == "open":
                kind = protocol
                lookup = self.handle_open
            elif condition == "response":
                kind = protocol
                lookup = self.process_response
            elif condition == "request":
                kind = protocol
                lookup = self.process_request
            else:
                continue

            handlers = lookup.setdefault(kind, [])
            if handlers:
                bisect.insort(handlers, handler)
            else:
                handlers.append(handler)
            added = True

        if added:
            bisect.insort(self.handlers, handler)
            handler.add_parent(self)

    def close(self):
        # Only exists for backwards compatibility.
        pass

    def _call_chain(self, chain, kind, meth_name, *args):
        # Handlers raise an exception if no one else should try to handle
        # the request, or return None if they can't but another handler
        # could. Otherwise, they return the response.
        handlers = chain.get(kind, ())
        for handler in handlers:
            func = getattr(handler, meth_name)

            result = func(*args)
            if result is not None:
                return result

    def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        # accept a URL or a Request object
        if isinstance(fullurl, basestring):
            req = Request(fullurl, data)
        else:
            req = fullurl
            if data is not None:
                req.add_data(data)

        req.timeout = timeout
        protocol = req.get_type()

        # pre-process request
        meth_name = protocol+"_request"
        for processor in self.process_request.get(protocol, []):
            meth = getattr(processor, meth_name)
            req = meth(req)

        response = self._open(req, data)

        # post-process response
        meth_name = protocol+"_response"
        for processor in self.process_response.get(protocol, []):
            meth = getattr(processor, meth_name)
            response = meth(req, response)

        return response

    def _open(self, req, data=None):
        result = self._call_chain(self.handle_open, 'default',
                                  'default_open', req)
        if result:
            return result

        protocol = req.get_type()
        result = self._call_chain(self.handle_open, protocol, protocol +
                                  '_open', req)
        if result:
            return result

        return self._call_chain(self.handle_open, 'unknown',
                                'unknown_open', req)

    def error(self, proto, *args):
        if proto in ('http', 'https'):
            # XXX http[s] protocols are special-cased
            dict = self.handle_error['http'] # https is not different than http
            proto = args[2]  # YUCK!
            meth_name = 'http_error_%s' % proto
            http_err = 1
            orig_args = args
        else:
            dict = self.handle_error
            meth_name = proto + '_error'
            http_err = 0
        args = (dict, proto, meth_name) + args
        result = self._call_chain(*args)
        if result:
            return result

        if http_err:
            args = (dict, 'default', 'http_error_default') + orig_args
            return self._call_chain(*args)

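# OpenerDirector.add_handler decides where a handler is registered
# purely from its method names. The standalone sketch below restates
# that parsing step; classify is a hypothetical helper, and the table
# names it returns are simply the attribute names used by the class.

```python
def classify(meth):
    # Return (table, protocol, kind) for a handler method name, or None
    # when the name does not follow the <protocol>_<condition> pattern.
    if meth in ("redirect_request", "do_open", "proxy_open"):
        return None                     # coincidental matches are skipped
    i = meth.find("_")
    protocol, condition = meth[:i], meth[i + 1:]
    if condition.startswith("error"):
        j = condition.find("_") + i + 1
        kind = meth[j + 1:]
        try:
            kind = int(kind)            # "302" -> 302; "default" stays a str
        except ValueError:
            pass
        return ("handle_error", protocol, kind)
    tables = {"open": "handle_open",
              "response": "process_response",
              "request": "process_request"}
    if condition in tables:
        return (tables[condition], protocol, protocol)
    return None


redirect = classify("http_error_302")
fallback = classify("http_error_default")
opener_entry = classify("ftp_open")
```

# Names such as add_parent or close do not match any condition, so they
# never register a handler.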
# XXX probably also want an abstract factory that knows when it makes
# sense to skip a superclass in favor of a subclass and when it might
# make sense to include both

def build_opener(*handlers):
    """Create an opener object from a list of handlers.

    The opener will use several default handlers, including support
    for HTTP, FTP and when applicable, HTTPS.

    If any of the handlers passed as arguments are subclasses of the
    default handlers, the default handlers will not be used.
    """
    import types
    def isclass(obj):
        return isinstance(obj, (types.ClassType, type))

    opener = OpenerDirector()
    default_classes = [ProxyHandler, UnknownHandler, HTTPHandler,
                       HTTPDefaultErrorHandler, HTTPRedirectHandler,
                       FTPHandler, FileHandler, HTTPErrorProcessor]
    if hasattr(httplib, 'HTTPS'):
        default_classes.append(HTTPSHandler)
    skip = set()
    for klass in default_classes:
        for check in handlers:
            if isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    for klass in skip:
        default_classes.remove(klass)

    for klass in default_classes:
        opener.add_handler(klass())

    for h in handlers:
        if isclass(h):
            h = h()
        opener.add_handler(h)
    return opener

class BaseHandler:
    handler_order = 500

    def add_parent(self, parent):
        self.parent = parent

    def close(self):
        # Only exists for backwards compatibility
        pass

    def __lt__(self, other):
        if not hasattr(other, "handler_order"):
            # Try to preserve the old behavior of having custom classes
            # inserted after default ones (works only for custom user
            # classes which are not aware of handler_order).
            return True
        return self.handler_order < other.handler_order


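# handler_order and __lt__ exist so that bisect.insort can keep handler
# lists sorted as handlers are added. A standalone sketch of that
# interaction follows; the Handler, Early and Late classes and their
# names are hypothetical, and only the comparison logic mirrors
# BaseHandler.

```python
import bisect


class Handler(object):
    handler_order = 500                 # same default as BaseHandler

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        if not hasattr(other, "handler_order"):
            return True
        return self.handler_order < other.handler_order


class Early(Handler):
    handler_order = 100                 # sorts before the defaults


class Late(Handler):
    handler_order = 1000                # sorts after everything else


chain = []
for h in (Late("errors"), Handler("http"), Early("proxy"), Handler("ftp")):
    bisect.insort(chain, h)             # insertion keeps the list sorted

names = [h.name for h in chain]
```

# Handlers with equal handler_order keep their insertion order, because
# insort places a new item after existing items that compare equal.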
class HTTPErrorProcessor(BaseHandler):
    """Process HTTP error responses."""
    handler_order = 1000  # after all other processing

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # According to RFC 2616, "2xx" code indicates that the client's
        # request was successfully received, understood, and accepted.
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)

        return response

    https_response = http_response

class HTTPDefaultErrorHandler(BaseHandler):
    def http_error_default(self, req, fp, code, msg, hdrs):
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

class HTTPRedirectHandler(BaseHandler):
    # maximum number of redirections to any single URL
    # this is needed because of the state that cookies introduce
    max_repeats = 4
    # maximum total number of redirections (regardless of URL) before
    # assuming we're in a loop
    max_redirections = 10

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        """Return a Request or None in response to a redirect.

        This is called by the http_error_30x methods when a
        redirection response is received. If a redirection should
        take place, return a new Request to allow http_error_30x to
        perform the redirect. Otherwise, raise HTTPError if no-one
        else should try to handle this url. Return None if you can't
        but another Handler might.
        """
        m = req.get_method()
        if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
            or code in (301, 302, 303) and m == "POST"):
            # Strictly (according to RFC 2616), 301 or 302 in response
            # to a POST MUST NOT cause a redirection without confirmation
            # from the user (of urllib2, in this case). In practice,
            # essentially all clients do redirect in this case, so we
            # do the same.
            # be lenient with URIs containing a space
            newurl = newurl.replace(' ', '%20')
            newheaders = dict((k,v) for k,v in req.headers.items()
                              if k.lower() not in ("content-length", "content-type")
                             )
            return Request(newurl,
                           headers=newheaders,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)
        else:
            raise HTTPError(req.get_full_url(), code, msg, headers, fp)

Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000568 # Implementation note: To avoid the server sending us into an
569 # infinite loop, the request object needs to track what URLs we
570 # have already seen. Do this by adding a handler-specific
571 # attribute to the Request object.
572 def http_error_302(self, req, fp, code, msg, headers):
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000573 # Some servers (incorrectly) return multiple Location headers
574 # (so probably same goes for URI). Use first header.
Raymond Hettinger54f02222002-06-01 14:18:47 +0000575 if 'location' in headers:
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000576 newurl = headers.getheaders('location')[0]
Raymond Hettinger54f02222002-06-01 14:18:47 +0000577 elif 'uri' in headers:
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000578 newurl = headers.getheaders('uri')[0]
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000579 else:
580 return
Facundo Batista94f243a2008-08-17 03:38:39 +0000581
582 # fix a possible malformed URL
583 urlparts = urlparse.urlparse(newurl)
584 if not urlparts.path:
585 urlparts = list(urlparts)
586 urlparts[2] = "/"
587 newurl = urlparse.urlunparse(urlparts)
588
Jeremy Hylton73574ee2000-10-12 18:54:18 +0000589 newurl = urlparse.urljoin(req.get_full_url(), newurl)
590
guido@google.com60a4a902011-03-24 08:07:45 -0700591 # For security reasons we do not allow redirects to protocols
guido@google.com2bc23b82011-03-24 10:44:17 -0700592 # other than HTTP, HTTPS or FTP.
guido@google.com60a4a902011-03-24 08:07:45 -0700593 newurl_lower = newurl.lower()
594 if not (newurl_lower.startswith('http://') or
guido@google.com2bc23b82011-03-24 10:44:17 -0700595 newurl_lower.startswith('https://') or
596 newurl_lower.startswith('ftp://')):
guido@google.comf1509302011-03-28 13:47:01 -0700597 raise HTTPError(newurl, code,
598 msg + " - Redirection to url '%s' is not allowed" %
599 newurl,
600 headers, fp)
guido@google.com60a4a902011-03-24 08:07:45 -0700601
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000602 # XXX Probably want to forget about the state of the current
603 # request, although that might interact poorly with other
604 # handlers that also use handler-specific request attributes
Jeremy Hylton03892952003-05-05 04:09:13 +0000605 new = self.redirect_request(req, fp, code, msg, headers, newurl)
Raymond Hettinger024aaa12003-04-24 15:32:12 +0000606 if new is None:
607 return
608
609 # loop detection
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000610 # .redirect_dict has a key url if url was previously visited.
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +0000611 if hasattr(req, 'redirect_dict'):
612 visited = new.redirect_dict = req.redirect_dict
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000613 if (visited.get(newurl, 0) >= self.max_repeats or
614 len(visited) >= self.max_redirections):
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000615 raise HTTPError(req.get_full_url(), code,
Jeremy Hylton54e99e82001-08-07 21:12:25 +0000616 self.inf_msg + msg, headers, fp)
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +0000617 else:
618 visited = new.redirect_dict = req.redirect_dict = {}
Martin v. Löwis2a6ba902004-05-31 18:22:40 +0000619 visited[newurl] = visited.get(newurl, 0) + 1
Jeremy Hylton54e99e82001-08-07 21:12:25 +0000620
621 # Don't close the fp until we are sure that we won't use it
Tim Petersab9ba272001-08-09 21:40:30 +0000622 # with HTTPError.
Jeremy Hylton54e99e82001-08-07 21:12:25 +0000623 fp.read()
624 fp.close()
625
Senthil Kumaran5fee4602009-07-19 02:43:43 +0000626 return self.parent.open(new, timeout=req.timeout)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000627
Raymond Hettinger024aaa12003-04-24 15:32:12 +0000628 http_error_301 = http_error_303 = http_error_307 = http_error_302
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000629
Martin v. Löwis162f0812003-07-12 07:33:32 +0000630 inf_msg = "The HTTP server returned a redirect error that would " \
Thomas Wouters7e474022000-07-16 12:04:32 +0000631 "lead to an infinite loop.\n" \
Martin v. Löwis162f0812003-07-12 07:33:32 +0000632 "The last 30x error message was:\n"
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000633
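# Illustration only, not part of the module: a minimal sketch of the loop
# detection performed in http_error_302 above. The names follow_redirect and
# RedirectLoopError, and the limit values used as defaults, are assumptions
# made for this example; the handler itself reads self.max_repeats and
# self.max_redirections and raises HTTPError.

```python
class RedirectLoopError(Exception):
    """Raised when a redirect chain revisits a URL too often or grows too long."""


def follow_redirect(visited, newurl, max_repeats=4, max_redirections=10):
    """Record one redirect to newurl in visited, or raise if a limit is hit.

    visited maps each redirect target to the number of times it was followed,
    which is the role redirect_dict plays on the request object.
    """
    if (visited.get(newurl, 0) >= max_repeats or
            len(visited) >= max_redirections):
        raise RedirectLoopError(newurl)
    visited[newurl] = visited.get(newurl, 0) + 1
    return visited


if __name__ == "__main__":
    seen = {}
    for _ in range(4):
        follow_redirect(seen, "http://example.com/a")
    print(seen)
```

# Because the same dict is shared between the old and the new request, the
# counts survive across the whole redirect chain.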
Georg Brandl720096a2006-04-02 20:45:34 +0000634
635def _parse_proxy(proxy):
636 """Return (scheme, user, password, host/port) given a URL or an authority.
637
638 If a URL is supplied, it must have an authority (host:port) component.
639 According to RFC 3986, having an authority component means the URL must
640 have two slashes after the scheme:
641
642 >>> _parse_proxy('file:/ftp.example.com/')
643 Traceback (most recent call last):
644 ValueError: proxy URL with no authority: 'file:/ftp.example.com/'
645
646 The first three items of the returned tuple may be None.
647
648 Examples of authority parsing:
649
650 >>> _parse_proxy('proxy.example.com')
651 (None, None, None, 'proxy.example.com')
652 >>> _parse_proxy('proxy.example.com:3128')
653 (None, None, None, 'proxy.example.com:3128')
654
655 The authority component may optionally include userinfo (assumed to be
656 username:password):
657
658 >>> _parse_proxy('joe:password@proxy.example.com')
659 (None, 'joe', 'password', 'proxy.example.com')
660 >>> _parse_proxy('joe:password@proxy.example.com:3128')
661 (None, 'joe', 'password', 'proxy.example.com:3128')
662
663 Same examples, but with URLs instead:
664
665 >>> _parse_proxy('http://proxy.example.com/')
666 ('http', None, None, 'proxy.example.com')
667 >>> _parse_proxy('http://proxy.example.com:3128/')
668 ('http', None, None, 'proxy.example.com:3128')
669 >>> _parse_proxy('http://joe:password@proxy.example.com/')
670 ('http', 'joe', 'password', 'proxy.example.com')
671 >>> _parse_proxy('http://joe:password@proxy.example.com:3128')
672 ('http', 'joe', 'password', 'proxy.example.com:3128')
673
674 Everything after the authority is ignored:
675
676 >>> _parse_proxy('ftp://joe:password@proxy.example.com/rubbish:3128')
677 ('ftp', 'joe', 'password', 'proxy.example.com')
678
679 Test for no trailing '/' case:
680
681 >>> _parse_proxy('http://joe:password@proxy.example.com')
682 ('http', 'joe', 'password', 'proxy.example.com')
683
684 """
Georg Brandl720096a2006-04-02 20:45:34 +0000685 scheme, r_scheme = splittype(proxy)
686 if not r_scheme.startswith("/"):
687 # authority
688 scheme = None
689 authority = proxy
690 else:
691 # URL
692 if not r_scheme.startswith("//"):
693 raise ValueError("proxy URL with no authority: %r" % proxy)
 694 # We have an authority, so for RFC 3986-compliant URLs (per
 695 # sections 3 and 3.3), the path is empty or starts with '/'
696 end = r_scheme.find("/", 2)
697 if end == -1:
698 end = None
699 authority = r_scheme[2:end]
700 userinfo, hostport = splituser(authority)
701 if userinfo is not None:
702 user, password = splitpasswd(userinfo)
703 else:
704 user = password = None
705 return scheme, user, password, hostport
706
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000707class ProxyHandler(BaseHandler):
Gustavo Niemeyer9556fba2003-06-07 17:53:08 +0000708 # Proxies must be in front
709 handler_order = 100
710
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000711 def __init__(self, proxies=None):
Fred Drake13a2c272000-02-10 17:17:14 +0000712 if proxies is None:
713 proxies = getproxies()
714 assert hasattr(proxies, 'has_key'), "proxies must be a mapping"
715 self.proxies = proxies
Brett Cannondf0d87a2003-05-18 02:25:07 +0000716 for type, url in proxies.items():
Tim Peterse1190062001-01-15 03:34:38 +0000717 setattr(self, '%s_open' % type,
Fred Drake13a2c272000-02-10 17:17:14 +0000718 lambda r, proxy=url, type=type, meth=self.proxy_open: \
719 meth(r, proxy, type))
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000720
721 def proxy_open(self, req, proxy, type):
Fred Drake13a2c272000-02-10 17:17:14 +0000722 orig_type = req.get_type()
Georg Brandl720096a2006-04-02 20:45:34 +0000723 proxy_type, user, password, hostport = _parse_proxy(proxy)
Senthil Kumaran27468662009-10-11 02:00:07 +0000724
Georg Brandl720096a2006-04-02 20:45:34 +0000725 if proxy_type is None:
726 proxy_type = orig_type
Senthil Kumaran27468662009-10-11 02:00:07 +0000727
728 if req.host and proxy_bypass(req.host):
729 return None
730
Georg Brandl531ceba2006-01-21 07:20:56 +0000731 if user and password:
Georg Brandl720096a2006-04-02 20:45:34 +0000732 user_pass = '%s:%s' % (unquote(user), unquote(password))
Andrew M. Kuchling872dba42006-10-27 17:11:23 +0000733 creds = base64.b64encode(user_pass).strip()
Georg Brandl8c036cc2006-08-20 13:15:39 +0000734 req.add_header('Proxy-authorization', 'Basic ' + creds)
Georg Brandl720096a2006-04-02 20:45:34 +0000735 hostport = unquote(hostport)
736 req.set_proxy(hostport, proxy_type)
Senthil Kumaran27468662009-10-11 02:00:07 +0000737
Senthil Kumarane266f252009-05-24 09:14:50 +0000738 if orig_type == proxy_type or orig_type == 'https':
Fred Drake13a2c272000-02-10 17:17:14 +0000739 # let other handlers take care of it
Fred Drake13a2c272000-02-10 17:17:14 +0000740 return None
741 else:
742 # need to start over, because the other handlers don't
743 # grok the proxy's URL type
Georg Brandl720096a2006-04-02 20:45:34 +0000744 # e.g. if we have a constructor arg proxies like so:
745 # {'http': 'ftp://proxy.example.com'}, we may end up turning
746 # a request for http://acme.example.com/a into one for
747 # ftp://proxy.example.com/a
Senthil Kumaran5fee4602009-07-19 02:43:43 +0000748 return self.parent.open(req, timeout=req.timeout)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000749
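# Illustration only: a sketch of how proxy_open above turns the userinfo part
# of a proxy URL into a Proxy-authorization value. The helper name
# proxy_basic_auth_header is hypothetical, and the explicit encode/decode
# steps are an assumption added so the sketch also runs on Python 3.

```python
import base64

try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2


def proxy_basic_auth_header(user, password):
    """Return the value for a Proxy-authorization header (Basic scheme)."""
    # userinfo may be percent-encoded inside the proxy URL, so decode first
    user_pass = "%s:%s" % (unquote(user), unquote(password))
    creds = base64.b64encode(user_pass.encode("utf-8")).decode("ascii")
    return "Basic " + creds


if __name__ == "__main__":
    print(proxy_basic_auth_header("joe", "password"))
```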
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000750class HTTPPasswordMgr:
Georg Brandlfa42bd72006-04-30 07:06:11 +0000751
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000752 def __init__(self):
Fred Drake13a2c272000-02-10 17:17:14 +0000753 self.passwd = {}
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000754
755 def add_password(self, realm, uri, user, passwd):
Fred Drake13a2c272000-02-10 17:17:14 +0000756 # uri could be a single URI or a sequence
Walter Dörwald65230a22002-06-03 15:58:32 +0000757 if isinstance(uri, basestring):
Fred Drake13a2c272000-02-10 17:17:14 +0000758 uri = [uri]
Raymond Hettinger54f02222002-06-01 14:18:47 +0000759 if not realm in self.passwd:
Fred Drake13a2c272000-02-10 17:17:14 +0000760 self.passwd[realm] = {}
Georg Brandl2b330372006-05-28 20:23:12 +0000761 for default_port in True, False:
762 reduced_uri = tuple(
763 [self.reduce_uri(u, default_port) for u in uri])
764 self.passwd[realm][reduced_uri] = (user, passwd)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000765
766 def find_user_password(self, realm, authuri):
Fred Drake13a2c272000-02-10 17:17:14 +0000767 domains = self.passwd.get(realm, {})
Georg Brandl2b330372006-05-28 20:23:12 +0000768 for default_port in True, False:
769 reduced_authuri = self.reduce_uri(authuri, default_port)
770 for uris, authinfo in domains.iteritems():
771 for uri in uris:
772 if self.is_suburi(uri, reduced_authuri):
773 return authinfo
Fred Drake13a2c272000-02-10 17:17:14 +0000774 return None, None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000775
Georg Brandl2b330372006-05-28 20:23:12 +0000776 def reduce_uri(self, uri, default_port=True):
777 """Accept authority or URI and extract only the authority and path."""
778 # note HTTP URLs do not have a userinfo component
Georg Brandlfa42bd72006-04-30 07:06:11 +0000779 parts = urlparse.urlsplit(uri)
Fred Drake13a2c272000-02-10 17:17:14 +0000780 if parts[1]:
Georg Brandlfa42bd72006-04-30 07:06:11 +0000781 # URI
Georg Brandl2b330372006-05-28 20:23:12 +0000782 scheme = parts[0]
783 authority = parts[1]
784 path = parts[2] or '/'
Fred Drake13a2c272000-02-10 17:17:14 +0000785 else:
Georg Brandl2b330372006-05-28 20:23:12 +0000786 # host or host:port
787 scheme = None
788 authority = uri
789 path = '/'
790 host, port = splitport(authority)
791 if default_port and port is None and scheme is not None:
792 dport = {"http": 80,
793 "https": 443,
794 }.get(scheme)
795 if dport is not None:
796 authority = "%s:%d" % (host, dport)
797 return authority, path
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000798
799 def is_suburi(self, base, test):
Fred Drake13a2c272000-02-10 17:17:14 +0000800 """Check if test is below base in a URI tree
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000801
Fred Drake13a2c272000-02-10 17:17:14 +0000802 Both args must be URIs in reduced form.
803 """
804 if base == test:
Guido van Rossum8ca162f2002-04-07 06:36:23 +0000805 return True
Fred Drake13a2c272000-02-10 17:17:14 +0000806 if base[0] != test[0]:
Guido van Rossum8ca162f2002-04-07 06:36:23 +0000807 return False
Moshe Zadka8a18e992001-03-01 08:40:42 +0000808 common = posixpath.commonprefix((base[1], test[1]))
Fred Drake13a2c272000-02-10 17:17:14 +0000809 if len(common) == len(base[1]):
Guido van Rossum8ca162f2002-04-07 06:36:23 +0000810 return True
811 return False
Tim Peterse1190062001-01-15 03:34:38 +0000812
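# Illustration only: a standalone sketch of reduce_uri and is_suburi from
# HTTPPasswordMgr above. The port check below is a simplified stand-in for
# splitport and is an assumption of this example. Note that
# posixpath.commonprefix compares character by character, so "/a" is treated
# as a prefix of "/ab" as well as of "/a/b".

```python
import posixpath

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

DEFAULT_PORTS = {"http": 80, "https": 443}


def reduce_uri(uri, default_port=True):
    """Accept an authority or a URI and return (authority, path)."""
    parts = urlsplit(uri)
    if parts[1]:
        # full URI
        scheme, authority, path = parts[0], parts[1], parts[2] or "/"
    else:
        # bare host or host:port
        scheme, authority, path = None, uri, "/"
    # simplified port detection (assumption; the module uses splitport)
    has_port = ":" in authority and authority.rsplit(":", 1)[1].isdigit()
    if default_port and not has_port and scheme in DEFAULT_PORTS:
        authority = "%s:%d" % (authority, DEFAULT_PORTS[scheme])
    return authority, path


def is_suburi(base, test):
    """Check whether test is at or below base; both are (authority, path)."""
    if base == test:
        return True
    if base[0] != test[0]:
        return False
    common = posixpath.commonprefix((base[1], test[1]))
    return len(common) == len(base[1])


if __name__ == "__main__":
    print(reduce_uri("http://example.com/a/b"))
    print(is_suburi(("example.com:80", "/a"), ("example.com:80", "/a/b")))
```

# add_password stores each URI in both reduced forms (with and without the
# default port), which is why find_user_password tries both as well.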
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000813
Moshe Zadka8a18e992001-03-01 08:40:42 +0000814class HTTPPasswordMgrWithDefaultRealm(HTTPPasswordMgr):
815
816 def find_user_password(self, realm, authuri):
Jeremy Hyltonaefae552003-07-10 13:30:12 +0000817 user, password = HTTPPasswordMgr.find_user_password(self, realm,
818 authuri)
Moshe Zadka8a18e992001-03-01 08:40:42 +0000819 if user is not None:
820 return user, password
821 return HTTPPasswordMgr.find_user_password(self, None, authuri)
822
823
824class AbstractBasicAuthHandler:
825
Georg Brandl172e7252007-03-07 07:39:06 +0000826 # XXX this allows for multiple auth-schemes, but will stupidly pick
827 # the last one with a realm specified.
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000828
Georg Brandl33124322008-03-21 19:54:00 +0000829 # allow for double- and single-quoted realm values
830 # (single quotes are a violation of the RFC, but appear in the wild)
831 rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
Senthil Kumaran6a2a6c22012-05-15 22:24:10 +0800832 'realm=(["\']?)([^"\']*)\\2', re.I)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000833
Georg Brandl261e2512006-05-29 20:52:54 +0000834 # XXX could pre-emptively send auth info already accepted (RFC 2617,
835 # end of section 2, and section 1.2 immediately after "credentials"
836 # production).
837
Moshe Zadka8a18e992001-03-01 08:40:42 +0000838 def __init__(self, password_mgr=None):
839 if password_mgr is None:
840 password_mgr = HTTPPasswordMgr()
841 self.passwd = password_mgr
Fred Drake13a2c272000-02-10 17:17:14 +0000842 self.add_password = self.passwd.add_password
Senthil Kumaran4f0108b2010-06-01 12:40:07 +0000843 self.retried = 0
Tim Peterse1190062001-01-15 03:34:38 +0000844
Senthil Kumaran4f1ba0d2010-08-19 17:32:03 +0000845 def reset_retry_count(self):
846 self.retried = 0
847
Moshe Zadka8a18e992001-03-01 08:40:42 +0000848 def http_error_auth_reqed(self, authreq, host, req, headers):
Georg Brandlfa42bd72006-04-30 07:06:11 +0000849 # host may be an authority (without userinfo) or a URL with an
850 # authority
Moshe Zadka8a18e992001-03-01 08:40:42 +0000851 # XXX could be multiple headers
852 authreq = headers.get(authreq, None)
Senthil Kumaran4f0108b2010-06-01 12:40:07 +0000853
854 if self.retried > 5:
 855 # give up once username:password has already been retried more than 5 times.
856 raise HTTPError(req.get_full_url(), 401, "basic auth failed",
857 headers, None)
858 else:
859 self.retried += 1
860
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000861 if authreq:
Martin v. Löwis65a79752004-08-03 12:59:55 +0000862 mo = AbstractBasicAuthHandler.rx.search(authreq)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000863 if mo:
Georg Brandl33124322008-03-21 19:54:00 +0000864 scheme, quote, realm = mo.groups()
Senthil Kumaranb0d85fd2012-05-15 23:59:19 +0800865 if quote not in ['"', "'"]:
866 warnings.warn("Basic Auth Realm was unquoted",
867 UserWarning, 2)
Eric S. Raymondb08b2d32001-02-09 11:10:16 +0000868 if scheme.lower() == 'basic':
Senthil Kumaran7e8fd5e2010-08-26 06:20:13 +0000869 response = self.retry_http_basic_auth(host, req, realm)
870 if response and response.code != 401:
871 self.retried = 0
872 return response
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000873
Moshe Zadka8a18e992001-03-01 08:40:42 +0000874 def retry_http_basic_auth(self, host, req, realm):
Georg Brandlfa42bd72006-04-30 07:06:11 +0000875 user, pw = self.passwd.find_user_password(realm, host)
Martin v. Löwis8b3e8712004-05-06 01:41:26 +0000876 if pw is not None:
Fred Drake13a2c272000-02-10 17:17:14 +0000877 raw = "%s:%s" % (user, pw)
Andrew M. Kuchling872dba42006-10-27 17:11:23 +0000878 auth = 'Basic %s' % base64.b64encode(raw).strip()
Jeremy Hylton52a17be2001-11-09 16:46:51 +0000879 if req.headers.get(self.auth_header, None) == auth:
880 return None
Senthil Kumaran8526adf2010-02-24 16:45:46 +0000881 req.add_unredirected_header(self.auth_header, auth)
Senthil Kumaran5fee4602009-07-19 02:43:43 +0000882 return self.parent.open(req, timeout=req.timeout)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000883 else:
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000884 return None
885
Georg Brandlfa42bd72006-04-30 07:06:11 +0000886
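# Illustration only: how the AbstractBasicAuthHandler.rx pattern above picks
# the scheme, quote character and realm out of a challenge header. The
# pattern is copied from the class; the helper name parse_challenge and the
# sample header values are assumptions made for this example.

```python
import re

rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
                'realm=(["\']?)([^"\']*)\\2', re.I)


def parse_challenge(header_value):
    """Return (scheme, quote, realm) or None if no realm is found."""
    mo = rx.search(header_value)
    if mo is None:
        return None
    return mo.groups()


if __name__ == "__main__":
    print(parse_challenge('Basic realm="Secure Area"'))
    # With several challenges in one header, the greedy prefix means the
    # last one with a realm is returned, as the XXX comment in the class notes.
    print(parse_challenge('Digest realm="a", Basic realm="b"'))
```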
Moshe Zadka8a18e992001-03-01 08:40:42 +0000887class HTTPBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000888
Jeremy Hylton52a17be2001-11-09 16:46:51 +0000889 auth_header = 'Authorization'
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000890
Moshe Zadka8a18e992001-03-01 08:40:42 +0000891 def http_error_401(self, req, fp, code, msg, headers):
Georg Brandlfa42bd72006-04-30 07:06:11 +0000892 url = req.get_full_url()
Senthil Kumaran4f1ba0d2010-08-19 17:32:03 +0000893 response = self.http_error_auth_reqed('www-authenticate',
894 url, req, headers)
895 self.reset_retry_count()
896 return response
Moshe Zadka8a18e992001-03-01 08:40:42 +0000897
898
899class ProxyBasicAuthHandler(AbstractBasicAuthHandler, BaseHandler):
900
Georg Brandl8c036cc2006-08-20 13:15:39 +0000901 auth_header = 'Proxy-authorization'
Moshe Zadka8a18e992001-03-01 08:40:42 +0000902
903 def http_error_407(self, req, fp, code, msg, headers):
Georg Brandlfa42bd72006-04-30 07:06:11 +0000904 # http_error_auth_reqed requires that there is no userinfo component in
905 # authority. Assume there isn't one, since urllib2 does not (and
906 # should not, RFC 3986 s. 3.2.1) support requests for URLs containing
907 # userinfo.
908 authority = req.get_host()
Senthil Kumaran4f1ba0d2010-08-19 17:32:03 +0000909 response = self.http_error_auth_reqed('proxy-authenticate',
Georg Brandlfa42bd72006-04-30 07:06:11 +0000910 authority, req, headers)
Senthil Kumaran4f1ba0d2010-08-19 17:32:03 +0000911 self.reset_retry_count()
912 return response
Moshe Zadka8a18e992001-03-01 08:40:42 +0000913
914
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000915def randombytes(n):
916 """Return n random bytes."""
917 # Use /dev/urandom if it is available. Fall back to random module
918 # if not. It might be worthwhile to extend this function to use
919 # other platform-specific mechanisms for getting random bytes.
920 if os.path.exists("/dev/urandom"):
 921 f = open("/dev/urandom", "rb")
922 s = f.read(n)
923 f.close()
924 return s
925 else:
926 L = [chr(random.randrange(0, 256)) for i in range(n)]
927 return "".join(L)
928
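# Illustration only: the comment in randombytes above suggests using other
# platform-specific sources of randomness. One portable option, shown here as
# an assumption about how that could look rather than as the module's code,
# is os.urandom, which wraps the operating system's generator.

```python
import os


def randombytes(n):
    """Return n random bytes from the operating system's generator."""
    return os.urandom(n)


if __name__ == "__main__":
    print(len(randombytes(8)))
```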
Moshe Zadka8a18e992001-03-01 08:40:42 +0000929class AbstractDigestAuthHandler:
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000930 # Digest authentication is specified in RFC 2617.
931
932 # XXX The client does not inspect the Authentication-Info header
933 # in a successful response.
934
935 # XXX It should be possible to test this implementation against
936 # a mock server that just generates a static set of challenges.
937
 938 # XXX qop="auth-int" support is shaky
Moshe Zadka8a18e992001-03-01 08:40:42 +0000939
940 def __init__(self, passwd=None):
941 if passwd is None:
Jeremy Hylton54e99e82001-08-07 21:12:25 +0000942 passwd = HTTPPasswordMgr()
Moshe Zadka8a18e992001-03-01 08:40:42 +0000943 self.passwd = passwd
Fred Drake13a2c272000-02-10 17:17:14 +0000944 self.add_password = self.passwd.add_password
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000945 self.retried = 0
946 self.nonce_count = 0
Senthil Kumaran20eb4f02009-11-15 08:36:20 +0000947 self.last_nonce = None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000948
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000949 def reset_retry_count(self):
950 self.retried = 0
951
952 def http_error_auth_reqed(self, auth_header, host, req, headers):
953 authreq = headers.get(auth_header, None)
954 if self.retried > 5:
955 # Don't fail endlessly - if we failed once, we'll probably
956 # fail a second time. Hm. Unless the Password Manager is
957 # prompting for the information. Crap. This isn't great
958 # but it's better than the current 'repeat until recursion
959 # depth exceeded' approach <wink>
Tim Peters58eb11c2004-01-18 20:29:55 +0000960 raise HTTPError(req.get_full_url(), 401, "digest auth failed",
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000961 headers, None)
962 else:
963 self.retried += 1
Fred Drake13a2c272000-02-10 17:17:14 +0000964 if authreq:
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000965 scheme = authreq.split()[0]
966 if scheme.lower() == 'digest':
Fred Drake13a2c272000-02-10 17:17:14 +0000967 return self.retry_http_digest_auth(req, authreq)
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000968
969 def retry_http_digest_auth(self, req, auth):
Eric S. Raymondb08b2d32001-02-09 11:10:16 +0000970 token, challenge = auth.split(' ', 1)
Fred Drake13a2c272000-02-10 17:17:14 +0000971 chal = parse_keqv_list(parse_http_list(challenge))
972 auth = self.get_authorization(req, chal)
973 if auth:
Jeremy Hylton52a17be2001-11-09 16:46:51 +0000974 auth_val = 'Digest %s' % auth
975 if req.headers.get(self.auth_header, None) == auth_val:
976 return None
Georg Brandl852bb002006-05-03 05:05:02 +0000977 req.add_unredirected_header(self.auth_header, auth_val)
Senthil Kumaran5fee4602009-07-19 02:43:43 +0000978 resp = self.parent.open(req, timeout=req.timeout)
Fred Drake13a2c272000-02-10 17:17:14 +0000979 return resp
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000980
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000981 def get_cnonce(self, nonce):
982 # The cnonce-value is an opaque
983 # quoted string value provided by the client and used by both client
984 # and server to avoid chosen plaintext attacks, to provide mutual
985 # authentication, and to provide some message integrity protection.
986 # This isn't a fabulous effort, but it's probably Good Enough.
Georg Brandlbffb0bc2006-04-30 08:57:35 +0000987 dig = hashlib.sha1("%s:%s:%s:%s" % (self.nonce_count, nonce, time.ctime(),
988 randombytes(8))).hexdigest()
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000989 return dig[:16]
990
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +0000991 def get_authorization(self, req, chal):
Fred Drake13a2c272000-02-10 17:17:14 +0000992 try:
993 realm = chal['realm']
994 nonce = chal['nonce']
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +0000995 qop = chal.get('qop')
Fred Drake13a2c272000-02-10 17:17:14 +0000996 algorithm = chal.get('algorithm', 'MD5')
997 # mod_digest doesn't send an opaque, even though it isn't
998 # supposed to be optional
999 opaque = chal.get('opaque', None)
1000 except KeyError:
1001 return None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001002
Fred Drake13a2c272000-02-10 17:17:14 +00001003 H, KD = self.get_algorithm_impls(algorithm)
1004 if H is None:
1005 return None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001006
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001007 user, pw = self.passwd.find_user_password(realm, req.get_full_url())
Fred Drake13a2c272000-02-10 17:17:14 +00001008 if user is None:
1009 return None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001010
Fred Drake13a2c272000-02-10 17:17:14 +00001011 # XXX not implemented yet
1012 if req.has_data():
1013 entdig = self.get_entity_digest(req.get_data(), chal)
1014 else:
1015 entdig = None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001016
Fred Drake13a2c272000-02-10 17:17:14 +00001017 A1 = "%s:%s:%s" % (user, realm, pw)
Johannes Gijsberscdd625a2005-01-09 05:51:49 +00001018 A2 = "%s:%s" % (req.get_method(),
Fred Drake13a2c272000-02-10 17:17:14 +00001019 # XXX selector: what about proxies and full urls
1020 req.get_selector())
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001021 if qop == 'auth':
Senthil Kumaran20eb4f02009-11-15 08:36:20 +00001022 if nonce == self.last_nonce:
1023 self.nonce_count += 1
1024 else:
1025 self.nonce_count = 1
1026 self.last_nonce = nonce
1027
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001028 ncvalue = '%08x' % self.nonce_count
1029 cnonce = self.get_cnonce(nonce)
1030 noncebit = "%s:%s:%s:%s:%s" % (nonce, ncvalue, cnonce, qop, H(A2))
1031 respdig = KD(H(A1), noncebit)
1032 elif qop is None:
1033 respdig = KD(H(A1), "%s:%s" % (nonce, H(A2)))
1034 else:
1035 # XXX handle auth-int.
Georg Brandlff871222007-06-07 13:34:10 +00001036 raise URLError("qop '%s' is not supported." % qop)
Tim Peters58eb11c2004-01-18 20:29:55 +00001037
Fred Drake13a2c272000-02-10 17:17:14 +00001038 # XXX should the partial digests be encoded too?
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001039
Fred Drake13a2c272000-02-10 17:17:14 +00001040 base = 'username="%s", realm="%s", nonce="%s", uri="%s", ' \
1041 'response="%s"' % (user, realm, nonce, req.get_selector(),
1042 respdig)
1043 if opaque:
Jeremy Hyltonb300ae32004-12-22 14:27:19 +00001044 base += ', opaque="%s"' % opaque
Fred Drake13a2c272000-02-10 17:17:14 +00001045 if entdig:
Jeremy Hyltonb300ae32004-12-22 14:27:19 +00001046 base += ', digest="%s"' % entdig
1047 base += ', algorithm="%s"' % algorithm
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001048 if qop:
Jeremy Hyltonb300ae32004-12-22 14:27:19 +00001049 base += ', qop=auth, nc=%s, cnonce="%s"' % (ncvalue, cnonce)
Fred Drake13a2c272000-02-10 17:17:14 +00001050 return base
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001051
1052 def get_algorithm_impls(self, algorithm):
Georg Brandl8d66dcd2008-05-04 21:40:44 +00001053 # algorithm should be case-insensitive according to RFC2617
1054 algorithm = algorithm.upper()
Fred Drake13a2c272000-02-10 17:17:14 +00001055 # lambdas assume digest modules are imported at the top level
1056 if algorithm == 'MD5':
Georg Brandlbffb0bc2006-04-30 08:57:35 +00001057 H = lambda x: hashlib.md5(x).hexdigest()
Fred Drake13a2c272000-02-10 17:17:14 +00001058 elif algorithm == 'SHA':
Georg Brandlbffb0bc2006-04-30 08:57:35 +00001059 H = lambda x: hashlib.sha1(x).hexdigest()
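Fred Drake13a2c272000-02-10 17:17:14 +00001060 # XXX MD5-sess is not handled; for any algorithm other than MD5 or SHA, H is never bound and the return below fails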
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001061 KD = lambda s, d: H("%s:%s" % (s, d))
Fred Drake13a2c272000-02-10 17:17:14 +00001062 return H, KD
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001063
1064 def get_entity_digest(self, data, chal):
Fred Drake13a2c272000-02-10 17:17:14 +00001065 # XXX not implemented yet
1066 return None
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001067
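# Illustration only: a standalone sketch of the response digest computed in
# get_authorization above, restricted to MD5. The function name
# digest_response and its parameter list are assumptions made for this
# example; the handler takes these values from the request, the challenge and
# the password manager.

```python
import hashlib


def H(data):
    """Hex MD5 digest of a text string."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()


def KD(secret, data):
    """Keyed digest: H(secret ":" data)."""
    return H("%s:%s" % (secret, data))


def digest_response(user, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Return the value of the response field of a Digest header."""
    A1 = "%s:%s:%s" % (user, realm, password)
    A2 = "%s:%s" % (method, uri)
    if qop == "auth":
        noncebit = "%s:%s:%s:%s:%s" % (nonce, nc, cnonce, qop, H(A2))
        return KD(H(A1), noncebit)
    if qop is None:
        return KD(H(A1), "%s:%s" % (nonce, H(A2)))
    raise ValueError("qop %r is not supported" % qop)


if __name__ == "__main__":
    # Values from the worked example in RFC 2617, section 3.5.
    print(digest_response(
        "Mufasa", "testrealm@host.com", "Circle Of Life",
        "GET", "/dir/index.html",
        "dcd98b7102dd2f0e8b11d0f600bfb0c093",
        qop="auth", nc="00000001", cnonce="0a4f113b"))
```

# nc must be the 8-digit hexadecimal string ('%08x' % nonce_count), not the
# integer, because it is hashed exactly as it appears in the header.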
Moshe Zadka8a18e992001-03-01 08:40:42 +00001068
 1069class HTTPDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
 1070 """An authentication protocol defined by RFC 2069 (obsoleted by RFC 2617)
1071
1072 Digest authentication improves on basic authentication because it
1073 does not transmit passwords in the clear.
1074 """
1075
Jeremy Hyltonaefae552003-07-10 13:30:12 +00001076 auth_header = 'Authorization'
Georg Brandl261e2512006-05-29 20:52:54 +00001077 handler_order = 490 # before Basic auth
Moshe Zadka8a18e992001-03-01 08:40:42 +00001078
1079 def http_error_401(self, req, fp, code, msg, headers):
1080 host = urlparse.urlparse(req.get_full_url())[1]
Tim Peters58eb11c2004-01-18 20:29:55 +00001081 retry = self.http_error_auth_reqed('www-authenticate',
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001082 host, req, headers)
1083 self.reset_retry_count()
1084 return retry
Moshe Zadka8a18e992001-03-01 08:40:42 +00001085
1086
1087class ProxyDigestAuthHandler(BaseHandler, AbstractDigestAuthHandler):
1088
Jeremy Hyltonaefae552003-07-10 13:30:12 +00001089 auth_header = 'Proxy-Authorization'
Georg Brandl261e2512006-05-29 20:52:54 +00001090 handler_order = 490 # before Basic auth
Moshe Zadka8a18e992001-03-01 08:40:42 +00001091
1092 def http_error_407(self, req, fp, code, msg, headers):
1093 host = req.get_host()
Tim Peters58eb11c2004-01-18 20:29:55 +00001094 retry = self.http_error_auth_reqed('proxy-authenticate',
Jeremy Hyltonfcefd0d2003-10-21 18:07:07 +00001095 host, req, headers)
1096 self.reset_retry_count()
1097 return retry
Tim Peterse1190062001-01-15 03:34:38 +00001098
Moshe Zadka8a18e992001-03-01 08:40:42 +00001099class AbstractHTTPHandler(BaseHandler):
1100
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001101 def __init__(self, debuglevel=0):
1102 self._debuglevel = debuglevel
1103
1104 def set_http_debuglevel(self, level):
1105 self._debuglevel = level
1106
Martin v. Löwis2a6ba902004-05-31 18:22:40 +00001107 def do_request_(self, request):
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001108 host = request.get_host()
1109 if not host:
1110 raise URLError('no host given')
1111
1112 if request.has_data(): # POST
1113 data = request.get_data()
Georg Brandl8c036cc2006-08-20 13:15:39 +00001114 if not request.has_header('Content-type'):
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001115 request.add_unredirected_header(
Georg Brandl8c036cc2006-08-20 13:15:39 +00001116 'Content-type',
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001117 'application/x-www-form-urlencoded')
Georg Brandl8c036cc2006-08-20 13:15:39 +00001118 if not request.has_header('Content-length'):
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001119 request.add_unredirected_header(
Georg Brandl8c036cc2006-08-20 13:15:39 +00001120 'Content-length', '%d' % len(data))
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001121
Facundo Batistaeb90b782008-08-16 14:44:07 +00001122 sel_host = host
1123 if request.has_proxy():
1124 scheme, sel = splittype(request.get_selector())
1125 sel_host, sel_path = splithost(sel)
1126
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001127 if not request.has_header('Host'):
Facundo Batistaeb90b782008-08-16 14:44:07 +00001128 request.add_unredirected_header('Host', sel_host)
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001129 for name, value in self.parent.addheaders:
Georg Brandl8c036cc2006-08-20 13:15:39 +00001130 name = name.capitalize()
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001131 if not request.has_header(name):
1132 request.add_unredirected_header(name, value)
1133
1134 return request
1135
Moshe Zadka8a18e992001-03-01 08:40:42 +00001136 def do_open(self, http_class, req):
Jeremy Hylton023518a2003-12-17 18:52:16 +00001137 """Return an addinfourl object for the request, using http_class.
1138
1139 http_class must implement the HTTPConnection API from httplib.
1140 The addinfourl return value is a file-like object. It also
1141 has methods and attributes including:
1142 - info(): return a mimetools.Message object for the headers
1143 - geturl(): return the original request URL
1144 - code: HTTP status code
1145 """
Moshe Zadka76676802001-04-11 07:44:53 +00001146 host = req.get_host()
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001147 if not host:
1148 raise URLError('no host given')
1149
Facundo Batista10951d52007-06-06 17:15:23 +00001150 h = http_class(host, timeout=req.timeout) # will parse host:port
Jeremy Hyltonc1be59f2003-12-14 05:27:34 +00001151 h.set_debuglevel(self._debuglevel)
Tim Peterse1190062001-01-15 03:34:38 +00001152
Senthil Kumaran176c73d2010-09-27 01:40:59 +00001153 headers = dict(req.unredirected_hdrs)
1154 headers.update(dict((k, v) for k, v in req.headers.items()
1155 if k not in headers))
1156
Jeremy Hyltonb3ee6f92004-02-24 19:40:35 +00001157 # We want to make an HTTP/1.1 request, but the addinfourl
1158 # class isn't prepared to deal with a persistent connection.
1159 # It will try to read all remaining data from the socket,
1160 # which will block while the server waits for the next request.
1161 # So make sure the connection gets closed after the (only)
1162 # request.
1163 headers["Connection"] = "close"
Georg Brandl8c036cc2006-08-20 13:15:39 +00001164 headers = dict(
1165 (name.title(), val) for name, val in headers.items())
Senthil Kumarane266f252009-05-24 09:14:50 +00001166
1167 if req._tunnel_host:
Senthil Kumaran7713acf2009-12-20 06:05:13 +00001168 tunnel_headers = {}
1169 proxy_auth_hdr = "Proxy-Authorization"
1170 if proxy_auth_hdr in headers:
1171 tunnel_headers[proxy_auth_hdr] = headers[proxy_auth_hdr]
1172 # Proxy-Authorization should not be sent to origin
1173 # server.
1174 del headers[proxy_auth_hdr]
1175 h.set_tunnel(req._tunnel_host, headers=tunnel_headers)
Senthil Kumarane266f252009-05-24 09:14:50 +00001176
Jeremy Hylton828023b2003-05-04 23:44:49 +00001177 try:
Jeremy Hylton023518a2003-12-17 18:52:16 +00001178 h.request(req.get_method(), req.get_selector(), req.data, headers)
Senthil Kumaran7d7702b2011-07-27 09:37:17 +08001179 except socket.error, err: # XXX what error?
1180 h.close()
1181 raise URLError(err)
1182 else:
Kristján Valur Jónsson3c43fcb2009-01-11 16:23:37 +00001183 try:
1184 r = h.getresponse(buffering=True)
Senthil Kumaran7d7702b2011-07-27 09:37:17 +08001185 except TypeError: # buffering kw not supported
Kristján Valur Jónsson3c43fcb2009-01-11 16:23:37 +00001186 r = h.getresponse()
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001187
Andrew M. Kuchlingf9ea7c02004-07-10 15:34:34 +00001188 # Pick apart the HTTPResponse object to get the addinfourl
Jeremy Hylton5d9c3032004-08-07 17:40:50 +00001189 # object initialized properly.
1190
1191 # Wrap the HTTPResponse object in socket's file object adapter
1192 # for Windows. That adapter calls recv(), so delegate recv()
1193 # to read(). This weird wrapping allows the returned object to
1194 # have readline() and readlines() methods.
Tim Peters9ca3f852004-08-08 01:05:14 +00001195
Jeremy Hylton5d9c3032004-08-07 17:40:50 +00001196 # XXX It might be better to extract the read buffering code
1197 # out of socket._fileobject() and into a base class.
Tim Peters9ca3f852004-08-08 01:05:14 +00001198
Jeremy Hylton5d9c3032004-08-07 17:40:50 +00001199 r.recv = r.read
Georg Brandldd7b0522007-01-21 10:35:10 +00001200 fp = socket._fileobject(r, close=True)
Tim Peters9ca3f852004-08-08 01:05:14 +00001201
Jeremy Hylton5d9c3032004-08-07 17:40:50 +00001202 resp = addinfourl(fp, r.msg, req.get_full_url())
Andrew M. Kuchlingf9ea7c02004-07-10 15:34:34 +00001203 resp.code = r.status
1204 resp.msg = r.reason
1205 return resp
Jeremy Hylton6d7e47b2000-01-20 18:19:08 +00001206
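# Illustration only: a sketch of the header preparation done in do_open
# above. The function name merge_headers is hypothetical; it takes plain
# dicts instead of a Request object.

```python
def merge_headers(unredirected_hdrs, headers):
    """Combine both header dicts the way do_open does before sending."""
    merged = dict(unredirected_hdrs)
    # unredirected headers win over regular ones with the same key
    merged.update(dict((k, v) for k, v in headers.items()
                       if k not in merged))
    # force a one-shot connection, since the response wrapper reads to EOF
    merged["Connection"] = "close"
    # normalise capitalisation, e.g. "Content-type" becomes "Content-Type"
    return dict((name.title(), val) for name, val in merged.items())


if __name__ == "__main__":
    print(merge_headers({"Host": "a.example.com"},
                        {"Host": "b.example.com", "User-agent": "demo"}))
```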
Moshe Zadka8a18e992001-03-01 08:40:42 +00001207
class HTTPHandler(AbstractHTTPHandler):

    def http_open(self, req):
        return self.do_open(httplib.HTTPConnection, req)

    http_request = AbstractHTTPHandler.do_request_

if hasattr(httplib, 'HTTPS'):
    class HTTPSHandler(AbstractHTTPHandler):

        def https_open(self, req):
            return self.do_open(httplib.HTTPSConnection, req)

        https_request = AbstractHTTPHandler.do_request_

class HTTPCookieProcessor(BaseHandler):
    def __init__(self, cookiejar=None):
        import cookielib
        if cookiejar is None:
            cookiejar = cookielib.CookieJar()
        self.cookiejar = cookiejar

    def http_request(self, request):
        self.cookiejar.add_cookie_header(request)
        return request

    def http_response(self, request, response):
        self.cookiejar.extract_cookies(response, request)
        return response

    https_request = http_request
    https_response = http_response

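# Illustrative sketch, not part of the original module: one way a caller
# might install HTTPCookieProcessor with a cookie jar it keeps a reference
# to, so that cookies collected by http_response() can be inspected later.
# The function name _example_build_cookie_opener is an assumption made for
# this example; the imports live inside the function and fall back to the
# Python 3 module names so the sketch also runs there.
def _example_build_cookie_opener():
    try:
        # Python 2 names, as used by this module.
        import urllib2 as request_module
        import cookielib as cookie_module
    except ImportError:
        # Python 3 equivalents of the same public API.
        import urllib.request as request_module
        import http.cookiejar as cookie_module
    jar = cookie_module.CookieJar()
    # build_opener() adds the default handlers around the cookie processor.
    opener = request_module.build_opener(
        request_module.HTTPCookieProcessor(jar))
    return opener, jar
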
class UnknownHandler(BaseHandler):
    def unknown_open(self, req):
        type = req.get_type()
        raise URLError('unknown url type: %s' % type)

def parse_keqv_list(l):
    """Parse list of key=value strings where keys are not duplicated."""
    parsed = {}
    for elt in l:
        k, v = elt.split('=', 1)
        if v[0] == '"' and v[-1] == '"':
            v = v[1:-1]
        parsed[k] = v
    return parsed

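# Illustrative sketch, not part of the original module: parse_keqv_list()
# above indexes v[0], so an element with an empty value such as 'nonce='
# raises IndexError.  The hypothetical helper below (its name is an
# assumption made for this example) shows one way to tolerate empty values
# by slicing instead of indexing; other inputs are handled the same way.
def _example_parse_keqv_list_tolerant(items):
    """Parse key=value strings, stripping one pair of double quotes."""
    parsed = {}
    for elt in items:
        k, v = elt.split('=', 1)
        # Slices of an empty string are empty, so this never raises.
        if v[:1] == '"' and v[-1:] == '"':
            v = v[1:-1]
        parsed[k] = v
    return parsed
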
def parse_http_list(s):
    """Parse lists as described by RFC 2068 Section 2.

    In particular, parse comma-separated lists where the elements of
    the list may include quoted-strings. A quoted-string could
    contain a comma. A non-quoted string could have quotes in the
    middle. Neither commas nor quotes count if they are escaped.
    Only double-quotes count, not single-quotes.
    """
    res = []
    part = ''

    escape = quote = False
    for cur in s:
        if escape:
            part += cur
            escape = False
            continue
        if quote:
            if cur == '\\':
                escape = True
                continue
            elif cur == '"':
                quote = False
            part += cur
            continue

        if cur == ',':
            res.append(part)
            part = ''
            continue

        if cur == '"':
            quote = True

        part += cur

    # append last part
    if part:
        res.append(part)

    return [part.strip() for part in res]

def _safe_gethostbyname(host):
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

class FileHandler(BaseHandler):
    # Use local file or FTP depending on form of URL
    def file_open(self, req):
        url = req.get_selector()
        if url[:2] == '//' and url[2:3] != '/' and (req.host and
                req.host != 'localhost'):
            req.type = 'ftp'
            return self.parent.open(req)
        else:
            return self.open_local_file(req)

    # names for the localhost
    names = None
    def get_names(self):
        if FileHandler.names is None:
            try:
                FileHandler.names = tuple(
                    socket.gethostbyname_ex('localhost')[2] +
                    socket.gethostbyname_ex(socket.gethostname())[2])
            except socket.gaierror:
                FileHandler.names = (socket.gethostbyname('localhost'),)
        return FileHandler.names

    # not entirely sure what the rules are here
    def open_local_file(self, req):
        import email.utils
        import mimetypes
        host = req.get_host()
        filename = req.get_selector()
        localfile = url2pathname(filename)
        try:
            stats = os.stat(localfile)
            size = stats.st_size
            modified = email.utils.formatdate(stats.st_mtime, usegmt=True)
            mtype = mimetypes.guess_type(filename)[0]
            headers = mimetools.Message(StringIO(
                'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                (mtype or 'text/plain', size, modified)))
            if host:
                host, port = splitport(host)
            if not host or \
                (not port and _safe_gethostbyname(host) in self.get_names()):
                if host:
                    origurl = 'file://' + host + filename
                else:
                    origurl = 'file://' + filename
                return addinfourl(open(localfile, 'rb'), headers, origurl)
        except OSError, msg:
            # urllib2 users shouldn't expect OSErrors coming from urlopen()
            raise URLError(msg)
        raise URLError('file not on local host')

class FTPHandler(BaseHandler):
    def ftp_open(self, req):
        import ftplib
        import mimetypes
        host = req.get_host()
        if not host:
            raise URLError('ftp error: no host given')
        host, port = splitport(host)
        if port is None:
            port = ftplib.FTP_PORT
        else:
            port = int(port)

        # username/password handling
        user, host = splituser(host)
        if user:
            user, passwd = splitpasswd(user)
        else:
            passwd = None
        host = unquote(host)
        user = user or ''
        passwd = passwd or ''

        try:
            host = socket.gethostbyname(host)
        except socket.error, msg:
            raise URLError(msg)
        path, attrs = splitattr(req.get_selector())
        dirs = path.split('/')
        dirs = map(unquote, dirs)
        dirs, file = dirs[:-1], dirs[-1]
        if dirs and not dirs[0]:
            dirs = dirs[1:]
        try:
            fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
            type = file and 'I' or 'D'
            for attr in attrs:
                attr, value = splitvalue(attr)
                if attr.lower() == 'type' and \
                   value in ('a', 'A', 'i', 'I', 'd', 'D'):
                    type = value.upper()
            fp, retrlen = fw.retrfile(file, type)
            headers = ""
            mtype = mimetypes.guess_type(req.get_full_url())[0]
            if mtype:
                headers += "Content-type: %s\n" % mtype
            if retrlen is not None and retrlen >= 0:
                headers += "Content-length: %d\n" % retrlen
            sf = StringIO(headers)
            headers = mimetools.Message(sf)
            return addinfourl(fp, headers, req.get_full_url())
        except ftplib.all_errors, msg:
            raise URLError, ('ftp error: %s' % msg), sys.exc_info()[2]

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        fw = ftpwrapper(user, passwd, host, port, dirs, timeout,
                        persistent=False)
##        fw.ftp.set_debuglevel(1)
        return fw

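# Illustrative sketch, not part of the original module: the rule ftp_open()
# above uses to choose the FTP transfer type, restated as a pure function.
# The name _example_ftp_transfer_type is an assumption made for this
# example, and str.partition stands in for the module's splitvalue() helper.
def _example_ftp_transfer_type(filename, attrs):
    """Return 'I', 'A' or 'D' for a file name and a list of URL attributes."""
    # Default: binary ('I') when a file name is present, otherwise a
    # directory listing ('D').
    transfer_type = 'I' if filename else 'D'
    for attr in attrs:
        name, _, value = attr.partition('=')
        # Only a recognised ';type=' attribute overrides the default.
        if name.lower() == 'type' and value in ('a', 'A', 'i', 'I', 'd', 'D'):
            transfer_type = value.upper()
    return transfer_type
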
class CacheFTPHandler(FTPHandler):
    # XXX would be nice to have pluggable cache strategies
    # XXX this stuff is definitely not thread safe
    def __init__(self):
        self.cache = {}
        self.timeout = {}
        self.soonest = 0
        self.delay = 60
        self.max_conns = 16

    def setTimeout(self, t):
        self.delay = t

    def setMaxConns(self, m):
        self.max_conns = m

    def connect_ftp(self, user, passwd, host, port, dirs, timeout):
        key = user, host, port, '/'.join(dirs), timeout
        if key in self.cache:
            self.timeout[key] = time.time() + self.delay
        else:
            self.cache[key] = ftpwrapper(user, passwd, host, port, dirs, timeout)
            self.timeout[key] = time.time() + self.delay
        self.check_cache()
        return self.cache[key]

    def check_cache(self):
        # first check for old ones
        t = time.time()
        if self.soonest <= t:
            for k, v in self.timeout.items():
                if v < t:
                    self.cache[k].close()
                    del self.cache[k]
                    del self.timeout[k]
        self.soonest = min(self.timeout.values())

        # then check the size
        if len(self.cache) == self.max_conns:
            for k, v in self.timeout.items():
                if v == self.soonest:
                    del self.cache[k]
                    del self.timeout[k]
                    break
            self.soonest = min(self.timeout.values())

    def clear_cache(self):
        for conn in self.cache.values():
            conn.close()
        self.cache.clear()
        self.timeout.clear()
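
# Illustrative sketch, not part of the original module: a small expiring
# cache in the spirit of CacheFTPHandler.check_cache().  Every name here
# (_ExampleExpiringCache, put, prune, the clock parameter) is an assumption
# made for this example.  It deliberately differs from the handler above in
# two ways: an entry evicted for size is closed as well, and eviction starts
# only once the limit is exceeded rather than when it is reached.
import time

class _ExampleExpiringCache(object):
    """Keep at most max_items entries, each for `delay` seconds."""

    def __init__(self, delay=60, max_items=16, clock=time.time):
        self.delay = delay
        self.max_items = max_items
        self.clock = clock      # injectable so the sketch can be tested
        self.items = {}         # key -> cached object with a close() method
        self.expires = {}       # key -> absolute expiry time

    def put(self, key, obj):
        self.items[key] = obj
        self.expires[key] = self.clock() + self.delay
        self.prune()

    def _drop(self, key):
        # Close the cached object before forgetting it.
        self.items.pop(key).close()
        del self.expires[key]

    def prune(self):
        now = self.clock()
        # Copy the keys first so entries can be deleted while looping.
        for key in list(self.expires):
            if self.expires[key] < now:
                self._drop(key)
        # Evict the entries closest to expiry while over the limit.
        while len(self.items) > self.max_items:
            self._drop(min(self.expires, key=self.expires.get))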