Revert "[3.8] bpo-27657: Fix urlparse() with numeric paths (GH-16839)" (GH-18525)
This reverts commit 0f3187c1ce3b3ace60f6c1691dfa3d4e744f0384.
The change broke the backwards compatibility of parsing behavior in a
patch release of Python (3.8.1). A decision was taken to revert this
patch in 3.8.2.
In https://bugs.python.org/issue27657 it was decided that the previous
behavior like
>>> urlparse('localhost:8080')
ParseResult(scheme='', netloc='', path='localhost:8080', params='', query='', fragment='')
>>> urlparse('undefined:8080')
ParseResult(scheme='', netloc='', path='undefined:8080', params='', query='', fragment='')
needs to be preserved in patch releases as number of users rely upon it.
Explicitly mention the releases involved with the revert in NEWS.
Adopt the wording suggested by @ned-deily.
diff --git a/Lib/urllib/parse.py b/Lib/urllib/parse.py
index 0b39b6e..e2b6f13 100644
--- a/Lib/urllib/parse.py
+++ b/Lib/urllib/parse.py
@@ -431,11 +431,31 @@
netloc = query = fragment = ''
i = url.find(':')
if i > 0:
+ if url[:i] == 'http': # optimize the common case
+ url = url[i+1:]
+ if url[:2] == '//':
+ netloc, url = _splitnetloc(url, 2)
+ if (('[' in netloc and ']' not in netloc) or
+ (']' in netloc and '[' not in netloc)):
+ raise ValueError("Invalid IPv6 URL")
+ if allow_fragments and '#' in url:
+ url, fragment = url.split('#', 1)
+ if '?' in url:
+ url, query = url.split('?', 1)
+ _checknetloc(netloc)
+ v = SplitResult('http', netloc, url, query, fragment)
+ _parse_cache[key] = v
+ return _coerce_result(v)
for c in url[:i]:
if c not in scheme_chars:
break
else:
- scheme, url = url[:i].lower(), url[i+1:]
+ # make sure "url" is not actually a port number (in which case
+ # "scheme" is really part of the path)
+ rest = url[i+1:]
+ if not rest or any(c not in '0123456789' for c in rest):
+ # not a port number
+ scheme, url = url[:i].lower(), rest
if url[:2] == '//':
netloc, url = _splitnetloc(url, 2)