urllib: Simplify splithost by calling into urlparse. (#1849)

The current regex based splitting produces a wrong result. For example::

    http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host is ``abc``, the path is ``/``, and the fragment is ``#@def``.

commit: 90e01e50ef8a9e6c91f30d965563c378a4ad26de
author: postmasters <namnguyen@google.com> Tue Jun 20 06:02:44 2017 -0700
committer: Victor Stinner <victor.stinner@gmail.com> Tue Jun 20 15:02:44 2017 +0200
tree: e467f98aa737fb5c517df080f25d7734d81a5d55
parent: 5cc7ac24da10568d2a910a91a24183b904118cf8
diff --git a/Lib/urllib/parse.py b/Lib/urllib/parse.py
index 1af2906..01eb549 100644
--- a/Lib/urllib/parse.py
+++ b/Lib/urllib/parse.py

@@ -947,7 +947,7 @@
     """splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
     global _hostprog
     if _hostprog is None:
-        _hostprog = re.compile('//([^/?]*)(.*)', re.DOTALL)
+        _hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
 
     match = _hostprog.match(url)
     if match:
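
To make the effect of the one-character regex change concrete, the following is a minimal sketch that runs both patterns from the hunk on the commit message's example. The helper `split_host` and the constant names are hypothetical and only approximate what `splithost` does with the match; the input is written as `//abc#@def` on the assumption that the scheme has already been stripped, as the docstring's `//host[:port]/path` form suggests. `urllib.parse.urlsplit` is included only as a point of comparison.

```python
import re
from urllib.parse import urlsplit

# The two patterns from the hunk: before and after the change.
OLD_HOSTPROG = re.compile('//([^/?]*)(.*)', re.DOTALL)
NEW_HOSTPROG = re.compile('//([^/#?]*)(.*)', re.DOTALL)


def split_host(pattern, url):
    """Hypothetical, simplified stand-in for splithost: (host, rest) or (None, url)."""
    match = pattern.match(url)
    if match:
        return match.group(1), match.group(2)
    return None, url


# Assumption: the scheme ("http:") has already been removed by the caller.
url = '//abc#@def'

old_host, old_rest = split_host(OLD_HOSTPROG, url)
new_host, new_rest = split_host(NEW_HOSTPROG, url)

print(repr(old_host), repr(old_rest))  # 'abc#@def' ''   (fragment swallowed into the host)
print(repr(new_host), repr(new_rest))  # 'abc' '#@def'   ('#' now ends the host)

# For comparison, urlsplit also stops the network location at '#'.
parts = urlsplit('http://abc#@def')
print(repr(parts.hostname), repr(parts.fragment))  # 'abc' '@def'
```

With the old pattern, anything that treats the text after `@` as the host would see `def` rather than `abc`, which is the mismatch with browser behavior that the commit message describes.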