:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

Source code: :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   % escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the path component, which is retained if
   present. For example:

   >>> from urllib.parse import urlparse
   >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
   >>> o   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> o.scheme
   'http'
   >>> o.port
   80
   >>> o.geturl()
   'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The scheme argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as urlstring, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the allow_fragments argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.
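
   The following is a minimal sketch of the two arguments described above. The
   ``example.com`` URLs are placeholders chosen for illustration.

   .. code-block:: python

      from urllib.parse import urlparse

      # scheme is only a fallback. Without '//' there is no netloc, so the
      # host name stays in the path.
      r1 = urlparse('www.example.com/index.html', scheme='https')
      print(r1.scheme, repr(r1.netloc), r1.path)

      # A scheme present in the URL takes precedence over the argument.
      r2 = urlparse('ftp://example.com/file', scheme='https')
      print(r2.scheme)

      # With allow_fragments=False, '#frag' is left in the path.
      r3 = urlparse('http://example.com/page#frag', allow_fragments=False)
      print(r3.path, repr(r3.fragment))

      # For bytes input, the default scheme '' becomes b''.
      r4 = urlparse(b'//example.com/x')
      print(r4.scheme, r4.netloc)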
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	91
Lisa Roach	13c1f72	2019-03-24 14:28:48 -0700	[diff] [blame]	92	The return value is a :term:`named tuple`, which means that its items can
				93	be accessed by index or as named attributes, which are:
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	94
				95	+------------------+-------+--------------------------+----------------------+
				96	\| Attribute \| Index \| Value \| Value if not present \|
				97	+==================+=======+==========================+======================+
Berker Peksag	89584c9	2015-06-25 23:38:48 +0300	[diff] [blame]	98	\| :attr:`scheme` \| 0 \| URL scheme specifier \| scheme parameter \|
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	99	+------------------+-------+--------------------------+----------------------+
				100	\| :attr:`netloc` \| 1 \| Network location part \| empty string \|
				101	+------------------+-------+--------------------------+----------------------+
				102	\| :attr:`path` \| 2 \| Hierarchical path \| empty string \|
				103	+------------------+-------+--------------------------+----------------------+
				104	\| :attr:`params` \| 3 \| Parameters for last path \| empty string \|
				105	\| \| \| element \| \|
				106	+------------------+-------+--------------------------+----------------------+
				107	\| :attr:`query` \| 4 \| Query component \| empty string \|
				108	+------------------+-------+--------------------------+----------------------+
				109	\| :attr:`fragment` \| 5 \| Fragment identifier \| empty string \|
				110	+------------------+-------+--------------------------+----------------------+
				111	\| :attr:`username` \| \| User name \| :const:`None` \|
				112	+------------------+-------+--------------------------+----------------------+
				113	\| :attr:`password` \| \| Password \| :const:`None` \|
				114	+------------------+-------+--------------------------+----------------------+
				115	\| :attr:`hostname` \| \| Host name (lower case) \| :const:`None` \|
				116	+------------------+-------+--------------------------+----------------------+
				117	\| :attr:`port` \| \| Port number as integer, \| :const:`None` \|
				118	\| \| \| if present \| \|
				119	+------------------+-------+--------------------------+----------------------+
				120
Robert Collins	dfa95c9	2015-08-10 09:53:30 +1200	[diff] [blame]	121	Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
				122	an invalid port is specified in the URL. See section
				123	:ref:`urlparse-result-object` for more information on the result object.
Georg Brandl	116aa62	2007-08-15 14:28:22 +0000	[diff] [blame]	124
Howie Benefiel	f6e863d	2017-05-15 23:48:16 -0500	[diff] [blame]	125	Unmatched square brackets in the :attr:`netloc` attribute will raise a
				126	:exc:`ValueError`.
				127
Steve Dower	16e6f7d	2019-03-07 08:02:26 -0800	[diff] [blame]	128	Characters in the :attr:`netloc` attribute that decompose under NFKC
				129	normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
				130	``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
				131	decomposed before parsing, no error will be raised.
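
   The following is a minimal sketch of the derived attributes and of the
   :exc:`ValueError` cases described above. The host names are placeholders, and
   the NFKC case assumes an interpreter that includes the 3.8 change noted below.
   ``raises_value_error()`` is a hypothetical helper defined only for this example.

   .. code-block:: python

      from urllib.parse import urlparse

      r = urlparse('http://User:Secret@Example.COM:8080/path')
      # hostname is lower-cased; username and password keep their case.
      print(r.username, r.password, r.hostname, r.port)

      def raises_value_error(func):
          """Return True if calling func() raises ValueError (hypothetical helper)."""
          try:
              func()
          except ValueError:
              return True
          return False

      # Out-of-range port: the error appears only when .port is read.
      bad_port = raises_value_error(lambda: urlparse('http://example.com:99999/').port)
      # Unmatched '[' in the netloc.
      bad_bracket = raises_value_error(lambda: urlparse('http://[::1/'))
      # U+FF03 FULLWIDTH NUMBER SIGN decomposes to '#' under NFKC.
      bad_nfkc = raises_value_error(lambda: urlparse('http://a\uff03b.example/'))
      print(bad_port, bad_bracket, bad_nfkc)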

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method will return a new ParseResult object replacing specified
   fields with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless allow_fragments is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument keep_blank_values is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument strict_parsing is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional encoding and errors parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument max_num_fields is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if there are more than
   max_num_fields fields read.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.
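
   The following is a minimal sketch of the flags described above, using a
   made-up query string.

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      qs = 'a=1&a=2&b=&c=3'
      default = parse_qs(qs)                              # 'b' is dropped
      with_blanks = parse_qs(qs, keep_blank_values=True)  # 'b' maps to ['']
      print(default)
      print(with_blanks)

      # doseq=True expands each list back into repeated fields.
      rebuilt = urlencode(default, doseq=True)
      print(rebuilt)

      # max_num_fields rejects queries that contain too many fields.
      try:
          parse_qs('x=1&y=2&z=3', max_num_fields=2)
          too_many_rejected = False
      except ValueError:
          too_many_rejected = True
      print(too_many_rejected)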

   .. versionchanged:: 3.2
      Add encoding and errors parameters.

   .. versionchanged:: 3.8
      Added max_num_fields parameter.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument keep_blank_values is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument strict_parsing is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional encoding and errors parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument max_num_fields is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if there are more than
   max_num_fields fields read.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.
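
   The following is a minimal sketch of this function's behaviour. It shows that
   order and repeated names are kept, that ``+`` and percent escapes are decoded,
   and how strict_parsing treats a field with no ``=``. The query strings are
   made up for illustration.

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      pairs = parse_qsl('a=1&a=2&b=x+y%21')
      print(pairs)
      # urlencode() accepts the list of pairs and keeps their order.
      encoded = urlencode(pairs)
      print(encoded)

      # A field without '=' is skipped by default ...
      lenient = parse_qsl('a=1&flag')
      print(lenient)
      # ... but rejected when strict_parsing is true.
      try:
          parse_qsl('a=1&flag', strict_parsing=True)
          strict_rejected = False
      except ValueError:
          strict_rejected = True
      print(strict_rejected)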

   .. versionchanged:: 3.2
      Add encoding and errors parameters.

   .. versionchanged:: 3.8
      Added max_num_fields parameter.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The parts
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
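
   The following is a minimal sketch of this function. ``example.com`` is a
   placeholder host.

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Field order: scheme, netloc, path, params, query, fragment.
      built = urlunparse(('https', 'example.com', '/a', 'p', 'q=1', 'top'))
      print(built)

      # The '?' of an empty query does not survive a round trip.
      round_trip = urlunparse(urlparse('http://example.com/path?'))
      print(round_trip)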


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the path portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | scheme parameter     |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.
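
   The following is a minimal sketch contrasting :func:`urlsplit` with
   :func:`urlparse` on a path whose segments carry parameters. The URL is a
   placeholder, and the per-segment split at the end is just one possible
   approach, not a function of this module.

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://example.com/a;x/b;y?q=1'

      # urlparse() moves only the last segment's parameters into params.
      p = urlparse(url)
      print(p.path, p.params)

      # urlsplit() leaves every ';' in the path for the caller to handle.
      s = urlsplit(url)
      print(s.path, s.query)

      # One way to separate each path segment from its parameters.
      segments = []
      for seg in s.path.split('/'):
          name, _, params = seg.partition(';')
          segments.append((name, params))
      print(segments)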

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The parts argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
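
   The following is a minimal sketch of a split, modify, unsplit round trip.
   :func:`urlsplit` lower-cases the scheme, and the empty query and fragment are
   dropped when the URL is rebuilt. ``example.com`` is a placeholder host.

   .. code-block:: python

      from urllib.parse import urlsplit, urlunsplit

      parts = urlsplit('HTTP://Example.com/p?#')
      plain = urlunsplit(parts)
      print(plain)

      # _replace() returns a new named tuple with the given fields changed.
      with_query = urlunsplit(parts._replace(query='page=2'))
      print(with_query)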


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (base) with
   another URL (url). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

   >>> from urllib.parse import urljoin
   >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
   'http://www.cwi.nl/%7Eguido/FAQ.html'

   The allow_fragments argument has the same meaning and default as for
   :func:`urlparse`.
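
   The following sketch resolves a few of the reference examples given in
   :rfc:`3986`, section 5.4.1, against that RFC's base URI. It assumes
   Python 3.5 or later, as noted below.

   .. code-block:: python

      from urllib.parse import urljoin

      base = 'http://a/b/c/d;p?q'
      resolved = {ref: urljoin(base, ref) for ref in ['g', '../g', '/g', '?y', '#s']}
      for ref, target in resolved.items():
          print(f'{ref!r:8} -> {target}')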

   .. note::

      If url is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the url's host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the url with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible scheme and netloc parts.
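
   The preprocessing suggested in the note can be sketched as follows.
   ``join_path_only()`` is a hypothetical helper, not part of this module. The
   extra check on leading slashes is an assumption added to keep a path such as
   ``////host`` from being re-read as a netloc.

   .. code-block:: python

      from urllib.parse import urljoin, urlsplit, urlunsplit

      def join_path_only(base, url):
          """Resolve url against base, ignoring any scheme or netloc in url."""
          parts = urlsplit(url)
          path = parts.path
          if path.startswith('//'):
              # Collapse leading slashes so the path cannot become a netloc.
              path = '/' + path.lstrip('/')
          relative = urlunsplit(('', '', path, parts.query, parts.fragment))
          return urljoin(base, relative)

      joined = join_path_only('http://www.cwi.nl/%7Eguido/Python.html',
                              '//www.python.org/%7Eguido')
      print(joined)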


   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If url contains a fragment identifier, return a modified version of url
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in url, return url unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
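
   The following is a minimal sketch of both cases. ``example.com`` is a
   placeholder host.

   .. code-block:: python

      from urllib.parse import urldefrag

      url, fragment = urldefrag('http://example.com/page#section-2')
      print(url, fragment)

      # Without a fragment, the URL is returned unchanged along with ''.
      plain = urldefrag('http://example.com/page')
      print(plain)

      # geturl() re-attaches a non-empty fragment.
      rejoined = urldefrag('http://example.com/page#section-2').geturl()
      print(rejoined)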

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.


.. function:: unwrap(url)

   Extract the url from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If url is not a wrapped URL, it is returned
   without changes.
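
   The following is a minimal sketch of the four accepted forms. ``example.com``
   is a placeholder host.

   .. code-block:: python

      from urllib.parse import unwrap

      forms = ['<URL:http://example.com/>', '<http://example.com/>',
               'URL:http://example.com/', 'http://example.com/']
      unwrapped = [unwrap(s) for s in forms]
      print(unwrapped)  # each entry is 'http://example.com/'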

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.
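
The following is a minimal sketch of these rules. The URLs are placeholders.

.. code-block:: python

   from urllib.parse import urljoin, urlsplit

   # bytes in, bytes out.
   split_bytes = urlsplit(b'http://example.com/path')
   print(split_bytes.netloc)

   # Mixing str and bytes in a single call raises TypeError.
   try:
       urljoin('http://example.com/', b'path')
       mixed_rejected = False
   except TypeError:
       mixed_rejected = True

   # Non-ASCII byte values raise UnicodeDecodeError.
   try:
       urlsplit(b'http://example.com/caf\xc3\xa9')
       non_ascii_rejected = False
   except UnicodeDecodeError:
       non_ascii_rejected = True

   print(mixed_rejected, non_ascii_rejected)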

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).
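
The following is a minimal sketch of converting one result between the two
forms. The URL is a placeholder.

.. code-block:: python

   from urllib.parse import urlsplit

   text_result = urlsplit('http://example.com/path?q=1')
   byte_result = text_result.encode()  # encoding defaults to 'ascii'
   print(byte_result.path)
   print(byte_result.decode() == text_result)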

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

   >>> from urllib.parse import urlsplit
   >>> url = 'HTTP://www.Python.org/doc/#'
   >>> r1 = urlsplit(url)
   >>> r1.geturl()
   'http://www.Python.org/doc/'
   >>> r2 = urlsplit(r1.geturl())
   >>> r2.geturl()
   'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2
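
The following minimal sketch checks which concrete class each call returns.
The URLs are placeholders.

.. code-block:: python

   from urllib.parse import (DefragResult, ParseResultBytes, SplitResult,
                             urldefrag, urlparse, urlsplit)

   checks = {
       'urlsplit(str)': isinstance(urlsplit('http://example.com/'), SplitResult),
       'urlparse(bytes)': isinstance(urlparse(b'http://example.com/'), ParseResultBytes),
       'urldefrag(str)': isinstance(urldefrag('http://example.com/#top'), DefragResult),
       # The results are still tuples underneath.
       'is tuple': isinstance(urlsplit('http://example.com/'), tuple),
   }
   print(checks)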
				509

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes` object, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

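The following sketch exercises the *safe*, *encoding* and bytes-input behaviour described above. The inputs are arbitrary examples, and the commented results assume Python 3.7 or later, where ``'~'`` is no longer quoted:

```python
from urllib.parse import quote

# Default safe='/': slashes survive, the space and "ñ" are percent-encoded.
print(quote("/El Niño/"))                 # /El%20Ni%C3%B1o/

# safe='' also quotes the slashes (e.g. for a single path segment).
print(quote("/El Niño/", safe=""))        # %2FEl%20Ni%C3%B1o%2F

# Letters, digits and "_.-~" are never quoted.
print(quote("a~b_c.d-e"))                 # a~b_c.d-e

# A different encoding changes which bytes get escaped.
print(quote("Niño", encoding="latin-1"))  # Ni%F1o

# Bytes input is accepted, but encoding/errors must then be omitted.
print(quote(b"/a b/"))                    # /a%20b/
try:
    quote(b"/a b/", encoding="utf-8")
except TypeError as exc:
    print("TypeError:", exc)
```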

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

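A minimal comparison of :func:`quote` and :func:`quote_plus` on the same (arbitrary) value, showing the different handling of spaces, ``'+'`` and ``'/'``:

```python
from urllib.parse import quote, quote_plus

value = "a+b c/d"

print(quote(value))                  # a%2Bb%20c/d  (space -> %20, '/' kept)
print(quote_plus(value))             # a%2Bb+c%2Fd  (space -> '+', '/' quoted)

# Listing '+' in safe keeps it literal, which makes it indistinguishable
# from an encoded space when the value is later decoded.
print(quote_plus(value, safe="/+"))  # a+b+c/d
```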

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields ``'a%26%EF'``.

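A small sketch of when :func:`quote_from_bytes` is useful: quoting binary data that is not valid text, and confirming the equivalence with :func:`quote` stated above. The byte values are arbitrary examples:

```python
from urllib.parse import quote, quote_from_bytes

# Arbitrary binary data that is not valid UTF-8 text.
token = bytes([0x00, 0xFF, 0x41])
print(quote_from_bytes(token))            # %00%FFA

# The default safe='/' applies here too.
print(quote_from_bytes(b"a/b"))           # a/b
print(quote_from_bytes(b"a/b", safe=""))  # a%2Fb

# For text, encoding first and then quoting matches quote().
text = "café"
assert quote_from_bytes(text.encode("utf-8")) == quote(text)  # 'caf%C3%A9'
```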

.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

   .. versionchanged:: 3.9
      *string* parameter supports bytes and str objects (previously only str).

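The sketch below shows how *encoding* and *errors* affect decoding of an escape that is valid Latin-1 but not valid UTF-8 (``%F1``, an arbitrary example), and the bytes input accepted since Python 3.9:

```python
from urllib.parse import unquote

print(unquote("/El%20Ni%C3%B1o/"))            # /El Niño/

# "%F1" alone is not valid UTF-8, so the default errors='replace'
# substitutes U+FFFD (the replacement character).
assert unquote("Ni%F1o") == "Ni\ufffdo"

# A matching encoding decodes it correctly.
print(unquote("Ni%F1o", encoding="latin-1"))  # Niño

# errors='strict' raises instead of substituting.
try:
    unquote("Ni%F1o", errors="strict")
except UnicodeDecodeError:
    print("invalid UTF-8 sequence")

# Since Python 3.9, bytes input is accepted and a str is returned.
print(unquote(b"/El%20Ni%C3%B1o/"))           # /El Niño/
```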

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

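To see why form values need :func:`unquote_plus`, compare both functions on an arbitrary form-encoded value that contains a literal plus sign escaped as ``%2B``:

```python
from urllib.parse import unquote, unquote_plus

form_value = "El+Ni%C3%B1o+%2B+1"

# unquote leaves '+' untouched, so spaces and the real plus blur together.
print(unquote(form_value))       # El+Niño+++1

# unquote_plus turns '+' into spaces first, then decodes %2B to '+'.
print(unquote_plus(form_value))  # El Niño + 1
```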

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

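A minimal check of the rules above, including the UTF-8 encoding of unescaped non-ASCII characters in a :class:`str` argument (the inputs are arbitrary examples):

```python
from urllib.parse import unquote_to_bytes

# Escapes become raw octets; no text decoding takes place.
assert unquote_to_bytes("a%26%EF") == b"a&\xef"
assert unquote_to_bytes(b"%00%FF") == b"\x00\xff"

# An unescaped "é" in a str is encoded as UTF-8 (b'\xc3\xa9').
assert unquote_to_bytes("é%20") == b"\xc3\xa9 "
```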

.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resulting string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, it should be
   encoded to bytes first; otherwise a :exc:`TypeError` is raised.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both key and value are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``.
   Because :func:`urlencode` passes its own *safe* argument (default ``''``)
   to *quote_via*, ``'/'`` characters are still encoded unless *safe*
   includes ``'/'``. For maximum control of what is quoted, use ``quote`` and
   specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate the query string of a
   URL or data for a POST request.

   .. versionchanged:: 3.2
      *query* supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.

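The following sketch compares the default *quote_via*, :func:`quote` with and without ``'/'`` in *safe*, the effect of *doseq*, and the :func:`parse_qs` round trip. The parameter names and values are arbitrary examples, and no request is actually sent:

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"q": "El Niño", "path": "/a/b", "tag": ["x", "y"]}

# Default quote_via=quote_plus: space -> '+', '/' -> %2F.
# doseq=True emits one key=value pair per element of the "tag" list.
query = urlencode(params, doseq=True)
print(query)  # q=El+Ni%C3%B1o&path=%2Fa%2Fb&tag=x&tag=y

# quote_via=quote: space -> %20, but '/' is still encoded because
# urlencode passes safe='' through to quote.
print(urlencode(params, doseq=True, quote_via=quote))
# q=El%20Ni%C3%B1o&path=%2Fa%2Fb&tag=x&tag=y

# Adding '/' to safe leaves the slashes untouched.
print(urlencode(params, doseq=True, safe="/", quote_via=quote))
# q=El%20Ni%C3%B1o&path=/a/b&tag=x&tag=y

# POST data passed to urlopen must be bytes, so encode the ASCII result.
post_data = query.encode("ascii")

# parse_qs reverses the encoding; every value comes back as a list.
print(parse_qs(query))  # {'q': ['El Niño'], 'path': ['/a/b'], 'tag': ['x', 'y']}
```

Without ``doseq=True``, the ``"tag"`` list would instead be converted with :func:`str` and quoted as a single value.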

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse
      module should conform to this. Certain deviations may be observed, which
      are mostly for backward compatibility purposes and for certain de facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URLs.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.