:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up into
   smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

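   The following is a minimal sketch of the *scheme* and *allow_fragments*
   arguments and of the convenience attributes listed in the table above.
   The ``example.com`` URLs and the credentials are made up for illustration,
   and the last step assumes Python 3.6 or later:

   .. code-block:: python

      from urllib.parse import urlparse

      # The default scheme is used only when the URL itself has none.
      relative = urlparse('//example.com/index.html', scheme='https')
      print(relative.scheme)      # https

      # With allow_fragments=False the '#...' part stays in the path here.
      no_frag = urlparse('http://example.com/doc#intro', allow_fragments=False)
      print(no_frag.path)         # /doc#intro
      print(no_frag.fragment == '')   # True

      # Convenience attributes derived from the network location.
      full = urlparse('http://User:secret@Example.COM:8080/path')
      print(full.username, full.password, full.hostname, full.port)
      # User secret example.com 8080

      # An out-of-range port raises ValueError when the attribute is read.
      try:
          urlparse('http://example.com:99999/').port
          port_error = False
      except ValueError:
          port_error = True
      print(port_error)           # True

   Note that parsing itself succeeds in the last case; the error appears
   only when :attr:`port` is read.
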

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

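   As an illustrative sketch of the flags described above (the query string
   is invented for the example):

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      qs = 'name=Ada&lang=en&lang=fr&empty='

      # Repeated names are collected into one list; blank values are dropped.
      default = parse_qs(qs)
      print(default)
      # {'name': ['Ada'], 'lang': ['en', 'fr']}

      # keep_blank_values=True keeps 'empty' with an empty-string value.
      kept = parse_qs(qs, keep_blank_values=True)
      print(kept['empty'])        # ['']

      # strict_parsing=True turns a field without '=' into an error.
      try:
          parse_qs('name=Ada&broken', strict_parsing=True)
          strict_error = False
      except ValueError:
          strict_error = True
      print(strict_error)         # True

      # doseq=True expands each list back into repeated name=value pairs.
      rebuilt = urlencode(kept, doseq=True)
      print(rebuilt)              # name=Ada&lang=en&lang=fr&empty=
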

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

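   A short sketch (with an invented query string) of how :func:`parse_qsl`
   preserves order and repeated names, and how the pairs round-trip through
   :func:`urlencode`:

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      query = 'lang=en&q=caf%C3%A9+au+lait&lang=fr'

      # Each field becomes one (name, value) pair, in the original order.
      pairs = parse_qsl(query)
      print(pairs)
      # [('lang', 'en'), ('q', 'café au lait'), ('lang', 'fr')]

      # Encoding the pairs again reproduces the original query string.
      print(urlencode(pairs) == query)    # True
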

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

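   For instance, assuming a made-up ``example.com`` URL, a list of six
   strings can be assembled directly, and an unnecessary ``?`` disappears on
   a round trip:

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Order: scheme, netloc, path, params, query, fragment.
      built = urlunparse(['https', 'example.com', '/search', '', 'q=python', 'top'])
      print(built)        # https://example.com/search?q=python#top

      # A trailing '?' with an empty query is dropped.
      round_trip = urlunparse(urlparse('http://example.com/path?'))
      print(round_trip)   # http://example.com/path
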

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

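   The difference from :func:`urlparse` can be seen with a URL that carries
   parameters on more than one path segment. The URL below is invented for
   the sketch:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://example.com/a;x=1/b;y=2?q=1#frag'

      # urlsplit() leaves every ';parameter' inside the path.
      split = urlsplit(url)
      print(split.path)       # /a;x=1/b;y=2

      # urlparse() splits off the parameters of the last path segment only.
      parsed = urlparse(url)
      print(parsed.path)      # /a;x=1/b
      print(parsed.params)    # y=2
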

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).

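   One common use, sketched here with an invented URL, is to change a single
   component and reassemble the rest. The ``_replace()`` call relies on the
   result object being a named tuple, which is an assumption of this sketch
   rather than something stated above:

   .. code-block:: python

      from urllib.parse import urlsplit, urlunsplit

      parts = urlsplit('http://example.com/docs/index.html?page=1#top')

      # Swap the scheme and query, drop the fragment, keep everything else.
      updated = parts._replace(scheme='https', query='page=2', fragment='')
      rebuilt = urlunsplit(updated)
      print(rebuilt)      # https://example.com/docs/index.html?page=2

      # A plain five-item list works as well.
      from_list = urlunsplit(['ftp', 'example.com', '/pub/file.txt', '', ''])
      print(from_list)    # ftp://example.com/pub/file.txt
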

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

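   The sketch below shows a few more kinds of relative reference, followed
   by one possible way to do the preprocessing mentioned in the note. The
   expected results assume the :rfc:`3986` semantics of Python 3.5 and later:

   .. code-block:: python

      from urllib.parse import urljoin, urlsplit, urlunsplit

      base = 'http://www.cwi.nl/%7Eguido/Python.html'

      up_one = urljoin(base, '../FAQ.html')
      print(up_one)       # http://www.cwi.nl/FAQ.html
      rooted = urljoin(base, '/index.html')
      print(rooted)       # http://www.cwi.nl/index.html
      query_only = urljoin(base, '?q=1')
      print(query_only)   # http://www.cwi.nl/%7Eguido/Python.html?q=1

      # Drop the scheme and network location of the reference ...
      reference = '//www.python.org/%7Eguido'
      parts = urlsplit(reference)
      path_only = urlunsplit(('', '', parts.path, parts.query, parts.fragment))

      # ... so that the result stays on the host of the base URL.
      same_host = urljoin(base, path_only)
      print(same_host)    # http://www.cwi.nl/%7Eguido
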

   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

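The rules in this section can be sketched as follows. The URLs are invented
for the example, and :func:`urlsplit` stands in for any of the parsing
functions:

.. code-block:: python

   from urllib.parse import urlsplit

   # bytes in, bytes out.
   raw = urlsplit(b'http://example.com/path?x=1')
   print(type(raw).__name__)       # SplitResultBytes
   print(raw.netloc)               # b'example.com'

   # decode() converts the whole result to str (ASCII by default).
   text = raw.decode()
   print(text.netloc)              # example.com
   print(text.encode() == raw)     # True

   # Mixing a bytes URL with a non-empty str scheme raises TypeError.
   try:
       urlsplit(b'//example.com/path', scheme='http')
       mixed_error = False
   except TypeError:
       mixed_error = True

   # Non-ASCII byte values raise UnicodeDecodeError.
   try:
       urlsplit(b'http://example.com/caf\xc3\xa9')
       ascii_error = False
   except UnicodeDecodeError:
       ascii_error = True

   print(mixed_error, ascii_error)     # True True
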

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'

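   A brief sketch of the same method on :func:`urldefrag` and
   :func:`urlparse` results, using invented URLs:

   .. code-block:: python

      from urllib.parse import urldefrag, urlparse

      # urldefrag(): a non-empty fragment is restored, an empty one is dropped.
      with_fragment = urldefrag('http://example.com/page#sec').geturl()
      print(with_fragment)    # http://example.com/page#sec
      empty_fragment = urldefrag('http://example.com/page#').geturl()
      print(empty_fragment)   # http://example.com/page

      # urlparse(): empty parameters, query and fragment are all dropped.
      all_empty = urlparse('http://example.com/page;?#').geturl()
      print(all_empty)        # http://example.com/page
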

The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2

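These classes can also be instantiated directly. The following sketch, with
made-up component values, builds a :class:`SplitResult` by hand and converts
it to its bytes counterpart and back:

.. code-block:: python

   from urllib.parse import SplitResult, SplitResultBytes

   # Build a result from its five components.
   result = SplitResult('https', 'example.com', '/docs/', 'page=2', '')
   print(result.geturl())      # https://example.com/docs/?page=2

   # encode() switches to the bytes class, decode() switches back.
   encoded = result.encode()
   print(isinstance(encoded, SplitResultBytes))    # True
   print(encoded.decode() == result)               # True
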

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted;
   its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

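   The effect of *safe* and *encoding* can be sketched as follows; the input
   strings are arbitrary examples:

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      # '/' is safe by default, so path separators survive.
      default = quote('/docs/my file?.txt')
      print(default)              # /docs/my%20file%3F.txt

      # With safe='' the slash is quoted as well.
      no_safe = quote('a/b c', safe='')
      print(no_safe)              # a%2Fb%20c

      # A different encoding changes the bytes that get percent-encoded.
      utf8 = quote('ñ')
      latin1 = quote('ñ', encoding='latin-1')
      print(utf8, latin1)         # %C3%B1 %F1

      # The equivalence with quote_from_bytes() stated above.
      text = 'El Niño'
      same = quote(text) == quote_from_bytes(text.encode('utf-8', 'strict'))
      print(same)                 # True

      # Supplying an encoding together with bytes input raises TypeError.
      try:
          quote(b'abc', encoding='utf-8')
          bytes_error = False
      except TypeError:
          bytes_error = True
      print(bytes_error)          # True
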

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

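   A side-by-side sketch with :func:`quote`, using an arbitrary form value:

   .. code-block:: python

      from urllib.parse import quote, quote_plus

      value = 'fish & chips + tea/coffee'

      as_path = quote(value)
      print(as_path)      # fish%20%26%20chips%20%2B%20tea/coffee

      as_form = quote_plus(value)
      print(as_form)      # fish+%26+chips+%2B+tea%2Fcoffee

      # Marking '+' as safe leaves literal plus signs unescaped.
      plus_safe = quote_plus(value, safe='+')
      print(plus_safe)    # fish+%26+chips+++tea%2Fcoffee

   In the last result a literal plus sign can no longer be told apart from
   an encoded space, which is why ``'+'`` is escaped by default.
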

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

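   The role of *encoding* and *errors* shows up when the escaped bytes are
   not valid UTF-8. A small sketch with an arbitrary Latin-1 encoded input:

   .. code-block:: python

      from urllib.parse import unquote

      # %E9 alone is not valid UTF-8, so it becomes U+FFFD by default.
      replaced = unquote('caf%E9')
      print(replaced == 'caf\ufffd')      # True

      # Naming the matching encoding recovers the character.
      recovered = unquote('caf%E9', encoding='latin-1')
      print(recovered)                    # café

      # errors='strict' raises instead of substituting a placeholder.
      try:
          unquote('caf%E9', errors='strict')
          strict_error = False
      except UnicodeDecodeError:
          strict_error = True
      print(strict_error)                 # True
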

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

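   The difference from :func:`unquote` is easiest to see on a value that
   mixes encoded spaces with an escaped plus sign (an arbitrary example):

   .. code-block:: python

      from urllib.parse import unquote, unquote_plus

      encoded = 'fish+%26+chips+%2B+tea'

      # unquote() leaves '+' untouched and only expands the %xx escapes.
      plain = unquote(encoded)
      print(plain)    # fish+&+chips+++tea

      # unquote_plus() turns '+' into spaces first, then expands escapes.
      form = unquote_plus(encoded)
      print(form)     # fish & chips + tea
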

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

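   A few more cases, sketched with arbitrary inputs, covering both accepted
   input types and an unescaped non-ASCII character:

   .. code-block:: python

      from urllib.parse import unquote_to_bytes

      from_str = unquote_to_bytes('%E2%82%AC')
      print(from_str)         # b'\xe2\x82\xac'

      from_bytes = unquote_to_bytes(b'a%20b')
      print(from_bytes)       # b'a b'

      # The unescaped 'é' is encoded as UTF-8 before the escape is expanded.
      mixed = unquote_to_bytes('é%20')
      print(mixed)            # b'\xc3\xa9 '

      # The caller chooses how to decode the resulting octets.
      euro = from_str.decode('utf-8')
      print(euro)             # €
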

.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded"
   string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then it should be
   encoded to bytes; otherwise a :exc:`TypeError` is raised.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and not encode ``'/'`` characters. For maximum control of what is quoted, use
   ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element itself can be a sequence, and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate a query string for a URL
   or data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.

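   The following sketch exercises *doseq*, *quote_via* and *safe* on an
   invented parameter list. It passes ``safe='/'`` explicitly so that the
   treatment of the slash does not depend on any default:

   .. code-block:: python

      from urllib.parse import quote, urlencode

      params = [('q', 'a b/c'), ('tag', ['x', 'y'])]

      # doseq=True emits one key=value pair per element of a sequence value.
      default = urlencode(params, doseq=True)
      print(default)      # q=a+b%2Fc&tag=x&tag=y

      # quote_via=quote encodes spaces as %20; safe='/' keeps the slash.
      with_quote = urlencode(params, doseq=True, safe='/', quote_via=quote)
      print(with_quote)   # q=a%20b/c&tag=x&tag=y

      # POST data must be bytes, so encode the (pure ASCII) result.
      post_data = default.encode('ascii')
      print(post_data)    # b'q=a+b%2Fc&tag=x&tag=y'
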

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse` module
      should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.