:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (:rfc:`1808`). It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

   >>> from urllib.parse import urlparse
   >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
   >>> o   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> o.scheme
   'http'
   >>> o.port
   80
   >>> o.geturl()
   'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``'//'``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   >>> from urllib.parse import urlparse
   >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
   ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
   ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> urlparse('help/Python.html')
   ParseResult(scheme='', netloc='', path='help/Python.html', params='',
               query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters or
   query component, and :attr:`fragment` is set to the empty string in the
   return value. The default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

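The following is a minimal sketch of the *scheme* and *allow_fragments* arguments and of the derived attributes listed in the table above. The host names and credentials are made-up example values, and the comments show the output expected from a CPython 3 interpreter.

```python
from urllib.parse import urlparse

# *scheme* is only a default: it is used when the URL does not carry one.
with_default = urlparse('//example.com/docs', scheme='https')
explicit = urlparse('http://example.com/docs', scheme='https')
print(with_default.scheme, explicit.scheme)  # https http

# With allow_fragments=False, '#top' is not split off; it stays in the query.
no_frag = urlparse('http://example.com/a?q=1#top', allow_fragments=False)
print(repr(no_frag.query), repr(no_frag.fragment))  # 'q=1#top' ''

# username, password, hostname and port are derived from the netloc.
parts = urlparse('http://user:secret@Example.COM:8080/index.html')
print(parts.username, parts.password, parts.hostname, parts.port)
# user secret example.com 8080
```

Note that :attr:`hostname` is lower-cased while :attr:`netloc` keeps the original spelling.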

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

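The sketch below shows how *keep_blank_values* and *strict_parsing* change what :func:`parse_qs` returns, and how :func:`urlencode` with ``doseq=True`` turns the result back into a query string. The query string is an arbitrary example. The printed dictionary order assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
from urllib.parse import parse_qs, urlencode

query = 'a=1&a=2&b=&q=hello+world'

# Blank values are dropped by default; '+' is decoded to a space.
print(parse_qs(query))
# {'a': ['1', '2'], 'q': ['hello world']}

# With keep_blank_values=True they are kept as empty strings.
kept = parse_qs(query, keep_blank_values=True)
print(kept)
# {'a': ['1', '2'], 'b': [''], 'q': ['hello world']}

# urlencode(..., doseq=True) turns each list back into repeated keys.
print(urlencode(kept, doseq=True))
# a=1&a=2&b=&q=hello+world

# With strict_parsing=True, a field without '=' raises ValueError.
try:
    parse_qs('a=1&b', strict_parsing=True)
except ValueError as exc:
    print('rejected:', exc)
```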

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

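Unlike :func:`parse_qs`, the list returned by :func:`parse_qsl` keeps the original order and every repeated name. The following sketch uses an arbitrary query string to show this and the round trip through :func:`urlencode`:

```python
from urllib.parse import parse_qsl, urlencode

# Order and repeated names are preserved.
pairs = parse_qsl('b=2&a=1&b=3')
print(pairs)  # [('b', '2'), ('a', '1'), ('b', '3')]

# A list of pairs can be passed to urlencode directly (no doseq needed).
print(urlencode(pairs))  # b=2&a=1&b=3

# Converting to a dict keeps only the last value for each repeated name.
print(dict(pairs))  # {'b': '3', 'a': '1'}
```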

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

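A short sketch of both uses of :func:`urlunparse` follows. It builds a URL from six explicit components and round-trips a URL whose trailing ``?`` is an unnecessary delimiter. The URLs are illustrative only.

```python
from urllib.parse import urlparse, urlunparse

# Six items: (scheme, netloc, path, params, query, fragment).
built = urlunparse(('https', 'example.com', '/search', '', 'q=python', 'top'))
print(built)  # https://example.com/search?q=python#top

# The '?' with an empty query is dropped: equivalent, but not identical.
original = 'http://example.com/path?'
print(urlunparse(urlparse(original)))  # http://example.com/path
```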

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

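The difference from :func:`urlparse` shows up when several path segments carry parameters. In this sketch, the same made-up URL is parsed by both functions:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x/b;y?q=1'

# urlparse splits ;params off the *last* path segment only.
p = urlparse(url)
print(p.path, p.params)  # /a;x/b y

# urlsplit leaves every segment, parameters included, in the path.
s = urlsplit(url)
print(s.path, s.query)  # /a;x/b;y q=1
```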

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).

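A minimal sketch of assembling and editing URLs with :func:`urlunsplit` follows. The second part calls ``_replace()``, which assumes that the result objects are named tuples. That holds for CPython's implementation, but this page only promises a :class:`tuple` subclass.

```python
from urllib.parse import urlsplit, urlunsplit

# Five items: (scheme, netloc, path, query, fragment).
print(urlunsplit(('https', 'example.com', '/docs', '', '')))
# https://example.com/docs

# Swap one component via the named-tuple _replace() method, then re-assemble.
parts = urlsplit('http://example.com/a?x=1#frag')
updated = urlunsplit(parts._replace(query='y=2', fragment=''))
print(updated)  # http://example.com/a?y=2
```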

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

   >>> from urllib.parse import urljoin
   >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
   'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

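The sketch below shows a few more resolutions against a made-up base URL. It then applies the preprocessing suggested in the note. ``join_path_only`` is a hypothetical helper, not part of :mod:`urllib.parse`. It assumes that the policy you want is to always discard the scheme and netloc of *url*.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://example.com/a/b/c'

# Relative references are resolved against the base URL's path.
print(urljoin(base, 'd'))     # http://example.com/a/b/d
print(urljoin(base, '../d'))  # http://example.com/a/d
print(urljoin(base, '/d'))    # http://example.com/d


def join_path_only(base, url):
    """Hypothetical helper: drop any scheme and netloc from *url* before
    joining, so the result always stays on the base URL's host."""
    parts = urlsplit(url)
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


print(join_path_only('http://www.cwi.nl/%7Eguido/Python.html',
                     '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido
```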

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

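Both cases described for :func:`urldefrag` appear in this short sketch, which uses illustrative URLs:

```python
from urllib.parse import urldefrag

result = urldefrag('http://example.com/page#section-2')
print(result.url)       # http://example.com/page
print(result.fragment)  # section-2
print(result.geturl())  # http://example.com/page#section-2

# Without a fragment, the URL comes back unchanged with an empty fragment.
url, fragment = urldefrag('http://example.com/page')
print(repr(url), repr(fragment))  # 'http://example.com/page' ''
```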

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

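The rules in this section can be seen in the following sketch. It covers bytes in and bytes out, ``decode()``/``encode()`` conversion, and the two error cases. The URLs are made-up examples, and the exception messages printed may differ between Python versions.

```python
from urllib.parse import urljoin, urlsplit

# bytes in, bytes out: every component of the result is a bytes object.
raw = urlsplit(b'http://example.com/a?x=1')
print(raw.netloc)  # b'example.com'

# decode() converts the result to its str counterpart (ASCII by default),
# and encode() converts it back.
text = raw.decode()
print(text.netloc)           # example.com
print(text.encode() == raw)  # True

# Mixing str and bytes arguments in one call raises TypeError.
try:
    urljoin('http://example.com/', b'page')
except TypeError as exc:
    print('TypeError:', exc)

# Non-ASCII byte values are rejected with UnicodeDecodeError.
try:
    urlsplit(b'http://example.com/caf\xc3\xa9')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)
```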

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

   >>> from urllib.parse import urlsplit
   >>> url = 'HTTP://www.Python.org/doc/#'
   >>> r1 = urlsplit(url)
   >>> r1.geturl()
   'http://www.Python.org/doc/'
   >>> r2 = urlsplit(r1.geturl())
   >>> r2.geturl()
   'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2

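This sketch shows how the parsing functions map onto the concrete classes above, and that a class can be instantiated directly and re-assembled with :meth:`~urllib.parse.SplitResult.geturl`. The URL and its components are illustrative values.

```python
from urllib.parse import (DefragResultBytes, ParseResult, SplitResult,
                          SplitResultBytes, urldefrag, urlparse, urlsplit)

url = 'http://example.com/a;p?q=1#f'

# str input produces the str result classes ...
print(isinstance(urlparse(url), ParseResult))  # True
print(isinstance(urlsplit(url), SplitResult))  # True

# ... and encode() maps each result to its bytes counterpart.
print(isinstance(urlsplit(url).encode(), SplitResultBytes))    # True
print(isinstance(urldefrag(url).encode(), DefragResultBytes))  # True

# The classes can also be instantiated directly and re-assembled.
manual = SplitResult('https', 'example.com', '/docs', 'page=2', '')
print(manual.geturl())  # https://example.com/docs?page=2
```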

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted. Its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

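The sketch below exercises the *safe*, *encoding* and bytes rules of :func:`quote`, along with the stated equivalence with :func:`quote_from_bytes`. The input strings are arbitrary examples.

```python
from urllib.parse import quote, quote_from_bytes

# '/' is safe by default, which suits whole paths ...
print(quote('/El Niño/'))              # /El%20Ni%C3%B1o/
# ... while a single path segment may need safe='' so '/' is escaped too.
print(quote('a/b c', safe=''))         # a%2Fb%20c
# *encoding* controls how non-ASCII text is turned into bytes first.
print(quote('é', encoding='latin-1'))  # %E9

# The equivalence stated above, with the default UTF-8 encoding.
print(quote('é') == quote_from_bytes('é'.encode('utf-8')))  # True

# *encoding* is rejected for bytes input.
try:
    quote(b'abc', encoding='utf-8')
except TypeError as exc:
    print('TypeError:', exc)
```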

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

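The two differences between :func:`quote_plus` and :func:`quote` (spaces and the default *safe*) are visible on this arbitrary input:

```python
from urllib.parse import quote, quote_plus

value = 'x/y z+1'
print(quote(value))                 # x/y%20z%2B1
print(quote_plus(value))            # x%2Fy+z%2B1
print(quote_plus(value, safe='/'))  # x/y+z%2B1
```

A literal ``+`` is always escaped here (``%2B``), which keeps it distinguishable from an encoded space.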

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.

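Because no text encoding is involved, :func:`quote_from_bytes` can quote arbitrary binary data. The byte strings below are made-up examples:

```python
from urllib.parse import quote_from_bytes

print(quote_from_bytes(b'a&\xef'))               # a%26%EF
# With the default safe='/', the slash is kept ...
print(quote_from_bytes(b'\x00\xff/x'))           # %00%FF/x
# ... and with safe='' it is escaped as well.
print(quote_from_bytes(b'\x00\xff/x', safe=''))  # %00%FF%2Fx
```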

.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

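The sketch below shows how *encoding* and *errors* affect :func:`unquote`. It uses ``%E9``, a Latin-1 escape that is not valid UTF-8 on its own. The input strings are arbitrary examples.

```python
from urllib.parse import unquote

print(unquote('/El%20Ni%C3%B1o/'))            # /El Niño/
# The escapes can be decoded with a different encoding ...
print(unquote('caf%E9', encoding='latin-1'))  # café
# ... and with the default errors='replace', invalid UTF-8 becomes U+FFFD.
print(unquote('caf%E9') == 'caf\ufffd')       # True

# errors='strict' raises instead of replacing.
try:
    unquote('caf%E9', errors='strict')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)
```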

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

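On a form-encoded value, :func:`unquote_plus` differs from :func:`unquote` as follows. The value is a made-up example in which ``%2B`` encodes a literal plus sign:

```python
from urllib.parse import unquote, unquote_plus

form_value = 'El+Ni%C3%B1o+%2B+1'
print(unquote(form_value))       # El+Niño+++1
print(unquote_plus(form_value))  # El Niño + 1
```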

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

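The following sketch covers the three input cases described for :func:`unquote_to_bytes`: escapes only, unescaped non-ASCII text, and bytes input.

```python
from urllib.parse import unquote_to_bytes

print(unquote_to_bytes('a%26%EF'))  # b'a&\xef'
# Unescaped non-ASCII characters in a str are encoded as UTF-8.
print(unquote_to_bytes('ñ%20'))     # b'\xc3\xb1 '
# bytes input is accepted as well.
print(unquote_to_bytes(b'%41%42'))  # b'AB'
```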

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded"
   string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then it should be
   properly encoded to bytes, otherwise it would result in a :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding* and *errors* parameters are passed down to
   :func:`quote_plus` for encoding; *encoding* and *errors* are not used for
   keys and values that are :class:`bytes` objects.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used for generating the query string of a
   URL or data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

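Here is a minimal sketch of *doseq* with made-up parameters. It also reverses the encoding with :func:`parse_qs` and encodes the result to bytes for use as POST data. The printed order assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
from urllib.parse import parse_qs, urlencode

params = {'q': 'El Niño', 'page': 2, 'tag': ['a', 'b']}

# Without doseq, the list is converted with str() and then quoted.
print(urlencode(params))
# q=El+Ni%C3%B1o&page=2&tag=%5B%27a%27%2C+%27b%27%5D

# With doseq=True, each element of the list becomes its own key=value pair.
query = urlencode(params, doseq=True)
print(query)  # q=El+Ni%C3%B1o&page=2&tag=a&tag=b

# parse_qs reverses the encoding; all values come back as lists of str.
print(parse_qs(query))
# {'q': ['El Niño'], 'page': ['2'], 'tag': ['a', 'b']}

# For use as POST data, the (all-ASCII) query string is encoded to bytes.
data = query.encode('ascii')
```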

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse module
      should conform to this. Certain deviations may be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URLs.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for the mailto URL scheme.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.