:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL".

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl:80/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them.
   Instead, they are parsed as part of the preceding path, parameters or query
   component, and the fragment is returned as the empty string. The
   default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.
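
As an informal illustration of the *scheme* and *allow_fragments* arguments and
of the convenience attributes listed above, the following sketch parses a few
made-up URLs. The host names, credentials and variable names are placeholders
chosen for this example only.

```python
from urllib.parse import urlparse

# The default scheme is applied only because the URL itself has none.
relative = urlparse('//www.cwi.nl:80/%7Eguido/Python.html', scheme='http')
print(relative.scheme)    # http

# The convenience attributes break the network location down further.
# Note that hostname is lower-cased while username is left untouched.
full = urlparse('http://User:secret@WWW.Example.com:8080/p;x=1?q=2#frag')
print(full.username, full.password, full.hostname, full.port)
print(full.path, full.params, full.query, full.fragment)

# With allow_fragments=False the '#frag' text stays in the query component.
no_frag = urlparse('http://example.com/p?q=2#frag', allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))
```

The index-based fields (``full[0]`` through ``full[5]``) hold the same values as
the first six attributes, so either access style can be used.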


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Added the *encoding* and *errors* parameters.
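
The flags described above can be sketched as follows. The query string and the
variable names are invented for illustration; only :func:`parse_qs` and
:func:`urlencode` come from this module.

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
parsed = parse_qs(qs)
print(parsed)         # {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with an empty-string value.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks['empty'])   # ['']

# doseq=True turns each list element back into its own key=value pair.
rebuilt = urlencode(parsed, doseq=True)
print(rebuilt)        # name=Guido&lang=python&lang=c

# strict_parsing=True reports the malformed field 'b' instead of skipping it.
strict_error = None
try:
    parse_qs('a=1&b', strict_parsing=True)
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)   # ValueError
```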


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added the *encoding* and *errors* parameters.
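
A short sketch of the list-of-pairs form, using invented query strings: unlike
the dictionary returned by :func:`parse_qs`, the list keeps repeated names in
their original positions.

```python
from urllib.parse import parse_qsl, urlencode

# Order and duplicates are preserved in the list of pairs.
pairs = parse_qsl('a=1&b=2&a=3')
print(pairs)          # [('a', '1'), ('b', '2'), ('a', '3')]

# A list of pairs can be passed straight back to urlencode().
round_trip = urlencode(pairs)
print(round_trip)     # a=1&b=2&a=3

# The encoding argument controls how %xx sequences become characters.
latin1 = parse_qsl('x=caf%E9', encoding='latin-1')
print(latin1)         # [('x', 'café')]
```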


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
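
The following sketch, built on placeholder URLs, shows both uses mentioned
above: assembling a URL from a plain six-item tuple, and the loss of an
unnecessary ``?`` delimiter on a parse/unparse round trip.

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works: (scheme, netloc, path, params, query, fragment).
built = urlunparse(('https', 'example.com', '/a/b', '', 'q=1', 'top'))
print(built)          # https://example.com/a/b?q=1#top

# The trailing '?' introduces an empty query, so it is not reproduced.
original = 'http://example.com/path?'
round_trip = urlunparse(urlparse(original))
print(round_trip)     # http://example.com/path
```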


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
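
To make the difference from :func:`urlparse` concrete, the sketch below parses
one invented URL both ways. The helper ``split_segment_params`` is hypothetical
and not part of this module; it is one possible form of the "separate function"
mentioned above.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=3'

# urlsplit() leaves every ';params' part inside the path.
split = urlsplit(url)
print(split.path)     # /a;x=1/b;y=2

# urlparse() strips only the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path, parsed.params)   # /a;x=1/b y=2


def split_segment_params(path):
    """Hypothetical helper: return (segment, params) for each path segment."""
    result = []
    for segment in path.lstrip('/').split('/'):
        name, _, params = segment.partition(';')
        result.append((name, params))
    return result


segments = split_segment_params(split.path)
print(segments)       # [('a', 'x=1'), ('b', 'y=2')]
```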


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
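
The preprocessing suggested in the note could look like the sketch below. The
function name ``join_path_only`` is an assumption made for this example; it
simply blanks the scheme and network location before delegating to
:func:`urljoin`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_path_only(base, url):
    """Hypothetical helper: join *url* to *base*, ignoring url's scheme and host."""
    parts = urlsplit(url)
    # Rebuild the URL from path, query and fragment only.
    stripped = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, stripped)


base = 'http://www.cwi.nl/%7Eguido/Python.html'

plain = urljoin(base, '//www.python.org/%7Eguido')
print(plain)          # http://www.python.org/%7Eguido

pinned = join_path_only(base, '//www.python.org/%7Eguido')
print(pinned)         # http://www.cwi.nl/%7Eguido

# Ordinary relative references resolve against the base path.
parent = urljoin('http://a/b/c/d', '../g')
print(parent)         # http://a/b/g
```

This sketch only covers the case described in the note and is not meant as a
complete validation of untrusted URLs.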


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.
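
A brief sketch with a placeholder URL: the result can be unpacked like a
2-tuple or read through its named attributes.

```python
from urllib.parse import urldefrag

result = urldefrag('http://example.com/doc.html#section-2')
print(result.url)        # http://example.com/doc.html
print(result.fragment)   # section-2

# Tuple unpacking still works on the structured result.
url, fragment = result

# Without a fragment, the URL comes back unchanged with an empty fragment.
unchanged = urldefrag('http://example.com/doc.html')
print(unchanged.url, repr(unchanged.fragment))
```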

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.
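
The rules in this section can be exercised with a small sketch. The URLs and
variable names below are illustrative only.

```python
from urllib.parse import urljoin, urlsplit

# bytes in, bytes out: the result is a SplitResultBytes instance.
raw = urlsplit(b'http://example.com/a?b=1')
print(raw.scheme)             # b'http'

# decode() and encode() convert between the bytes and str result types.
text = raw.decode()
print(text.netloc)            # example.com
back = text.encode()

# Mixing str and bytes arguments in one call is rejected.
mix_error = None
try:
    urljoin('http://example.com/', b'a')
except TypeError as exc:
    mix_error = exc

# Non-ASCII byte values cannot be parsed directly.
ascii_error = None
try:
    urlsplit(b'http://example.com/\xff')
except UnicodeDecodeError as exc:
    ascii_error = exc

print(type(mix_error).__name__, type(ascii_error).__name__)
```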


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted;
   its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
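
The effect of the *safe* and *encoding* parameters can be sketched as follows.
The input strings are arbitrary examples.

```python
from urllib.parse import quote

# Default: '/' is kept, the space and the non-ASCII character are escaped.
path = quote('/El Niño/')
print(path)           # /El%20Ni%C3%B1o/

# An empty safe string means '/' is escaped as well.
strict = quote('a/b c', safe='')
print(strict)         # a%2Fb%20c

# A different encoding changes the bytes that end up percent-encoded.
latin1 = quote('Niño', encoding='latin-1')
print(latin1)         # Ni%F1o

# bytes input is accepted, but not together with an encoding.
from_bytes = quote(b'a b')
print(from_bytes)     # a%20b

bytes_error = None
try:
    quote(b'a b', encoding='utf-8')
except TypeError as exc:
    bytes_error = exc
print(type(bytes_error).__name__)   # TypeError
```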


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
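
The unquoting functions above differ mainly in how they treat ``+`` and in
whether they decode to text. A side-by-side sketch, with inputs chosen only for
illustration:

```python
from urllib.parse import unquote, unquote_plus, unquote_to_bytes

# UTF-8 is the default when turning %xx sequences into characters.
text = unquote('/El%20Ni%C3%B1o/')
print(text)           # /El Niño/

# Another encoding can be requested explicitly.
latin1 = unquote('caf%E9', encoding='latin-1')
print(latin1)         # café

# %FF is not valid UTF-8; with errors='replace' it becomes U+FFFD.
replaced = unquote('%FF')
print(ascii(replaced))   # '\ufffd'

# unquote() leaves '+' alone, unquote_plus() turns it into a space.
kept_plus = unquote('El+Ni%C3%B1o')
form_value = unquote_plus('El+Ni%C3%B1o')
print(kept_plus, '|', form_value)

# No decoding to text at all: the raw octets are returned.
raw = unquote_to_bytes('a%26%EF')
print(raw)            # b'a&\xef'
```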


.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, whose
   elements may be either :class:`str` or :class:`bytes`, to a "percent-encoded"
   string. If the resulting string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, it must be
   encoded to bytes; otherwise a :exc:`TypeError` is raised.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   When a key or value in *query* is a :class:`str`, the *safe*, *encoding* and
   *errors* parameters are passed down to :func:`quote_plus` for encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate a query string for a URL
   or data for a POST request.

   .. versionchanged:: 3.2
      The *query* parameter supports bytes and string objects.
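
A minimal sketch of the *doseq* behaviour and of preparing POST data, assuming
an invented parameter dictionary. No request is actually sent here.

```python
from urllib.parse import parse_qs, urlencode

params = {'q': 'El Niño', 'tag': ['a', 'b']}

# doseq=True emits one key=value pair per element of the 'tag' list.
query = urlencode(params, doseq=True)
print(query)          # q=El+Ni%C3%B1o&tag=a&tag=b

# The encoded string is pure ASCII, so it can be turned into bytes
# before being passed as the data argument of a POST request.
post_data = query.encode('ascii')
print(post_data)

# parse_qs() reverses the encoding; every value comes back as a list.
decoded = parse_qs(query)
print(decoded)        # {'q': ['El Niño'], 'tag': ['a', 'b']}
```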


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.