:mod:`http.cookiejar` --- Cookie handling for HTTP clients
==========================================================

.. module:: http.cookiejar
   :synopsis: Classes for automatic handling of HTTP cookies.
.. moduleauthor:: John J. Lee <jjl@pobox.com>
.. sectionauthor:: John J. Lee <jjl@pobox.com>

**Source code:** :source:`Lib/http/cookiejar.py`

--------------

The :mod:`http.cookiejar` module defines classes for automatic handling of HTTP
cookies. It is useful for accessing web sites that require small pieces of data
-- :dfn:`cookies` -- to be set on the client machine by an HTTP response from a
web server, and then returned to the server in later HTTP requests.

Both the regular Netscape cookie protocol and the protocol defined by
:rfc:`2965` are handled. RFC 2965 handling is switched off by default.
:rfc:`2109` cookies are parsed as Netscape cookies and subsequently treated
either as Netscape or RFC 2965 cookies according to the 'policy' in effect.
Note that the great majority of cookies on the Internet are Netscape cookies.
:mod:`http.cookiejar` attempts to follow the de-facto Netscape cookie protocol (which
differs substantially from that set out in the original Netscape specification),
including taking note of the ``max-age`` and ``port`` cookie-attributes
introduced with RFC 2965.

.. note::

   The various named parameters found in :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers (e.g. ``domain`` and ``expires``) are
   conventionally referred to as :dfn:`attributes`. To distinguish them from
   Python attributes, the documentation for this module uses the term
   :dfn:`cookie-attribute` instead.


The module defines the following exception:


.. exception:: LoadError

   Instances of :class:`FileCookieJar` raise this exception on failure to load
   cookies from a file. :exc:`LoadError` is a subclass of :exc:`OSError`.

   .. versionchanged:: 3.3
      LoadError was made a subclass of :exc:`OSError` instead of
      :exc:`IOError`.


The following classes are provided:


.. class:: CookieJar(policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface.

   The :class:`CookieJar` class stores HTTP cookies. It extracts cookies from HTTP
   responses, and returns them in later HTTP requests. :class:`CookieJar` instances
   automatically expire contained cookies when necessary. Subclasses are also
   responsible for storing and retrieving cookies from a file or database.


.. class:: FileCookieJar(filename, delayload=None, policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface. For the
   other arguments, see the documentation for the corresponding attributes.

   A :class:`CookieJar` which can load cookies from, and perhaps save cookies to, a
   file on disk. Cookies are **NOT** loaded from the named file until either the
   :meth:`load` or :meth:`revert` method is called. Subclasses of this class are
   documented in section :ref:`file-cookie-jar-classes`.


.. class:: CookiePolicy()

   This class is responsible for deciding whether each cookie should be accepted
   from / returned to the server.


.. class:: DefaultCookiePolicy(blocked_domains=None, allowed_domains=None, netscape=True, rfc2965=False, rfc2109_as_netscape=None, hide_cookie2=False, strict_domain=False, strict_rfc2965_unverifiable=True, strict_ns_unverifiable=False, strict_ns_domain=DefaultCookiePolicy.DomainLiberal, strict_ns_set_initial_dollar=False, strict_ns_set_path=False)

   Constructor arguments should be passed as keyword arguments only.
   *blocked_domains* is a sequence of domain names that we never accept cookies
   from, nor return cookies to. *allowed_domains*, if not :const:`None`, is a
   sequence of the only domains for which we accept and return cookies. For all
   other arguments, see the documentation for :class:`CookiePolicy` and
   :class:`DefaultCookiePolicy` objects.

   :class:`DefaultCookiePolicy` implements the standard accept / reject rules for
   Netscape and RFC 2965 cookies. By default, RFC 2109 cookies (i.e. cookies
   received in a :mailheader:`Set-Cookie` header with a version cookie-attribute of
   1) are treated according to the RFC 2965 rules. However, if RFC 2965 handling
   is turned off or :attr:`rfc2109_as_netscape` is True, RFC 2109 cookies are
   'downgraded' by the :class:`CookieJar` instance to Netscape cookies, by
   setting the :attr:`version` attribute of the :class:`Cookie` instance to 0.
   :class:`DefaultCookiePolicy` also provides some parameters to allow some
   fine-tuning of policy.


.. class:: Cookie()

   This class represents Netscape, RFC 2109 and RFC 2965 cookies. It is not
   expected that users of :mod:`http.cookiejar` construct their own :class:`Cookie`
   instances. Instead, if necessary, call :meth:`make_cookies` on a
   :class:`CookieJar` instance.


.. seealso::

   Module :mod:`urllib.request`
      URL opening with automatic cookie handling.

   Module :mod:`http.cookies`
      HTTP cookie classes, principally useful for server-side code. The
      :mod:`http.cookiejar` and :mod:`http.cookies` modules do not depend on each
      other.

   http://wp.netscape.com/newsref/std/cookie_spec.html
      The specification of the original Netscape cookie protocol. Though this is
      still the dominant protocol, the 'Netscape cookie protocol' implemented by all
      the major browsers (and :mod:`http.cookiejar`) only bears a passing resemblance to
      the one sketched out in ``cookie_spec.html``.

   :rfc:`2109` - HTTP State Management Mechanism
      Obsoleted by RFC 2965. Uses :mailheader:`Set-Cookie` with version=1.

   :rfc:`2965` - HTTP State Management Mechanism
      The Netscape protocol with the bugs fixed. Uses :mailheader:`Set-Cookie2` in
      place of :mailheader:`Set-Cookie`. Not widely used.

   http://kristol.org/cookie/errata.html
      Unfinished errata to RFC 2965.

   :rfc:`2964` - Use of HTTP State Management

.. _cookie-jar-objects:

CookieJar and FileCookieJar Objects
-----------------------------------

:class:`CookieJar` objects support the :term:`iterator` protocol for iterating over
contained :class:`Cookie` objects.

:class:`CookieJar` has the following methods:


.. method:: CookieJar.add_cookie_header(request)

   Add correct :mailheader:`Cookie` header to *request*.

   If policy allows (i.e. the :attr:`rfc2965` and :attr:`hide_cookie2` attributes of
   the :class:`CookieJar`'s :class:`CookiePolicy` instance are true and false
   respectively), the :mailheader:`Cookie2` header is also added when appropriate.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`get_type`, :meth:`unverifiable`, :meth:`get_origin_req_host`,
   :meth:`has_header`, :meth:`get_header`, :meth:`header_items`, and
   :meth:`add_unredirected_header`, as documented by :mod:`urllib.request`.


.. method:: CookieJar.extract_cookies(response, request)

   Extract cookies from HTTP *response* and store them in the :class:`CookieJar`,
   where allowed by policy.

   The :class:`CookieJar` will look for allowable :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers in the *response* argument, and store cookies
   as appropriate (subject to the :meth:`CookiePolicy.set_ok` method's approval).

   The *response* object (usually the result of a call to
   :meth:`urllib.request.urlopen`, or similar) should support an :meth:`info`
   method, which returns an :class:`email.message.Message` instance.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`unverifiable`, and :meth:`get_origin_req_host`, as documented by
   :mod:`urllib.request`. The request is used to set default values for
   cookie-attributes as well as for checking that the cookie is allowed to be
   set.


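As a rough illustration of how :meth:`extract_cookies` and
:meth:`add_cookie_header` fit together, the sketch below avoids the network by
using ``FakeResponse``, a hypothetical stand-in class (not part of the module)
that only provides the :meth:`info` method described above. The URLs and the
cookie value are made up for the example::

   import email.message
   import http.cookiejar
   import urllib.request


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               # Assigning the same header name repeatedly appends a new header.
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   jar = http.cookiejar.CookieJar()

   # Pretend the server answered this request with one Set-Cookie header.
   first_request = urllib.request.Request("http://www.example.com/login")
   jar.extract_cookies(FakeResponse(["session=abc123; Path=/"]), first_request)

   # A later request to the same host gets the stored cookie back.
   second_request = urllib.request.Request("http://www.example.com/account")
   jar.add_cookie_header(second_request)
   cookie_header = second_request.get_header("Cookie")
   print(cookie_header)

In ordinary use the same two calls are made on your behalf when a
:class:`CookieJar` is attached to a :mod:`urllib.request` opener.
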
.. method:: CookieJar.set_policy(policy)

   Set the :class:`CookiePolicy` instance to be used.


.. method:: CookieJar.make_cookies(response, request)

   Return sequence of :class:`Cookie` objects extracted from *response* object.

   See the documentation for :meth:`extract_cookies` for the interfaces required of
   the *response* and *request* arguments.


.. method:: CookieJar.set_cookie_if_ok(cookie, request)

   Set a :class:`Cookie` if policy says it's OK to do so.


.. method:: CookieJar.set_cookie(cookie)

   Set a :class:`Cookie`, without checking with policy to see whether or not it
   should be set.


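The difference between :meth:`make_cookies` (which only builds
:class:`Cookie` objects) and :meth:`set_cookie_if_ok` (which stores one,
subject to policy) can be sketched as follows. ``FakeResponse`` is again a
hypothetical helper written for this example, and the cookie names are
arbitrary::

   import email.message
   import http.cookiejar
   import urllib.request


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   jar = http.cookiejar.CookieJar()
   request = urllib.request.Request("http://www.example.com/")
   response = FakeResponse(["theme=dark", "lang=en"])

   # make_cookies() parses the headers but does not store anything.
   candidates = jar.make_cookies(response, request)
   candidate_names = sorted(cookie.name for cookie in candidates)
   stored_before = len(jar)

   # Store only the cookies we care about, still subject to the policy.
   for cookie in candidates:
       if cookie.name == "theme":
           jar.set_cookie_if_ok(cookie, request)
   stored_names = [cookie.name for cookie in jar]

Calling :meth:`set_cookie` in the loop instead would store the cookie without
consulting the policy at all.
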
.. method:: CookieJar.clear([domain[, path[, name]]])

   Clear some cookies.

   If invoked without arguments, clear all cookies. If given a single argument,
   only cookies belonging to that *domain* will be removed. If given two arguments,
   cookies belonging to the specified *domain* and URL *path* are removed. If
   given three arguments, then the cookie with the specified *domain*, *path* and
   *name* is removed.

   Raises :exc:`KeyError` if no matching cookie exists.


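The four calling forms of :meth:`clear` might be exercised like this. The
sketch assumes a hypothetical ``FakeResponse`` helper and a small ``store``
function, both written only for this example; the hosts and cookie names are
invented::

   import email.message
   import http.cookiejar
   import urllib.request


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   jar = http.cookiejar.CookieJar()


   def store(url, header):
       """Hypothetical helper: feed one Set-Cookie header into the jar."""
       jar.extract_cookies(FakeResponse([header]), urllib.request.Request(url))


   store("http://www.example.com/", "a=1; Path=/")
   store("http://www.example.com/shop/cart", "b=2; Path=/shop")
   store("http://www.example.org/", "c=3; Path=/")

   # Three arguments: remove exactly one cookie.
   jar.clear("www.example.com", "/shop", "b")
   after_named_clear = sorted(cookie.name for cookie in jar)

   # One argument: remove everything stored for that domain.
   jar.clear("www.example.org")
   after_domain_clear = sorted(cookie.name for cookie in jar)

   # Nothing matches, so KeyError is raised.
   try:
       jar.clear("unknown.example")
       missing_raised = False
   except KeyError:
       missing_raised = True

   # No arguments: remove all remaining cookies.
   jar.clear()
   remaining = len(jar)
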
.. method:: CookieJar.clear_session_cookies()

   Discard all session cookies.

   Discards all contained cookies that have a true :attr:`discard` attribute
   (usually because they had either no ``max-age`` or ``expires`` cookie-attribute,
   or an explicit ``discard`` cookie-attribute). For interactive browsers, the end
   of a session usually corresponds to closing the browser window.

   Note that the :meth:`save` method won't save session cookies anyway, unless you
   ask otherwise by passing a true *ignore_discard* argument.

:class:`FileCookieJar` implements the following additional methods:


.. method:: FileCookieJar.save(filename=None, ignore_discard=False, ignore_expires=False)

   Save cookies to a file.

   This base class raises :exc:`NotImplementedError`. Subclasses may leave this
   method unimplemented.

   *filename* is the name of the file in which to save cookies. If *filename* is not
   specified, :attr:`self.filename` is used (whose default is the value passed to
   the constructor, if any); if :attr:`self.filename` is :const:`None`,
   :exc:`ValueError` is raised.

   *ignore_discard*: save even cookies set to be discarded. *ignore_expires*: save
   even cookies that have expired.

   The file is overwritten if it already exists, thus wiping all the cookies it
   contains. Saved cookies can be restored later using the :meth:`load` or
   :meth:`revert` methods.


.. method:: FileCookieJar.load(filename=None, ignore_discard=False, ignore_expires=False)

   Load cookies from a file.

   Old cookies are kept unless overwritten by newly loaded ones.

   Arguments are as for :meth:`save`.

   The named file must be in the format understood by the class, or
   :exc:`LoadError` will be raised. Also, :exc:`OSError` may be raised, for
   example if the file does not exist.

   .. versionchanged:: 3.3
      :exc:`IOError` used to be raised; it is now an alias of :exc:`OSError`.


.. method:: FileCookieJar.revert(filename=None, ignore_discard=False, ignore_expires=False)

   Clear all cookies and reload cookies from a saved file.

   :meth:`revert` can raise the same exceptions as :meth:`load`. If there is a
   failure, the object's state will not be altered.

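A save / load / revert round trip might look like the following sketch. It
uses :class:`LWPCookieJar` because the base class does not implement
:meth:`save`. The temporary file location, the ``FakeResponse`` helper and the
cookie names are assumptions made for this example only::

   import email.message
   import http.cookiejar
   import os
   import tempfile
   import urllib.request


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   request = urllib.request.Request("http://www.example.com/")
   path = os.path.join(tempfile.mkdtemp(), "cookies.lwp")

   jar = http.cookiejar.LWPCookieJar(path)
   # Max-Age makes this a persistent cookie, so save() will write it out.
   jar.extract_cookies(FakeResponse(["token=xyz; Max-Age=3600"]), request)
   jar.save()  # no filename given, so jar.filename is used

   other = http.cookiejar.LWPCookieJar(path)
   count_before_load = len(other)  # nothing is read until load() or revert()
   other.load()
   names_after_load = [cookie.name for cookie in other]

   # Add an unsaved cookie, then throw it away by reverting to the file.
   other.extract_cookies(FakeResponse(["scratch=1"]), request)
   count_with_scratch = len(other)
   other.revert()
   names_after_revert = [cookie.name for cookie in other]
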
:class:`FileCookieJar` instances have the following public attributes:


.. attribute:: FileCookieJar.filename

   Filename of default file in which to keep cookies. This attribute may be
   assigned to.


.. attribute:: FileCookieJar.delayload

   If true, load cookies lazily from disk. This attribute should not be assigned
   to. This is only a hint, since this only affects performance, not behaviour
   (unless the cookies on disk are changing). A :class:`CookieJar` object may
   ignore it. None of the :class:`FileCookieJar` classes included in the standard
   library lazily loads cookies.


.. _file-cookie-jar-classes:

FileCookieJar subclasses and co-operation with web browsers
-----------------------------------------------------------

The following :class:`CookieJar` subclasses are provided for reading and
writing cookie files.

.. class:: MozillaCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in the
   Mozilla ``cookies.txt`` file format (which is also used by the Lynx and Netscape
   browsers).

   .. note::

      This loses information about RFC 2965 cookies, and also about newer or
      non-standard cookie-attributes such as ``port``.

   .. warning::

      Back up your cookies before saving if you have cookies whose loss / corruption
      would be inconvenient (there are some subtleties which may lead to slight
      changes in the file over a load / save round-trip).

   Also note that cookies saved while Mozilla is running will get clobbered by
   Mozilla.


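The effect of *ignore_discard* when writing and reading a ``cookies.txt``
file can be sketched as below. The file is written to a temporary directory
rather than a real browser profile; that path, the ``FakeResponse`` helper and
the cookie names are assumptions of this example::

   import email.message
   import http.cookiejar
   import os
   import tempfile
   import urllib.request


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   request = urllib.request.Request("http://www.example.com/")
   path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

   jar = http.cookiejar.MozillaCookieJar(path)
   # "keep" is persistent (it has Max-Age); "temp" is a session cookie.
   jar.extract_cookies(
       FakeResponse(["keep=1; Max-Age=3600", "temp=2"]), request)

   # By default, session cookies are neither saved nor loaded.
   jar.save()
   reloaded = http.cookiejar.MozillaCookieJar(path)
   reloaded.load()
   default_names = sorted(cookie.name for cookie in reloaded)

   # Passing ignore_discard=True on both sides keeps them.
   jar.save(ignore_discard=True)
   reloaded_all = http.cookiejar.MozillaCookieJar(path)
   reloaded_all.load(ignore_discard=True)
   all_names = sorted(cookie.name for cookie in reloaded_all)
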
.. class:: LWPCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in a format
   compatible with the libwww-perl library's ``Set-Cookie3`` file format. This is
   convenient if you want to store cookies in a human-readable file.


.. _cookie-policy-objects:

CookiePolicy Objects
--------------------

Objects implementing the :class:`CookiePolicy` interface have the following
methods:


.. method:: CookiePolicy.set_ok(cookie, request)

   Return boolean value indicating whether cookie should be accepted from server.

   *cookie* is a :class:`Cookie` instance. *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.extract_cookies`.


.. method:: CookiePolicy.return_ok(cookie, request)

   Return boolean value indicating whether cookie should be returned to server.

   *cookie* is a :class:`Cookie` instance. *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.add_cookie_header`.


.. method:: CookiePolicy.domain_return_ok(domain, request)

   Return false if cookies should not be returned, given cookie domain.

   This method is an optimization. It removes the need for checking every cookie
   with a particular domain (which might involve reading many files). Returning
   true from :meth:`domain_return_ok` and :meth:`path_return_ok` leaves all the
   work to :meth:`return_ok`.

   If :meth:`domain_return_ok` returns true for the cookie domain,
   :meth:`path_return_ok` is called for the cookie path. Otherwise,
   :meth:`path_return_ok` and :meth:`return_ok` are never called for that cookie
   domain. If :meth:`path_return_ok` returns true, :meth:`return_ok` is called
   with the :class:`Cookie` object itself for a full check. Otherwise,
   :meth:`return_ok` is never called for that cookie path.

   Note that :meth:`domain_return_ok` is called for every *cookie* domain, not just
   for the *request* domain. For example, the function might be called with both
   ``".example.com"`` and ``"www.example.com"`` if the request domain is
   ``"www.example.com"``. The same goes for :meth:`path_return_ok`.

   The *request* argument is as documented for :meth:`return_ok`.


.. method:: CookiePolicy.path_return_ok(path, request)

   Return false if cookies should not be returned, given cookie path.

   See the documentation for :meth:`domain_return_ok`.

In addition to implementing the methods above, implementations of the
:class:`CookiePolicy` interface must also supply the following attributes,
indicating which protocols should be used, and how. All of these attributes may
be assigned to.


.. attribute:: CookiePolicy.netscape

   Implement Netscape protocol.


.. attribute:: CookiePolicy.rfc2965

   Implement RFC 2965 protocol.


.. attribute:: CookiePolicy.hide_cookie2

   Don't add :mailheader:`Cookie2` header to requests (the presence of this header
   indicates to the server that we understand RFC 2965 cookies).

The most useful way to define a :class:`CookiePolicy` class is by subclassing
from :class:`DefaultCookiePolicy` and overriding some or all of the methods
above. :class:`CookiePolicy` itself may be used as a 'null policy' to allow
setting and receiving any and all cookies (this is unlikely to be useful).


.. _default-cookie-policy-objects:

DefaultCookiePolicy Objects
---------------------------

Implements the standard rules for accepting and returning cookies.

Both RFC 2965 and Netscape cookies are covered. RFC 2965 handling is switched
off by default.

The easiest way to provide your own policy is to subclass this class and call
its methods in your overridden implementations before adding your own additional
checks::

   import http.cookiejar

   class MyCookiePolicy(http.cookiejar.DefaultCookiePolicy):
       def set_ok(self, cookie, request):
           # Apply the standard checks first.
           if not http.cookiejar.DefaultCookiePolicy.set_ok(self, cookie, request):
               return False
           # i_dont_want_to_store_this_cookie() stands for your own check.
           if i_dont_want_to_store_this_cookie(cookie):
               return False
           return True

In addition to the features required to implement the :class:`CookiePolicy`
interface, this class allows you to block and allow domains from setting and
receiving cookies. There are also some strictness switches that allow you to
tighten up the rather loose Netscape protocol rules a little bit (at the cost of
blocking some benign cookies).

A domain blacklist and a domain whitelist are provided (both off by default). Only
domains not in the blacklist and present in the whitelist (if the whitelist is
active) participate in cookie setting and returning. Use the *blocked_domains*
constructor argument, and :meth:`blocked_domains` and
:meth:`set_blocked_domains` methods (and the corresponding argument and methods
for *allowed_domains*). If you set a whitelist, you can turn it off again by
setting it to :const:`None`.

Domains in block or allow lists that do not start with a dot must equal the
cookie domain to be matched. For example, ``"example.com"`` matches a blacklist
entry of ``"example.com"``, but ``"www.example.com"`` does not. Domains that do
start with a dot are matched by more specific domains too. For example, both
``"www.example.com"`` and ``"www.coyote.example.com"`` match ``".example.com"``
(but ``"example.com"`` itself does not). IP addresses are an exception, and
must match exactly. For example, if *blocked_domains* contains ``"192.168.1.2"``
and ``".168.1.2"``, 192.168.1.2 is blocked, but 193.168.1.2 is not.

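The matching rules above can be checked directly with :meth:`is_blocked` and
:meth:`is_not_allowed`. The domain names in this sketch are arbitrary examples
chosen to mirror the cases described in the text::

   import http.cookiejar

   policy = http.cookiejar.DefaultCookiePolicy(
       blocked_domains=["example.com", ".example.net",
                        "192.168.1.2", ".168.1.2"])

   # No leading dot: only an exact match is blocked.
   exact_match = policy.is_blocked("example.com")
   subdomain_of_exact = policy.is_blocked("www.example.com")

   # Leading dot: more specific domains match, the bare domain does not.
   subdomain_of_dotted = policy.is_blocked("www.example.net")
   bare_dotted = policy.is_blocked("example.net")

   # IP addresses must match exactly; ".168.1.2" is not treated as a suffix.
   same_ip = policy.is_blocked("192.168.1.2")
   other_ip = policy.is_blocked("193.168.1.2")

   # A whitelist restricts cookies to the listed domains until it is
   # switched off again by setting it to None.
   policy.set_allowed_domains([".example.org"])
   listed = policy.is_not_allowed("shop.example.org")
   unlisted = policy.is_not_allowed("www.python.org")
   policy.set_allowed_domains(None)
   unlisted_after_reset = policy.is_not_allowed("www.python.org")
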
:class:`DefaultCookiePolicy` implements the following additional methods:


.. method:: DefaultCookiePolicy.blocked_domains()

   Return the sequence of blocked domains (as a tuple).


.. method:: DefaultCookiePolicy.set_blocked_domains(blocked_domains)

   Set the sequence of blocked domains.


.. method:: DefaultCookiePolicy.is_blocked(domain)

   Return whether *domain* is on the blacklist for setting or receiving cookies.


.. method:: DefaultCookiePolicy.allowed_domains()

   Return :const:`None`, or the sequence of allowed domains (as a tuple).


.. method:: DefaultCookiePolicy.set_allowed_domains(allowed_domains)

   Set the sequence of allowed domains, or :const:`None`.


.. method:: DefaultCookiePolicy.is_not_allowed(domain)

   Return whether *domain* is not on the whitelist for setting or receiving
   cookies.

:class:`DefaultCookiePolicy` instances have the following attributes, which are
all initialised from the constructor arguments of the same name, and which may
all be assigned to.


.. attribute:: DefaultCookiePolicy.rfc2109_as_netscape

   If true, request that the :class:`CookieJar` instance downgrade RFC 2109 cookies
   (i.e. cookies received in a :mailheader:`Set-Cookie` header with a version
   cookie-attribute of 1) to Netscape cookies by setting the version attribute of
   the :class:`Cookie` instance to 0. The default value is :const:`None`, in which
   case RFC 2109 cookies are downgraded if and only if RFC 2965 handling is turned
   off. Therefore, RFC 2109 cookies are downgraded by default.


General strictness switches:

.. attribute:: DefaultCookiePolicy.strict_domain

   Don't allow sites to set two-component domains with country-code top-level
   domains like ``.co.uk``, ``.gov.uk``, ``.co.nz``, etc. This is far from perfect
   and isn't guaranteed to work!


RFC 2965 protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_rfc2965_unverifiable

   Follow RFC 2965 rules on unverifiable transactions (usually, an unverifiable
   transaction is one resulting from a redirect or a request for an image hosted on
   another site). If this is false, cookies are *never* blocked on the basis of
   verifiability.


Netscape protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_ns_unverifiable

   Apply RFC 2965 rules on unverifiable transactions even to Netscape cookies.


.. attribute:: DefaultCookiePolicy.strict_ns_domain

   Flags indicating how strict to be with domain-matching rules for Netscape
   cookies. See below for acceptable values.


.. attribute:: DefaultCookiePolicy.strict_ns_set_initial_dollar

   Ignore cookies in :mailheader:`Set-Cookie` headers that have names starting
   with ``'$'``.


.. attribute:: DefaultCookiePolicy.strict_ns_set_path

   Don't allow setting cookies whose path doesn't path-match the request URI.

:attr:`strict_ns_domain` is a collection of flags. Its value is constructed by
or-ing flags together (for example, ``DomainStrictNoDots|DomainStrictNonDomain``
means both flags are set).


.. attribute:: DefaultCookiePolicy.DomainStrictNoDots

   When setting cookies, the 'host prefix' must not contain a dot (e.g.
   ``www.foo.bar.com`` can't set a cookie for ``.bar.com``, because ``www.foo``
   contains a dot).


.. attribute:: DefaultCookiePolicy.DomainStrictNonDomain

   Cookies that did not explicitly specify a ``domain`` cookie-attribute can only
   be returned to a domain equal to the domain that set the cookie (e.g.
   ``spam.example.com`` won't be returned cookies from ``example.com`` that had no
   ``domain`` cookie-attribute).


.. attribute:: DefaultCookiePolicy.DomainRFC2965Match

   When setting cookies, require a full RFC 2965 domain-match.

The following attributes are provided for convenience, and are the most useful
combinations of the above flags:


.. attribute:: DefaultCookiePolicy.DomainLiberal

   Equivalent to 0 (i.e. all of the above Netscape domain strictness flags switched
   off).


.. attribute:: DefaultCookiePolicy.DomainStrict

   Equivalent to ``DomainStrictNoDots|DomainStrictNonDomain``.


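As a sketch of how the flags combine, the example below builds a policy with
two flags or-ed together and then replays the ``www.foo.bar.com`` case
described for ``DomainStrictNoDots``. The ``FakeResponse`` class and the
``accepted_count`` function are hypothetical helpers written for this example,
and the cookie itself is made up::

   import email.message
   import http.cookiejar
   import urllib.request

   Policy = http.cookiejar.DefaultCookiePolicy

   flags = Policy.DomainStrictNoDots | Policy.DomainStrictNonDomain
   strict_policy = Policy(strict_ns_domain=flags)
   has_no_dots = bool(strict_policy.strict_ns_domain & Policy.DomainStrictNoDots)
   has_rfc2965_match = bool(
       strict_policy.strict_ns_domain & Policy.DomainRFC2965Match)


   class FakeResponse:
       """Hypothetical response stand-in: only info() is required."""

       def __init__(self, set_cookie_values):
           self._headers = email.message.Message()
           for value in set_cookie_values:
               self._headers["Set-Cookie"] = value

       def info(self):
           return self._headers


   def accepted_count(policy):
       """Hypothetical helper: how many cookies does *policy* accept?"""
       jar = http.cookiejar.CookieJar(policy=policy)
       request = urllib.request.Request("http://www.foo.bar.com/")
       jar.extract_cookies(FakeResponse(["x=1; Domain=.bar.com"]), request)
       return len(jar)


   liberal_count = accepted_count(Policy())        # DomainLiberal by default
   strict_count = accepted_count(strict_policy)    # host prefix "www.foo" has a dot
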
Cookie Objects
--------------

:class:`Cookie` instances have Python attributes roughly corresponding to the
standard cookie-attributes specified in the various cookie standards. The
correspondence is not one-to-one, because there are complicated rules for
assigning default values, because the ``max-age`` and ``expires``
cookie-attributes contain equivalent information, and because RFC 2109 cookies
may be 'downgraded' by :mod:`http.cookiejar` from version 1 to version 0 (Netscape)
cookies.

Assignment to these attributes should not be necessary other than in rare
circumstances in a :class:`CookiePolicy` method. The class does not enforce
internal consistency, so you should know what you're doing if you do that.



.. attribute:: Cookie.version

   Integer or :const:`None`.  Netscape cookies have :attr:`version` 0.  RFC 2965 and
   RFC 2109 cookies have a ``version`` cookie-attribute of 1.  However, note that
   :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in which
   case :attr:`version` is 0.


.. attribute:: Cookie.name

   Cookie name (a string).


.. attribute:: Cookie.value

   Cookie value (a string), or :const:`None`.


.. attribute:: Cookie.port

   String representing a port or a set of ports (e.g. ``'80'``, or ``'80,8080'``), or
   :const:`None`.


.. attribute:: Cookie.path

   Cookie path (a string, e.g. ``'/acme/rocket_launchers'``).


.. attribute:: Cookie.secure

   ``True`` if cookie should only be returned over a secure connection.


.. attribute:: Cookie.expires

   Integer expiry date in seconds since epoch, or :const:`None`.  See also the
   :meth:`is_expired` method.


.. attribute:: Cookie.discard

   ``True`` if this is a session cookie.


.. attribute:: Cookie.comment

   String comment from the server explaining the function of this cookie, or
   :const:`None`.


.. attribute:: Cookie.comment_url

   URL linking to a comment from the server explaining the function of this cookie,
   or :const:`None`.


.. attribute:: Cookie.rfc2109

   ``True`` if this cookie was received as an RFC 2109 cookie (i.e. the cookie
   arrived in a :mailheader:`Set-Cookie` header, and the value of the Version
   cookie-attribute in that header was 1).  This attribute is provided because
   :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in
   which case :attr:`version` is 0.


.. attribute:: Cookie.port_specified

   ``True`` if a port or set of ports was explicitly specified by the server (in the
   :mailheader:`Set-Cookie` / :mailheader:`Set-Cookie2` header).


.. attribute:: Cookie.domain_specified

   ``True`` if a domain was explicitly specified by the server.


.. attribute:: Cookie.domain_initial_dot

   ``True`` if the domain explicitly specified by the server began with a dot
   (``'.'``).
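
The attributes above can be inspected on any :class:`Cookie` instance. The
sketch below builds one by hand purely for illustration; cookies are normally
created by a :class:`CookieJar` from response headers. The constructor keyword
names are taken from the module source rather than from this section, and all
values are made up.

```python
from http.cookiejar import Cookie

# Illustration only: every value here is invented.
cookie = Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/acme", path_specified=True,
    secure=True, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)

print(cookie.name, cookie.value)      # name and value are strings
print(cookie.version)                 # Netscape cookies have version 0
print(cookie.port, cookie.port_specified)
print(cookie.domain_specified, cookie.domain_initial_dot)
print(cookie.secure, cookie.discard)  # secure-only session cookie
print(cookie.rfc2109)                 # not received as an RFC 2109 cookie
```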

Cookies may have additional non-standard cookie-attributes.  These may be
accessed using the following methods:


.. method:: Cookie.has_nonstandard_attr(name)

   Return ``True`` if cookie has the named cookie-attribute.


.. method:: Cookie.get_nonstandard_attr(name, default=None)

   If cookie has the named cookie-attribute, return its value.  Otherwise, return
   *default*.


.. method:: Cookie.set_nonstandard_attr(name, value)

   Set the value of the named cookie-attribute.
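
A minimal sketch of these three methods follows. The cookie is constructed by
hand for illustration only, and the attribute names (``HttpOnly``,
``SameSite``) and values are assumptions chosen for the example, not something
this module interprets.

```python
from http.cookiejar import Cookie

# Illustration only: "rest" holds the non-standard cookie-attributes.
cookie = Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None,
    rest={"HttpOnly": None},
)

has_httponly = cookie.has_nonstandard_attr("HttpOnly")

# An attribute that is not set falls back to the supplied default.
missing = cookie.get_nonstandard_attr("SameSite", "unset")

# After setting it, the stored value is returned instead.
cookie.set_nonstandard_attr("SameSite", "Lax")
same_site = cookie.get_nonstandard_attr("SameSite")

print(has_httponly, missing, same_site)
```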

The :class:`Cookie` class also defines the following method:


.. method:: Cookie.is_expired(now=None)

   Return ``True`` if cookie has passed the time at which the server requested it
   should expire.  If *now* is given (in seconds since the epoch), return whether
   the cookie has expired at the specified time.
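
The effect of the *now* argument can be sketched as follows. ``make_cookie`` is
a hypothetical helper written for this example, and the timestamps are
arbitrary; passing *now* explicitly keeps the result independent of the current
time.

```python
from http.cookiejar import Cookie


def make_cookie(expires):
    """Hypothetical helper: build a cookie with the given expiry time."""
    return Cookie(
        version=0, name="token", value="xyz",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


cookie = make_cookie(expires=1_000_000_000)

# Evaluate expiry at explicit points in time (seconds since the epoch).
before = cookie.is_expired(now=999_999_999)
at_expiry = cookie.is_expired(now=1_000_000_000)

# A cookie whose expires attribute is None never reports itself as expired.
session = make_cookie(expires=None).is_expired()

print(before, at_expiry, session)
```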


Examples
--------

The first example shows the most common usage of :mod:`http.cookiejar`::

   import http.cookiejar, urllib.request
   # Cookies set by responses are stored in the jar and returned on later requests.
   cj = http.cookiejar.CookieJar()
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")

This example illustrates how to open a URL using your Netscape, Mozilla, or Lynx
cookies (assumes Unix/Netscape convention for location of the cookies file)::

   import os, http.cookiejar, urllib.request
   cj = http.cookiejar.MozillaCookieJar()
   # Load previously saved cookies from the cookies.txt file.
   cj.load(os.path.join(os.path.expanduser("~"), ".netscape", "cookies.txt"))
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")

The next example illustrates the use of :class:`DefaultCookiePolicy`.  Turn on
RFC 2965 cookies, be more strict about domains when setting and returning
Netscape cookies, and block some domains from setting cookies or having them
returned::

   import urllib.request
   from http.cookiejar import CookieJar, DefaultCookiePolicy
   # The strictness flags are attributes of DefaultCookiePolicy.
   policy = DefaultCookiePolicy(
       rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict,
       blocked_domains=["ads.net", ".ads.net"])
   cj = CookieJar(policy)
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")
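
A policy configured this way can also be checked without making any request.
The sketch below assumes the same illustrative domains as the example above and
only calls :class:`DefaultCookiePolicy` methods that query the block list.

```python
from http.cookiejar import CookieJar, DefaultCookiePolicy

policy = DefaultCookiePolicy(
    rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"])
cj = CookieJar(policy)

# The configured block list is returned as a sequence of domains.
blocked = policy.blocked_domains()

# "ads.net" matches exactly; ".ads.net" also covers its subdomains.
exact = policy.is_blocked("ads.net")
subdomain = policy.is_blocked("www.ads.net")
other = policy.is_blocked("example.com")

print(blocked, exact, subdomain, other)
print(len(cj))  # the jar stays empty until a response sets a cookie
```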