:mod:`http.cookiejar` --- Cookie handling for HTTP clients
==========================================================

.. module:: http.cookiejar
   :synopsis: Classes for automatic handling of HTTP cookies.
.. moduleauthor:: John J. Lee <jjl@pobox.com>
.. sectionauthor:: John J. Lee <jjl@pobox.com>


The :mod:`http.cookiejar` module defines classes for automatic handling of HTTP
cookies.  It is useful for accessing web sites that require small pieces of data
-- :dfn:`cookies` -- to be set on the client machine by an HTTP response from a
web server, and then returned to the server in later HTTP requests.

Both the regular Netscape cookie protocol and the protocol defined by
:rfc:`2965` are handled.  RFC 2965 handling is switched off by default.
:rfc:`2109` cookies are parsed as Netscape cookies and subsequently treated
either as Netscape or RFC 2965 cookies according to the 'policy' in effect.
Note that the great majority of cookies on the Internet are Netscape cookies.
:mod:`http.cookiejar` attempts to follow the de-facto Netscape cookie protocol (which
differs substantially from that set out in the original Netscape specification),
including taking note of the ``max-age`` and ``port`` cookie-attributes
introduced with RFC 2965.

.. note::

   The various named parameters found in :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers (eg. ``domain`` and ``expires``) are
   conventionally referred to as :dfn:`attributes`.  To distinguish them from
   Python attributes, the documentation for this module uses the term
   :dfn:`cookie-attribute` instead.


The module defines the following exception:


.. exception:: LoadError

   Instances of :class:`FileCookieJar` raise this exception on failure to load
   cookies from a file.  :exc:`LoadError` is a subclass of :exc:`IOError`.


The following classes are provided:


.. class:: CookieJar(policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface.

   The :class:`CookieJar` class stores HTTP cookies.  It extracts cookies from HTTP
   responses, and returns them in HTTP requests. :class:`CookieJar` instances
   automatically expire contained cookies when necessary.  Subclasses are also
   responsible for storing and retrieving cookies from a file or database.

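In practice a :class:`CookieJar` is usually attached to a :mod:`urllib.request`
opener rather than driven by hand.  The following is a minimal sketch, assuming
the standard ``HTTPCookieProcessor`` handler from :mod:`urllib.request`; no
request is actually sent here.

```python
import http.cookiejar
import urllib.request

# The jar starts out empty.  The handler fills it from responses and adds
# Cookie headers to later requests made through this opener.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Calling opener.open(url) would now send and receive cookies automatically.
print(len(jar))
```

After a few calls to ``opener.open()``, iterating over ``jar`` shows the cookies
that the policy in effect accepted.
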

.. class:: FileCookieJar(filename, delayload=None, policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface.  For the
   other arguments, see the documentation for the corresponding attributes.

   A :class:`CookieJar` which can load cookies from, and perhaps save cookies to, a
   file on disk.  Cookies are **NOT** loaded from the named file until either the
   :meth:`load` or :meth:`revert` method is called.  Subclasses of this class are
   documented in section :ref:`file-cookie-jar-classes`.


.. class:: CookiePolicy()

   This class is responsible for deciding whether each cookie should be accepted
   from / returned to the server.


.. class:: DefaultCookiePolicy(blocked_domains=None, allowed_domains=None, netscape=True, rfc2965=False, rfc2109_as_netscape=None, hide_cookie2=False, strict_domain=False, strict_rfc2965_unverifiable=True, strict_ns_unverifiable=False, strict_ns_domain=DefaultCookiePolicy.DomainLiberal, strict_ns_set_initial_dollar=False, strict_ns_set_path=False)

   Constructor arguments should be passed as keyword arguments only.
   *blocked_domains* is a sequence of domain names that we never accept cookies
   from, nor return cookies to.  *allowed_domains*, if not :const:`None`, is a
   sequence of the only domains for which we accept and return cookies.  For all
   other arguments, see the documentation for :class:`CookiePolicy` and
   :class:`DefaultCookiePolicy` objects.

   :class:`DefaultCookiePolicy` implements the standard accept / reject rules for
   Netscape and RFC 2965 cookies.  RFC 2109 cookies (ie. cookies
   received in a :mailheader:`Set-Cookie` header with a version cookie-attribute of
   1) are treated according to the RFC 2965 rules unless they are downgraded.  If
   RFC 2965 handling is turned off (the default) or :attr:`rfc2109_as_netscape` is
   true, RFC 2109 cookies are 'downgraded' by the :class:`CookieJar` instance to
   Netscape cookies, by setting the :attr:`version` attribute of the
   :class:`Cookie` instance to 0.  :class:`DefaultCookiePolicy` also provides some
   parameters to allow some fine-tuning of policy.


.. class:: Cookie()

   This class represents Netscape, RFC 2109 and RFC 2965 cookies.  It is not
   expected that users of :mod:`http.cookiejar` construct their own :class:`Cookie`
   instances.  Instead, if necessary, call :meth:`make_cookies` on a
   :class:`CookieJar` instance.


.. seealso::

   Module :mod:`urllib.request`
      URL opening with automatic cookie handling.

   Module :mod:`http.cookies`
      HTTP cookie classes, principally useful for server-side code.  The
      :mod:`http.cookiejar` and :mod:`http.cookies` modules do not depend on each
      other.

   http://wp.netscape.com/newsref/std/cookie_spec.html
      The specification of the original Netscape cookie protocol.  Though this is
      still the dominant protocol, the 'Netscape cookie protocol' implemented by all
      the major browsers (and :mod:`http.cookiejar`) only bears a passing resemblance to
      the one sketched out in ``cookie_spec.html``.

   :rfc:`2109` - HTTP State Management Mechanism
      Obsoleted by RFC 2965. Uses :mailheader:`Set-Cookie` with version=1.

   :rfc:`2965` - HTTP State Management Mechanism
      The Netscape protocol with the bugs fixed.  Uses :mailheader:`Set-Cookie2` in
      place of :mailheader:`Set-Cookie`.  Not widely used.

   http://kristol.org/cookie/errata.html
      Unfinished errata to RFC 2965.

   :rfc:`2964` - Use of HTTP State Management

.. _cookie-jar-objects:

CookieJar and FileCookieJar Objects
-----------------------------------

:class:`CookieJar` objects support the :term:`iterator` protocol for iterating over
contained :class:`Cookie` objects.

:class:`CookieJar` has the following methods:


.. method:: CookieJar.add_cookie_header(request)

   Add correct :mailheader:`Cookie` header to *request*.

   If policy allows (ie. the :attr:`rfc2965` and :attr:`hide_cookie2` attributes of
   the :class:`CookieJar`'s :class:`CookiePolicy` instance are true and false
   respectively), the :mailheader:`Cookie2` header is also added when appropriate.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`get_type`, :meth:`unverifiable`, :meth:`get_origin_req_host`,
   :meth:`has_header`, :meth:`get_header`, :meth:`header_items`, and
   :meth:`add_unredirected_header`, as documented by :mod:`urllib.request`.


.. method:: CookieJar.extract_cookies(response, request)

   Extract cookies from HTTP *response* and store them in the :class:`CookieJar`,
   where allowed by policy.

   The :class:`CookieJar` will look for allowable :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers in the *response* argument, and store cookies
   as appropriate (subject to the :meth:`CookiePolicy.set_ok` method's approval).

   The *response* object (usually the result of a call to
   :meth:`urllib.request.urlopen`, or similar) should support an :meth:`info`
   method, which returns a :class:`email.message.Message` instance.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`unverifiable`, and :meth:`get_origin_req_host`, as documented by
   :mod:`urllib.request`.  The request is used to set default values for
   cookie-attributes as well as for checking that the cookie is allowed to be
   set.

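The two methods above can be exercised without any network access.  The sketch
below is illustrative only: ``FakeResponse`` is a hypothetical stand-in that
provides nothing but the :meth:`info` method, and the URLs and cookie values are
made up.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response: only info() is needed."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


# Build the headers a server might send back.
headers = email.message.Message()
headers["Set-Cookie"] = "session=abc123; Path=/"

jar = http.cookiejar.CookieJar()
first_request = urllib.request.Request("http://www.example.com/login")
jar.extract_cookies(FakeResponse(headers), first_request)

# A later request to the same host gets the cookie back.
second_request = urllib.request.Request("http://www.example.com/account")
jar.add_cookie_header(second_request)
print(second_request.get_header("Cookie"))
```

The first request supplies the default domain for the cookie, which is why the
second request to the same host qualifies to receive it.
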

.. method:: CookieJar.set_policy(policy)

   Set the :class:`CookiePolicy` instance to be used.


.. method:: CookieJar.make_cookies(response, request)

   Return sequence of :class:`Cookie` objects extracted from *response* object.

   See the documentation for :meth:`extract_cookies` for the interfaces required of
   the *response* and *request* arguments.


.. method:: CookieJar.set_cookie_if_ok(cookie, request)

   Set a :class:`Cookie` if policy says it's OK to do so.


.. method:: CookieJar.set_cookie(cookie)

   Set a :class:`Cookie`, without checking with policy to see whether or not it
   should be set.


.. method:: CookieJar.clear([domain[, path[, name]]])

   Clear some cookies.

   If invoked without arguments, clear all cookies.  If given a single argument,
   only cookies belonging to that *domain* will be removed. If given two arguments,
   cookies belonging to the specified *domain* and URL *path* are removed.  If
   given three arguments, then the cookie with the specified *domain*, *path* and
   *name* is removed.

   Raises :exc:`KeyError` if no matching cookie exists.

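The different argument counts accepted by :meth:`clear` can be sketched as
follows.  ``make_cookie`` is a hypothetical helper written for this example; it
calls the :class:`Cookie` constructor directly, which is normally unnecessary
and is done here only to populate a jar without a server.

```python
import http.cookiejar


def make_cookie(name, domain, path="/"):
    """Hypothetical helper: build a simple version-0 session cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value="1",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("a", "example.com"))
jar.set_cookie(make_cookie("b", "example.com"))
jar.set_cookie(make_cookie("c", "example.org", path="/docs"))

jar.clear("example.com", "/", "a")   # three arguments: one specific cookie
jar.clear("example.org", "/docs")    # two arguments: a domain and path
remaining = sorted(cookie.name for cookie in jar)

try:
    jar.clear("example.net")         # nothing stored for this domain
except KeyError:
    missing_domain = True
else:
    missing_domain = False

jar.clear()                          # no arguments: remove everything
```
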

.. method:: CookieJar.clear_session_cookies()

   Discard all session cookies.

   Discards all contained cookies that have a true :attr:`discard` attribute
   (usually because they had either no ``max-age`` or ``expires`` cookie-attribute,
   or an explicit ``discard`` cookie-attribute).  For interactive browsers, the end
   of a session usually corresponds to closing the browser window.

   Note that the :meth:`save` method won't save session cookies anyway, unless you
   ask otherwise by passing a true *ignore_discard* argument.

:class:`FileCookieJar` implements the following additional methods:


.. method:: FileCookieJar.save(filename=None, ignore_discard=False, ignore_expires=False)

   Save cookies to a file.

   This base class raises :exc:`NotImplementedError`.  Subclasses may leave this
   method unimplemented.

   *filename* is the name of file in which to save cookies.  If *filename* is not
   specified, :attr:`self.filename` is used (whose default is the value passed to
   the constructor, if any); if :attr:`self.filename` is :const:`None`,
   :exc:`ValueError` is raised.

   *ignore_discard*: save even cookies set to be discarded. *ignore_expires*: save
   even cookies that have expired.

   The file is overwritten if it already exists, thus wiping all the cookies it
   contains.  Saved cookies can be restored later using the :meth:`load` or
   :meth:`revert` methods.


.. method:: FileCookieJar.load(filename=None, ignore_discard=False, ignore_expires=False)

   Load cookies from a file.

   Old cookies are kept unless overwritten by newly loaded ones.

   Arguments are as for :meth:`save`.

   The named file must be in the format understood by the class, or
   :exc:`LoadError` will be raised.  Also, :exc:`IOError` may be raised, for
   example if the file does not exist.


.. method:: FileCookieJar.revert(filename=None, ignore_discard=False, ignore_expires=False)

   Clear all cookies and reload cookies from a saved file.

   :meth:`revert` can raise the same exceptions as :meth:`load`. If there is a
   failure, the object's state will not be altered.

:class:`FileCookieJar` instances have the following public attributes:


.. attribute:: FileCookieJar.filename

   Filename of default file in which to keep cookies.  This attribute may be
   assigned to.


.. attribute:: FileCookieJar.delayload

   If true, load cookies lazily from disk.  This attribute should not be assigned
   to.  This is only a hint, since this only affects performance, not behaviour
   (unless the cookies on disk are changing). A :class:`CookieJar` object may
   ignore it.  None of the :class:`FileCookieJar` classes included in the standard
   library lazily loads cookies.

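The :meth:`save`, :meth:`load` and :meth:`revert` methods can be tried with a
concrete subclass such as :class:`LWPCookieJar`.  This is a sketch under stated
assumptions: ``make_cookie`` is a hypothetical helper, the domain is made up, and
the file lives in a temporary directory that is deleted afterwards.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, expires, discard):
    """Hypothetical helper: build a version-0 cookie for www.example.com."""
    return http.cookiejar.Cookie(
        version=0, name=name, value="1",
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=discard,
        comment=None, comment_url=None, rest={},
    )


with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "cookies.lwp")

    jar = http.cookiejar.LWPCookieJar(filename)
    jar.set_cookie(make_cookie("persistent", int(time.time()) + 3600, False))
    jar.set_cookie(make_cookie("session", None, True))
    jar.save()      # the session cookie is skipped by default

    restored = http.cookiejar.LWPCookieJar(filename)
    restored.load()
    loaded_names = sorted(cookie.name for cookie in restored)

    jar.revert()    # drop the in-memory state and reload from the file
    reverted_names = sorted(cookie.name for cookie in jar)
```

Only the cookie with an expiry date survives the round trip, because
*ignore_discard* was left at its default of false.
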

.. _file-cookie-jar-classes:

FileCookieJar subclasses and co-operation with web browsers
-----------------------------------------------------------

The following :class:`CookieJar` subclasses are provided for reading and
writing.

.. class:: MozillaCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in the
   Mozilla ``cookies.txt`` file format (which is also used by the Lynx and Netscape
   browsers).

   .. note::

      This loses information about RFC 2965 cookies, and also about newer or
      non-standard cookie-attributes such as ``port``.

   .. warning::

      Back up your cookies before saving if you have cookies whose loss / corruption
      would be inconvenient (there are some subtleties which may lead to slight
      changes in the file over a load / save round-trip).

   Also note that cookies saved while Mozilla is running will get clobbered by
   Mozilla.

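As an illustration of the ``cookies.txt`` round trip, the sketch below writes a
session cookie with :class:`MozillaCookieJar` and reads it back.  The helper
``make_session_cookie``, the domain and the cookie values are assumptions made
for the example; the file is created in a temporary directory rather than in a
real browser profile.

```python
import http.cookiejar
import os
import tempfile


def make_session_cookie(name, value):
    """Hypothetical helper: a version-0 cookie with no expiry date."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False,
        domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "cookies.txt")

    jar = http.cookiejar.MozillaCookieJar(filename)
    jar.set_cookie(make_session_cookie("theme", "dark"))
    # Session cookies are only written when ignore_discard is true.
    jar.save(ignore_discard=True)

    reloaded = http.cookiejar.MozillaCookieJar(filename)
    # The same flag is needed on load, or the session cookie is skipped.
    reloaded.load(ignore_discard=True)
    reloaded_values = {cookie.name: cookie.value for cookie in reloaded}
```
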

.. class:: LWPCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in a format
   compatible with the libwww-perl library's ``Set-Cookie3`` file format.  This is
   convenient if you want to store cookies in a human-readable file.


.. _cookie-policy-objects:

CookiePolicy Objects
--------------------

Objects implementing the :class:`CookiePolicy` interface have the following
methods:


.. method:: CookiePolicy.set_ok(cookie, request)

   Return boolean value indicating whether cookie should be accepted from server.

   *cookie* is a :class:`Cookie` instance.  *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.extract_cookies`.


.. method:: CookiePolicy.return_ok(cookie, request)

   Return boolean value indicating whether cookie should be returned to server.

   *cookie* is a :class:`Cookie` instance.  *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.add_cookie_header`.


.. method:: CookiePolicy.domain_return_ok(domain, request)

   Return false if cookies should not be returned, given cookie domain.

   This method is an optimization.  It removes the need for checking every cookie
   with a particular domain (which might involve reading many files).  Returning
   true from :meth:`domain_return_ok` and :meth:`path_return_ok` leaves all the
   work to :meth:`return_ok`.

   If :meth:`domain_return_ok` returns true for the cookie domain,
   :meth:`path_return_ok` is called for the cookie path.  Otherwise,
   :meth:`path_return_ok` and :meth:`return_ok` are never called for that cookie
   domain.  If :meth:`path_return_ok` returns true, :meth:`return_ok` is called
   with the :class:`Cookie` object itself for a full check.  Otherwise,
   :meth:`return_ok` is never called for that cookie path.

   Note that :meth:`domain_return_ok` is called for every *cookie* domain, not just
   for the *request* domain.  For example, the function might be called with both
   ``".example.com"`` and ``"www.example.com"`` if the request domain is
   ``"www.example.com"``.  The same goes for :meth:`path_return_ok`.

   The *request* argument is as documented for :meth:`return_ok`.

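The order in which the jar consults these methods can be observed with a policy
that records each call.  This is only a sketch: ``TracingPolicy`` and
``make_cookie`` are hypothetical names introduced here, and the domains are made
up.

```python
import http.cookiejar
import urllib.request


class TracingPolicy(http.cookiejar.DefaultCookiePolicy):
    """Hypothetical policy that records which checks the jar asks for."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def domain_return_ok(self, domain, request):
        self.calls.append(("domain_return_ok", domain))
        return super().domain_return_ok(domain, request)

    def path_return_ok(self, path, request):
        self.calls.append(("path_return_ok", path))
        return super().path_return_ok(path, request)

    def return_ok(self, cookie, request):
        self.calls.append(("return_ok", cookie.name))
        return super().return_ok(cookie, request)


def make_cookie(name, domain):
    """Hypothetical helper: build a simple version-0 session cookie."""
    return http.cookiejar.Cookie(
        version=0, name=name, value="1",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


policy = TracingPolicy()
jar = http.cookiejar.CookieJar(policy)
jar.set_cookie(make_cookie("a", "www.example.com"))
jar.set_cookie(make_cookie("b", "other.example.org"))

request = urllib.request.Request("http://www.example.com/")
jar.add_cookie_header(request)

for call in policy.calls:
    print(call)
```

Both cookie domains are offered to :meth:`domain_return_ok`, but the path and
per-cookie checks are made only for the domain that passed.
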

.. method:: CookiePolicy.path_return_ok(path, request)

   Return false if cookies should not be returned, given cookie path.

   See the documentation for :meth:`domain_return_ok`.

In addition to implementing the methods above, implementations of the
:class:`CookiePolicy` interface must also supply the following attributes,
indicating which protocols should be used, and how.  All of these attributes may
be assigned to.


.. attribute:: CookiePolicy.netscape

   Implement Netscape protocol.


.. attribute:: CookiePolicy.rfc2965

   Implement RFC 2965 protocol.


.. attribute:: CookiePolicy.hide_cookie2

   Don't add :mailheader:`Cookie2` header to requests (the presence of this header
   indicates to the server that we understand RFC 2965 cookies).

The most useful way to define a :class:`CookiePolicy` class is by subclassing
from :class:`DefaultCookiePolicy` and overriding some or all of the methods
above.  :class:`CookiePolicy` itself may be used as a 'null policy' to allow
setting and receiving any and all cookies (this is unlikely to be useful).


.. _default-cookie-policy-objects:

DefaultCookiePolicy Objects
---------------------------

Implements the standard rules for accepting and returning cookies.

Both RFC 2965 and Netscape cookies are covered.  RFC 2965 handling is switched
off by default.

The easiest way to provide your own policy is to override this class and call
its methods in your overridden implementations before adding your own additional
checks::

   import http.cookiejar

   class MyCookiePolicy(http.cookiejar.DefaultCookiePolicy):
       def set_ok(self, cookie, request):
           # Apply the standard checks first.
           if not super().set_ok(cookie, request):
               return False
           # i_dont_want_to_store_this_cookie is a placeholder for your own test.
           if i_dont_want_to_store_this_cookie(cookie):
               return False
           return True

In addition to the features required to implement the :class:`CookiePolicy`
interface, this class allows you to block and allow domains from setting and
receiving cookies.  There are also some strictness switches that allow you to
tighten up the rather loose Netscape protocol rules a little bit (at the cost of
blocking some benign cookies).

A domain blacklist and whitelist is provided (both off by default). Only domains
not in the blacklist and present in the whitelist (if the whitelist is active)
participate in cookie setting and returning.  Use the *blocked_domains*
constructor argument, and :meth:`blocked_domains` and
:meth:`set_blocked_domains` methods (and the corresponding argument and methods
for *allowed_domains*).  If you set a whitelist, you can turn it off again by
setting it to :const:`None`.

Domains in block or allow lists that do not start with a dot must equal the
cookie domain to be matched.  For example, ``"example.com"`` matches a blacklist
entry of ``"example.com"``, but ``"www.example.com"`` does not.  Domains that do
start with a dot are matched by more specific domains too. For example, both
``"www.example.com"`` and ``"www.coyote.example.com"`` match ``".example.com"``
(but ``"example.com"`` itself does not).  IP addresses are an exception, and
must match exactly.  For example, if blocked_domains contains ``"192.168.1.2"``
and ``".168.1.2"``, 192.168.1.2 is blocked, but 193.168.1.2 is not.

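
The matching rules above can be checked directly with :meth:`is_blocked`.  The
domain names and addresses in this sketch are examples only.

```python
import http.cookiejar

policy = http.cookiejar.DefaultCookiePolicy(
    blocked_domains=["example.com", ".example.org", "192.168.1.2", ".168.1.2"]
)

candidates = [
    "example.com",       # equals a dotless entry
    "www.example.com",   # more specific than a dotless entry
    "www.example.org",   # more specific than a dotted entry
    "example.org",       # the dotted entry's own bare domain
    "192.168.1.2",       # IP addresses must match exactly
    "193.168.1.2",
]
blocked = {domain: policy.is_blocked(domain) for domain in candidates}

for domain, result in blocked.items():
    print(domain, result)
```
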
:class:`DefaultCookiePolicy` implements the following additional methods:


.. method:: DefaultCookiePolicy.blocked_domains()

   Return the sequence of blocked domains (as a tuple).


.. method:: DefaultCookiePolicy.set_blocked_domains(blocked_domains)

   Set the sequence of blocked domains.


.. method:: DefaultCookiePolicy.is_blocked(domain)

   Return whether *domain* is on the blacklist for setting or receiving cookies.


.. method:: DefaultCookiePolicy.allowed_domains()

   Return :const:`None`, or the sequence of allowed domains (as a tuple).


.. method:: DefaultCookiePolicy.set_allowed_domains(allowed_domains)

   Set the sequence of allowed domains, or :const:`None`.


.. method:: DefaultCookiePolicy.is_not_allowed(domain)

   Return whether *domain* is not on the whitelist for setting or receiving
   cookies.

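
A short sketch of switching the whitelist on and off with the methods above; the
domain names are illustrative.

```python
import http.cookiejar

policy = http.cookiejar.DefaultCookiePolicy()

# With no whitelist set, no domain is rejected by it.
initial_whitelist = policy.allowed_domains()
rejected_before = policy.is_not_allowed("www.example.org")

# Activate a whitelist: only matching domains take part in cookie handling.
policy.set_allowed_domains([".example.com"])
active_whitelist = policy.allowed_domains()
rejected_com = policy.is_not_allowed("www.example.com")
rejected_org = policy.is_not_allowed("www.example.org")

# Setting the whitelist to None turns it off again.
policy.set_allowed_domains(None)
rejected_after = policy.is_not_allowed("www.example.org")
```
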
:class:`DefaultCookiePolicy` instances have the following attributes, which are
all initialised from the constructor arguments of the same name, and which may
all be assigned to.


.. attribute:: DefaultCookiePolicy.rfc2109_as_netscape

   If true, request that the :class:`CookieJar` instance downgrade RFC 2109 cookies
   (ie. cookies received in a :mailheader:`Set-Cookie` header with a version
   cookie-attribute of 1) to Netscape cookies by setting the version attribute of
   the :class:`Cookie` instance to 0.  The default value is :const:`None`, in which
   case RFC 2109 cookies are downgraded if and only if RFC 2965 handling is turned
   off.  Therefore, RFC 2109 cookies are downgraded by default.


General strictness switches:

.. attribute:: DefaultCookiePolicy.strict_domain

   Don't allow sites to set two-component domains with country-code top-level
   domains like ``.co.uk``, ``.gov.uk``, ``.co.nz``, etc.  This is far from perfect
   and isn't guaranteed to work!


RFC 2965 protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_rfc2965_unverifiable

   Follow RFC 2965 rules on unverifiable transactions (usually, an unverifiable
   transaction is one resulting from a redirect or a request for an image hosted on
   another site).  If this is false, cookies are *never* blocked on the basis of
   verifiability.


Netscape protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_ns_unverifiable

   Apply RFC 2965 rules on unverifiable transactions even to Netscape cookies.


.. attribute:: DefaultCookiePolicy.strict_ns_domain

   Flags indicating how strict to be with domain-matching rules for Netscape
   cookies.  See below for acceptable values.


.. attribute:: DefaultCookiePolicy.strict_ns_set_initial_dollar

   Ignore cookies in Set-Cookie: headers that have names starting with ``'$'``.


.. attribute:: DefaultCookiePolicy.strict_ns_set_path

   Don't allow setting cookies whose path doesn't path-match request URI.

:attr:`strict_ns_domain` is a collection of flags.  Its value is constructed by
or-ing flags together (for example, ``DomainStrictNoDots|DomainStrictNonDomain``
means both flags are set).


.. attribute:: DefaultCookiePolicy.DomainStrictNoDots

   When setting cookies, the 'host prefix' must not contain a dot (eg.
   ``www.foo.bar.com`` can't set a cookie for ``.bar.com``, because ``www.foo``
   contains a dot).


.. attribute:: DefaultCookiePolicy.DomainStrictNonDomain

   Cookies that did not explicitly specify a ``domain`` cookie-attribute can only
   be returned to a domain equal to the domain that set the cookie (eg.
   ``spam.example.com`` won't be returned cookies from ``example.com`` that had no
   ``domain`` cookie-attribute).


.. attribute:: DefaultCookiePolicy.DomainRFC2965Match

   When setting cookies, require a full RFC 2965 domain-match.

The following attributes are provided for convenience, and are the most useful
combinations of the above flags:


.. attribute:: DefaultCookiePolicy.DomainLiberal

   Equivalent to 0 (ie. all of the above Netscape domain strictness flags switched
   off).


.. attribute:: DefaultCookiePolicy.DomainStrict

   Equivalent to ``DomainStrictNoDots|DomainStrictNonDomain``.

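The flag values can be combined and tested with ordinary bitwise operators.  A
minimal sketch, in which ``Policy`` is just a local alias introduced for
brevity:

```python
import http.cookiejar

Policy = http.cookiejar.DefaultCookiePolicy

# Combine two flags by or-ing them together.
strict = Policy(
    strict_ns_domain=Policy.DomainStrictNoDots | Policy.DomainStrictNonDomain
)

# Test individual flags with a bitwise and.
no_dots_enabled = bool(strict.strict_ns_domain & Policy.DomainStrictNoDots)
rfc_match_enabled = bool(strict.strict_ns_domain & Policy.DomainRFC2965Match)

# A policy built without arguments uses the liberal setting.
default_flags = Policy().strict_ns_domain
```
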
| 579 | |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 580 | Cookie Objects |
| 581 | -------------- |
| 582 | |
| 583 | :class:`Cookie` instances have Python attributes roughly corresponding to the |
| 584 | standard cookie-attributes specified in the various cookie standards. The |
| 585 | correspondence is not one-to-one, because there are complicated rules for |
| 586 | assigning default values, because the ``max-age`` and ``expires`` |
| 587 | cookie-attributes contain equivalent information, and because RFC 2109 cookies |
Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 588 | may be 'downgraded' by :mod:`http.cookiejar` from version 1 to version 0 (Netscape) |
Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 589 | cookies. |
| 590 | |
| 591 | Assignment to these attributes should not be necessary other than in rare |
| 592 | circumstances in a :class:`CookiePolicy` method. The class does not enforce |
| 593 | internal consistency, so you should know what you're doing if you do that. |
| 594 | |
| 595 | |
.. attribute:: Cookie.version

   Integer or :const:`None`.  Netscape cookies have :attr:`version` 0.  RFC 2965 and
   RFC 2109 cookies have a ``version`` cookie-attribute of 1.  However, note that
   :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in which
   case :attr:`version` is 0.


.. attribute:: Cookie.name

   Cookie name (a string).


.. attribute:: Cookie.value

   Cookie value (a string), or :const:`None`.


.. attribute:: Cookie.port

   String representing a port or a set of ports (e.g. ``'80'`` or ``'80,8080'``), or
   :const:`None`.


.. attribute:: Cookie.path

   Cookie path (a string, e.g. ``'/acme/rocket_launchers'``).


.. attribute:: Cookie.secure

   ``True`` if the cookie should only be returned over a secure connection.


.. attribute:: Cookie.expires

   Integer expiry date in seconds since the epoch, or :const:`None`.  See also the
   :meth:`is_expired` method.


.. attribute:: Cookie.discard

   ``True`` if this is a session cookie.


.. attribute:: Cookie.comment

   String comment from the server explaining the function of this cookie, or
   :const:`None`.


.. attribute:: Cookie.comment_url

   URL linking to a comment from the server explaining the function of this cookie,
   or :const:`None`.


.. attribute:: Cookie.rfc2109

   ``True`` if this cookie was received as an RFC 2109 cookie (i.e. the cookie
   arrived in a :mailheader:`Set-Cookie` header, and the value of the Version
   cookie-attribute in that header was 1).  This attribute is provided because
   :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in
   which case :attr:`version` is 0.


.. attribute:: Cookie.port_specified

   ``True`` if a port or set of ports was explicitly specified by the server (in the
   :mailheader:`Set-Cookie` / :mailheader:`Set-Cookie2` header).


.. attribute:: Cookie.domain_specified

   ``True`` if a domain was explicitly specified by the server.


.. attribute:: Cookie.domain_initial_dot

   ``True`` if the domain explicitly specified by the server began with a dot
   (``'.'``).

Cookies may have additional non-standard cookie-attributes.  These may be
accessed using the following methods:


.. method:: Cookie.has_nonstandard_attr(name)

   Return ``True`` if the cookie has the named cookie-attribute.


.. method:: Cookie.get_nonstandard_attr(name, default=None)

   If the cookie has the named cookie-attribute, return its value.  Otherwise, return
   *default*.


.. method:: Cookie.set_nonstandard_attr(name, value)

   Set the value of the named cookie-attribute.

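The three methods above can be sketched as follows. The cookie is built by
hand with hypothetical values, and the ``HttpOnly`` and ``SameSite`` attribute
names are chosen only as familiar examples of non-standard cookie-attributes.
Passing the initial attributes through the constructor's ``rest`` mapping is
an assumption about the standard-library constructor, which this section does
not describe.

```python
from http.cookiejar import Cookie

# Hypothetical cookie that arrives with one non-standard attribute already set.
cookie = Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={"HttpOnly": None},
)

# The attribute is present even though its value is None.
has_http_only = cookie.has_nonstandard_attr("HttpOnly")

# A missing attribute falls back to the supplied default.
same_site_before = cookie.get_nonstandard_attr("SameSite", "unset")

# Setting the attribute makes later lookups return the new value.
cookie.set_nonstandard_attr("SameSite", "Lax")
same_site_after = cookie.get_nonstandard_attr("SameSite")
```
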
The :class:`Cookie` class also defines the following method:


.. method:: Cookie.is_expired(now=None)

   Return ``True`` if the cookie has passed the time at which the server requested
   it should expire.  If *now* is given (in seconds since the epoch), return whether
   the cookie has expired at the specified time.


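A minimal sketch of :meth:`Cookie.is_expired`, assuming a hand-built cookie
with hypothetical values whose expiry time lies one hour in the future. The
check is made once against the current time and once against an explicit
*now* two hours ahead.

```python
import time
from http.cookiejar import Cookie

now = int(time.time())

# Hypothetical persistent cookie that expires one hour from now.
cookie = Cookie(
    version=0, name="token", value="xyz",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=now + 3600, discard=False,
    comment=None, comment_url=None, rest={},
)

expired_now = cookie.is_expired()              # compared with the current time
expired_later = cookie.is_expired(now + 7200)  # compared with two hours from now
```

A cookie whose :attr:`expires` attribute is :const:`None` has no expiry time
to compare against, so it is never reported as expired.
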
Examples
--------

The first example shows the most common usage of :mod:`http.cookiejar`::

   import http.cookiejar, urllib.request
   # Cookies set by responses are stored in the jar and returned on later requests.
   cj = http.cookiejar.CookieJar()
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")

This example illustrates how to open a URL using your Netscape, Mozilla, or Lynx
cookies (assumes Unix/Netscape convention for location of the cookies file)::

   import os, http.cookiejar, urllib.request
   cj = http.cookiejar.MozillaCookieJar()
   cj.load(os.path.join(os.environ["HOME"], ".netscape/cookies.txt"))
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")

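The file read by :class:`MozillaCookieJar` can also be written by it. The
sketch below is one way to try this without touching a real browser profile:
it saves a hand-built cookie to a temporary file and loads it into a second
jar. The cookie values, the temporary path, and the far-future expiry
timestamp are all hypothetical, and the use of ``set_cookie()`` and ``save()``
here is an illustration rather than part of the example above.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar

# Temporary cookies file, used instead of a real ~/.netscape/cookies.txt.
path = os.path.join(tempfile.mkdtemp(), "cookies.txt")

# Hypothetical persistent cookie; session cookies are skipped by save()
# unless ignore_discard=True is passed.
cookie = Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=4102444800, discard=False,  # 2100-01-01 UTC
    comment=None, comment_url=None, rest={},
)

jar = MozillaCookieJar(path)
jar.set_cookie(cookie)
jar.save()  # writes to the filename given to the constructor

# Load the file into a fresh jar and list what came back.
loaded = MozillaCookieJar()
loaded.load(path)
names = [c.name for c in loaded]
```
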
The next example illustrates the use of :class:`DefaultCookiePolicy`.  Turn on
RFC 2965 cookies, be more strict about domains when setting and returning
Netscape cookies, and block some domains from setting cookies or having them
returned::

   import urllib.request
   from http.cookiejar import CookieJar, DefaultCookiePolicy
   policy = DefaultCookiePolicy(
       rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict,
       blocked_domains=["ads.net", ".ads.net"])
   cj = CookieJar(policy)
   opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
   r = opener.open("http://example.com/")

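The effect of the *blocked_domains* argument can be checked without making a
request. The following sketch builds the same policy and queries it with
``is_blocked()``; the host names are hypothetical.

```python
from http.cookiejar import DefaultCookiePolicy

policy = DefaultCookiePolicy(
    rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"])

# "ads.net" matches the first entry exactly.
blocked_exact = policy.is_blocked("ads.net")

# "www.ads.net" matches the second entry, which begins with a dot.
blocked_subdomain = policy.is_blocked("www.ads.net")

# An unrelated domain is not blocked.
blocked_other = policy.is_blocked("example.com")
```

Listing both ``"ads.net"`` and ``".ads.net"`` covers the bare domain as well
as its subdomains, since an entry without a leading dot only matches exactly.
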