:mod:`http.cookiejar` --- Cookie handling for HTTP clients
==========================================================

.. module:: http.cookiejar
   :synopsis: Classes for automatic handling of HTTP cookies.
.. moduleauthor:: John J. Lee <jjl@pobox.com>
.. sectionauthor:: John J. Lee <jjl@pobox.com>

**Source code:** :source:`Lib/http/cookiejar.py`

--------------

The :mod:`http.cookiejar` module defines classes for automatic handling of HTTP
cookies.  It is useful for accessing web sites that require small pieces of data
-- :dfn:`cookies` -- to be set on the client machine by an HTTP response from a
web server, and then returned to the server in later HTTP requests.

Both the regular Netscape cookie protocol and the protocol defined by
:rfc:`2965` are handled.  RFC 2965 handling is switched off by default.
:rfc:`2109` cookies are parsed as Netscape cookies and subsequently treated
either as Netscape or RFC 2965 cookies according to the 'policy' in effect.
Note that the great majority of cookies on the Internet are Netscape cookies.
:mod:`http.cookiejar` attempts to follow the de-facto Netscape cookie protocol (which
differs substantially from that set out in the original Netscape specification),
including taking note of the ``max-age`` and ``port`` cookie-attributes
introduced with RFC 2965.

.. note::

   The various named parameters found in :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers (eg. ``domain`` and ``expires``) are
   conventionally referred to as :dfn:`attributes`.  To distinguish them from
   Python attributes, the documentation for this module uses the term
   :dfn:`cookie-attribute` instead.


The module defines the following exception:


.. exception:: LoadError

   Instances of :class:`FileCookieJar` raise this exception on failure to load
   cookies from a file.  :exc:`LoadError` is a subclass of :exc:`IOError`.


The following classes are provided:


.. class:: CookieJar(policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface.

   The :class:`CookieJar` class stores HTTP cookies.  It extracts cookies from HTTP
   responses, and returns them in HTTP requests. :class:`CookieJar` instances
   automatically expire contained cookies when necessary.  Subclasses are also
   responsible for storing and retrieving cookies from a file or database.


.. class:: FileCookieJar(filename, delayload=None, policy=None)

   *policy* is an object implementing the :class:`CookiePolicy` interface.  For the
   other arguments, see the documentation for the corresponding attributes.

   A :class:`CookieJar` which can load cookies from, and perhaps save cookies to, a
   file on disk.  Cookies are **NOT** loaded from the named file until either the
   :meth:`load` or :meth:`revert` method is called.  Subclasses of this class are
   documented in section :ref:`file-cookie-jar-classes`.


.. class:: CookiePolicy()

   This class is responsible for deciding whether each cookie should be accepted
   from / returned to the server.


.. class:: DefaultCookiePolicy(blocked_domains=None, allowed_domains=None, netscape=True, rfc2965=False, rfc2109_as_netscape=None, hide_cookie2=False, strict_domain=False, strict_rfc2965_unverifiable=True, strict_ns_unverifiable=False, strict_ns_domain=DefaultCookiePolicy.DomainLiberal, strict_ns_set_initial_dollar=False, strict_ns_set_path=False)

   Constructor arguments should be passed as keyword arguments only.
   *blocked_domains* is a sequence of domain names that we never accept cookies
   from, nor return cookies to.  If *allowed_domains* is not :const:`None`, it is
   a sequence of the only domains for which we accept and return cookies.  For all
   other arguments, see the documentation for :class:`CookiePolicy` and
   :class:`DefaultCookiePolicy` objects.

   :class:`DefaultCookiePolicy` implements the standard accept / reject rules for
   Netscape and RFC 2965 cookies.  By default, RFC 2109 cookies (ie. cookies
   received in a :mailheader:`Set-Cookie` header with a version cookie-attribute of
   1) are treated according to the RFC 2965 rules.  However, if RFC 2965 handling
   is turned off or :attr:`rfc2109_as_netscape` is true, RFC 2109 cookies are
   'downgraded' by the :class:`CookieJar` instance to Netscape cookies, by
   setting the :attr:`version` attribute of the :class:`Cookie` instance to 0.
   :class:`DefaultCookiePolicy` also provides some parameters to allow some
   fine-tuning of policy.


.. class:: Cookie()

   This class represents Netscape, RFC 2109 and RFC 2965 cookies.  It is not
   expected that users of :mod:`http.cookiejar` construct their own :class:`Cookie`
   instances.  Instead, if necessary, call :meth:`make_cookies` on a
   :class:`CookieJar` instance.


.. seealso::

   Module :mod:`urllib.request`
      URL opening with automatic cookie handling.

   Module :mod:`http.cookies`
      HTTP cookie classes, principally useful for server-side code.  The
      :mod:`http.cookiejar` and :mod:`http.cookies` modules do not depend on each
      other.

   http://wp.netscape.com/newsref/std/cookie_spec.html
      The specification of the original Netscape cookie protocol.  Though this is
      still the dominant protocol, the 'Netscape cookie protocol' implemented by all
      the major browsers (and :mod:`http.cookiejar`) only bears a passing resemblance to
      the one sketched out in ``cookie_spec.html``.

   :rfc:`2109` - HTTP State Management Mechanism
      Obsoleted by RFC 2965. Uses :mailheader:`Set-Cookie` with version=1.

   :rfc:`2965` - HTTP State Management Mechanism
      The Netscape protocol with the bugs fixed.  Uses :mailheader:`Set-Cookie2` in
      place of :mailheader:`Set-Cookie`.  Not widely used.

   http://kristol.org/cookie/errata.html
      Unfinished errata to RFC 2965.

   :rfc:`2964` - Use of HTTP State Management
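
As a brief illustration of how the classes above are typically combined with
:mod:`urllib.request`, the following sketch attaches a :class:`CookieJar` to an
opener.  It sends no request; the URL in the comment is a placeholder and not
part of this documentation.

```python
import http.cookiejar
import urllib.request

# The jar collects cookies set by responses and replays them on later requests.
jar = http.cookiejar.CookieJar()

# HTTPCookieProcessor is the urllib.request handler that calls
# jar.extract_cookies() and jar.add_cookie_header() on our behalf.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# A real program would now call, for example:
#     response = opener.open("http://www.example.com/")
# and could then inspect the received cookies by iterating over the jar.
print(len(jar))  # the jar starts out empty
```

Passing a :class:`FileCookieJar` subclass instead of a plain
:class:`CookieJar` should work the same way when cookies need to persist
between runs.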

.. _cookie-jar-objects:

CookieJar and FileCookieJar Objects
-----------------------------------

:class:`CookieJar` objects support the :term:`iterator` protocol for iterating over
contained :class:`Cookie` objects.
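
A minimal sketch of iterating over a jar follows.  The ``make_cookie`` helper
and the ``www.example.com`` domain are illustrative assumptions; cookies are
built by hand here only so that the example runs without network access,
whereas real cookies would normally come from server responses.

```python
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value, domain, path="/"):
    """Build a version 0 (Netscape) session cookie by hand (illustration only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )

jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc123", "www.example.com"))
jar.set_cookie(make_cookie("theme", "dark", "www.example.com"))

# Iterating over the jar yields the stored Cookie objects.
names = sorted(cookie.name for cookie in jar)
print(names)
print(len(jar))
```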

:class:`CookieJar` has the following methods:


.. method:: CookieJar.add_cookie_header(request)

   Add the correct :mailheader:`Cookie` header to *request*.

   If policy allows (ie. the :attr:`rfc2965` and :attr:`hide_cookie2` attributes of
   the :class:`CookieJar`'s :class:`CookiePolicy` instance are true and false
   respectively), the :mailheader:`Cookie2` header is also added when appropriate.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`get_type`, :meth:`unverifiable`, :meth:`get_origin_req_host`,
   :meth:`has_header`, :meth:`get_header`, :meth:`header_items`, and
   :meth:`add_unredirected_header`, as documented by :mod:`urllib.request`.


.. method:: CookieJar.extract_cookies(response, request)

   Extract cookies from HTTP *response* and store them in the :class:`CookieJar`,
   where allowed by policy.

   The :class:`CookieJar` will look for allowable :mailheader:`Set-Cookie` and
   :mailheader:`Set-Cookie2` headers in the *response* argument, and store cookies
   as appropriate (subject to the :meth:`CookiePolicy.set_ok` method's approval).

   The *response* object (usually the result of a call to
   :func:`urllib.request.urlopen`, or similar) should support an :meth:`info`
   method, which returns a :class:`email.message.Message` instance.

   The *request* object (usually a :class:`urllib.request.Request` instance)
   must support the methods :meth:`get_full_url`, :meth:`get_host`,
   :meth:`unverifiable`, and :meth:`get_origin_req_host`, as documented by
   :mod:`urllib.request`.  The request is used to set default values for
   cookie-attributes as well as for checking that the cookie is allowed to be
   set.
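
The round trip performed by :meth:`extract_cookies` and
:meth:`add_cookie_header` can be sketched without any network access.  The
``FakeResponse`` class below is a hypothetical stand-in that provides only the
:meth:`info` method described above, and the URLs and cookie values are made up
for the example.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar

class FakeResponse:
    """Hypothetical stand-in for a urllib response; only info() is provided."""

    def __init__(self, header_pairs):
        self._headers = email.message.Message()
        for name, value in header_pairs:
            # Assigning to a Message appends a header rather than replacing it.
            self._headers[name] = value

    def info(self):
        return self._headers

jar = CookieJar()

# The server sets a cookie in its response to the first request.
first_request = urllib.request.Request("http://www.example.com/login")
response = FakeResponse([("Set-Cookie", "session=abc123; Path=/")])
jar.extract_cookies(response, first_request)

# The jar adds the matching Cookie header to a later request to the same host.
later_request = urllib.request.Request("http://www.example.com/account")
jar.add_cookie_header(later_request)
print(later_request.get_header("Cookie"))
```

In ordinary use these two calls are made by
:class:`urllib.request.HTTPCookieProcessor`, so calling them directly is mainly
useful for testing or for custom HTTP clients.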


.. method:: CookieJar.set_policy(policy)

   Set the :class:`CookiePolicy` instance to be used.


.. method:: CookieJar.make_cookies(response, request)

   Return a sequence of :class:`Cookie` objects extracted from the *response*
   object.

   See the documentation for :meth:`extract_cookies` for the interfaces required of
   the *response* and *request* arguments.


.. method:: CookieJar.set_cookie_if_ok(cookie, request)

   Set a :class:`Cookie` if policy says it's OK to do so.


.. method:: CookieJar.set_cookie(cookie)

   Set a :class:`Cookie`, without checking with policy to see whether or not it
   should be set.


.. method:: CookieJar.clear([domain[, path[, name]]])

   Clear some cookies.

   If invoked without arguments, clear all cookies.  If given a single argument,
   only cookies belonging to that *domain* will be removed. If given two arguments,
   cookies belonging to the specified *domain* and URL *path* are removed.  If
   given three arguments, then the cookie with the specified *domain*, *path* and
   *name* is removed.

   Raises :exc:`KeyError` if no matching cookie exists.


.. method:: CookieJar.clear_session_cookies()

   Discard all session cookies.

   Discards all contained cookies that have a true :attr:`discard` attribute
   (usually because they had either no ``max-age`` or ``expires`` cookie-attribute,
   or an explicit ``discard`` cookie-attribute).  For interactive browsers, the end
   of a session usually corresponds to closing the browser window.

   Note that the :meth:`save` method won't save session cookies anyway, unless you
   ask otherwise by passing a true *ignore_discard* argument.
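
The behaviour of :meth:`clear` and :meth:`clear_session_cookies` can be
sketched as follows.  The ``make_cookie`` helper, the cookie names and the
domains are assumptions made for the example; a cookie is treated as a session
cookie here simply because it was built without an expiry time.

```python
import time
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, domain, path="/", expires=None):
    """Hand-built Netscape cookie; it is a session cookie when expires is None."""
    return Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=expires, discard=(expires is None),
        comment=None, comment_url=None, rest={},
    )

jar = CookieJar()
jar.set_cookie(make_cookie("a", "www.example.com"))
jar.set_cookie(make_cookie("b", "www.example.com", path="/shop"))
jar.set_cookie(make_cookie("c", "www.example.org"))
jar.set_cookie(make_cookie("d", "www.example.org", expires=int(time.time()) + 3600))

jar.clear("www.example.com", "/shop", "b")  # remove one specific cookie
jar.clear("www.example.com")                # remove everything for that domain
remaining = sorted(cookie.name for cookie in jar)

# Clearing a domain that holds no cookies raises KeyError.
missing = False
try:
    jar.clear("www.example.net")
except KeyError:
    missing = True

# Only the cookie with an expiry time survives the end of the "session".
jar.clear_session_cookies()
after_session = [cookie.name for cookie in jar]
print(remaining, missing, after_session)
```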

:class:`FileCookieJar` implements the following additional methods:


.. method:: FileCookieJar.save(filename=None, ignore_discard=False, ignore_expires=False)

   Save cookies to a file.

   This base class raises :exc:`NotImplementedError`.  Subclasses may leave this
   method unimplemented.

   *filename* is the name of the file in which to save cookies.  If *filename* is
   not specified, :attr:`self.filename` is used (whose default is the value passed
   to the constructor, if any); if :attr:`self.filename` is :const:`None`,
   :exc:`ValueError` is raised.

   *ignore_discard*: save even cookies set to be discarded. *ignore_expires*: save
   even cookies that have expired.

   The file is overwritten if it already exists, thus wiping all the cookies it
   contains.  Saved cookies can be restored later using the :meth:`load` or
   :meth:`revert` methods.


.. method:: FileCookieJar.load(filename=None, ignore_discard=False, ignore_expires=False)

   Load cookies from a file.

   Old cookies are kept unless overwritten by newly loaded ones.

   Arguments are as for :meth:`save`.

   The named file must be in the format understood by the class, or
   :exc:`LoadError` will be raised.  Also, :exc:`IOError` may be raised, for
   example if the file does not exist.


.. method:: FileCookieJar.revert(filename=None, ignore_discard=False, ignore_expires=False)

   Clear all cookies and reload cookies from a saved file.

   :meth:`revert` can raise the same exceptions as :meth:`load`. If there is a
   failure, the object's state will not be altered.
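
The following sketch exercises :meth:`save`, :meth:`load` and :meth:`revert`
using :class:`LWPCookieJar`, one of the subclasses documented later on this
page.  The file names, the ``make_cookie`` helper and the cookie names are
assumptions made for the example, and everything is written to a temporary
directory.

```python
import os
import tempfile
import time
from http.cookiejar import Cookie, LoadError, LWPCookieJar

def make_cookie(name, expires=None):
    """Hand-built Netscape cookie; it is a session cookie when expires is None."""
    return Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain="www.example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=(expires is None),
        comment=None, comment_url=None, rest={},
    )

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cookies.lwp")

    jar = LWPCookieJar(path)
    jar.set_cookie(make_cookie("persistent", expires=int(time.time()) + 3600))
    jar.set_cookie(make_cookie("session"))
    jar.save()  # the session cookie is skipped by default

    reloaded = LWPCookieJar(path)
    reloaded.load()
    saved_default = sorted(cookie.name for cookie in reloaded)

    # Ask for session cookies to be written and read back as well.
    jar.save(ignore_discard=True)
    reloaded.revert(ignore_discard=True)  # clear, then load from the file again
    saved_all = sorted(cookie.name for cookie in reloaded)

    # A file in an unrecognised format makes load() raise LoadError.
    bad_path = os.path.join(tmpdir, "not-cookies.txt")
    with open(bad_path, "w") as f:
        f.write("this is not a cookie file\n")
    load_failed = False
    try:
        LWPCookieJar(bad_path).load()
    except LoadError:
        load_failed = True

print(saved_default, saved_all, load_failed)
```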

:class:`FileCookieJar` instances have the following public attributes:


.. attribute:: FileCookieJar.filename

   Filename of default file in which to keep cookies.  This attribute may be
   assigned to.


.. attribute:: FileCookieJar.delayload

   If true, load cookies lazily from disk.  This attribute should not be assigned
   to.  This is only a hint, since this only affects performance, not behaviour
   (unless the cookies on disk are changing). A :class:`CookieJar` object may
   ignore it.  None of the :class:`FileCookieJar` classes included in the standard
   library lazily loads cookies.


.. _file-cookie-jar-classes:

FileCookieJar subclasses and co-operation with web browsers
-----------------------------------------------------------

The following :class:`CookieJar` subclasses are provided for reading and
writing cookie files.

.. class:: MozillaCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in the
   Mozilla ``cookies.txt`` file format (which is also used by the Lynx and Netscape
   browsers).

   .. note::

      This loses information about RFC 2965 cookies, and also about newer or
      non-standard cookie-attributes such as ``port``.

   .. warning::

      Back up your cookies before saving if you have cookies whose loss / corruption
      would be inconvenient (there are some subtleties which may lead to slight
      changes in the file over a load / save round-trip).

   Also note that cookies saved while Mozilla is running will get clobbered by
   Mozilla.


.. class:: LWPCookieJar(filename, delayload=None, policy=None)

   A :class:`FileCookieJar` that can load from and save cookies to disk in a format
   compatible with the libwww-perl library's ``Set-Cookie3`` file format.  This is
   convenient if you want to store cookies in a human-readable file.


.. _cookie-policy-objects:

CookiePolicy Objects
--------------------

Objects implementing the :class:`CookiePolicy` interface have the following
methods:


.. method:: CookiePolicy.set_ok(cookie, request)

   Return boolean value indicating whether cookie should be accepted from server.

   *cookie* is a :class:`Cookie` instance.  *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.extract_cookies`.


.. method:: CookiePolicy.return_ok(cookie, request)

   Return boolean value indicating whether cookie should be returned to server.

   *cookie* is a :class:`Cookie` instance.  *request* is an object
   implementing the interface defined by the documentation for
   :meth:`CookieJar.add_cookie_header`.


.. method:: CookiePolicy.domain_return_ok(domain, request)

   Return false if cookies should not be returned, given cookie domain.

   This method is an optimization.  It removes the need for checking every cookie
   with a particular domain (which might involve reading many files).  Returning
   true from :meth:`domain_return_ok` and :meth:`path_return_ok` leaves all the
   work to :meth:`return_ok`.

   If :meth:`domain_return_ok` returns true for the cookie domain,
   :meth:`path_return_ok` is called for the cookie path.  Otherwise,
   :meth:`path_return_ok` and :meth:`return_ok` are never called for that cookie
   domain.  If :meth:`path_return_ok` returns true, :meth:`return_ok` is called
   with the :class:`Cookie` object itself for a full check.  Otherwise,
   :meth:`return_ok` is never called for that cookie path.

   Note that :meth:`domain_return_ok` is called for every *cookie* domain, not just
   for the *request* domain.  For example, the function might be called with both
   ``".example.com"`` and ``"www.example.com"`` if the request domain is
   ``"www.example.com"``.  The same goes for :meth:`path_return_ok`.

   The *request* argument is as documented for :meth:`return_ok`.


.. method:: CookiePolicy.path_return_ok(path, request)

   Return false if cookies should not be returned, given cookie path.

   See the documentation for :meth:`domain_return_ok`.

In addition to implementing the methods above, implementations of the
:class:`CookiePolicy` interface must also supply the following attributes,
indicating which protocols should be used, and how.  All of these attributes may
be assigned to.


.. attribute:: CookiePolicy.netscape

   Implement Netscape protocol.


.. attribute:: CookiePolicy.rfc2965

   Implement RFC 2965 protocol.


.. attribute:: CookiePolicy.hide_cookie2

   Don't add :mailheader:`Cookie2` header to requests (the presence of this header
   indicates to the server that we understand RFC 2965 cookies).

The most useful way to define a :class:`CookiePolicy` class is by subclassing
from :class:`DefaultCookiePolicy` and overriding some or all of the methods
above.  :class:`CookiePolicy` itself may be used as a 'null policy' to allow
setting and receiving any and all cookies (this is unlikely to be useful).


.. _default-cookie-policy-objects:

DefaultCookiePolicy Objects
---------------------------

Implements the standard rules for accepting and returning cookies.

Both RFC 2965 and Netscape cookies are covered.  RFC 2965 handling is switched
off by default.

The easiest way to provide your own policy is to subclass this class and call
its methods in your overridden implementations before adding your own additional
checks::

   import http.cookiejar

   class MyCookiePolicy(http.cookiejar.DefaultCookiePolicy):
       def set_ok(self, cookie, request):
           # Apply the standard rules first.
           if not super().set_ok(cookie, request):
               return False
           # i_dont_want_to_store_this_cookie() stands for your own check.
           if i_dont_want_to_store_this_cookie(cookie):
               return False
           return True

In addition to the features required to implement the :class:`CookiePolicy`
interface, this class allows you to block and allow domains from setting and
receiving cookies.  There are also some strictness switches that allow you to
tighten up the rather loose Netscape protocol rules a little bit (at the cost of
blocking some benign cookies).

A domain blacklist and a whitelist are provided (both off by default). Only
domains not in the blacklist and present in the whitelist (if the whitelist is
active) participate in cookie setting and returning.  Use the *blocked_domains*
constructor argument, and :meth:`blocked_domains` and
:meth:`set_blocked_domains` methods (and the corresponding argument and methods
for *allowed_domains*).  If you set a whitelist, you can turn it off again by
setting it to :const:`None`.

Domains in block or allow lists that do not start with a dot must equal the
cookie domain to be matched.  For example, ``"example.com"`` matches a blacklist
entry of ``"example.com"``, but ``"www.example.com"`` does not.  Domains that do
start with a dot are matched by more specific domains too. For example, both
``"www.example.com"`` and ``"www.coyote.example.com"`` match ``".example.com"``
(but ``"example.com"`` itself does not).  IP addresses are an exception, and
must match exactly.  For example, if *blocked_domains* contains ``"192.168.1.2"``
and ``".168.1.2"``, 192.168.1.2 is blocked, but 193.168.1.2 is not.

:class:`DefaultCookiePolicy` implements the following additional methods:


.. method:: DefaultCookiePolicy.blocked_domains()

   Return the sequence of blocked domains (as a tuple).


.. method:: DefaultCookiePolicy.set_blocked_domains(blocked_domains)

   Set the sequence of blocked domains.


.. method:: DefaultCookiePolicy.is_blocked(domain)

   Return whether *domain* is on the blacklist for setting or receiving cookies.


.. method:: DefaultCookiePolicy.allowed_domains()

   Return :const:`None`, or the sequence of allowed domains (as a tuple).


.. method:: DefaultCookiePolicy.set_allowed_domains(allowed_domains)

   Set the sequence of allowed domains, or :const:`None`.


.. method:: DefaultCookiePolicy.is_not_allowed(domain)

   Return whether *domain* is not on the whitelist for setting or receiving
   cookies.
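
The domain matching rules described above can be checked directly with
:meth:`is_blocked` and :meth:`is_not_allowed`.  This is a small sketch; the
domain names and addresses are examples only.

```python
from http.cookiejar import DefaultCookiePolicy

policy = DefaultCookiePolicy(
    blocked_domains=["example.com", ".ads.example.net", "192.168.1.2", ".168.1.2"]
)

blocked = {
    domain: policy.is_blocked(domain)
    for domain in [
        "example.com",          # equals an entry without a leading dot
        "www.example.com",      # more specific, but the entry has no leading dot
        "www.ads.example.net",  # more specific than a dotted entry
        "ads.example.net",      # the dotted entry does not match the bare domain
        "192.168.1.2",          # IP addresses must match exactly
        "193.168.1.2",
    ]
}
print(blocked)

# An allow list restricts cookie handling to the listed domains ...
policy.set_allowed_domains([".example.org"])
allowed_now = (policy.is_not_allowed("www.example.org"),
               policy.is_not_allowed("www.example.com"))

# ... and setting it back to None switches the allow list off again.
policy.set_allowed_domains(None)
allowed_off = policy.is_not_allowed("www.example.com")
print(allowed_now, allowed_off)
```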

:class:`DefaultCookiePolicy` instances have the following attributes, which are
all initialised from the constructor arguments of the same name, and which may
all be assigned to.


.. attribute:: DefaultCookiePolicy.rfc2109_as_netscape

   If true, request that the :class:`CookieJar` instance downgrade RFC 2109 cookies
   (ie. cookies received in a :mailheader:`Set-Cookie` header with a version
   cookie-attribute of 1) to Netscape cookies by setting the version attribute of
   the :class:`Cookie` instance to 0.  The default value is :const:`None`, in which
   case RFC 2109 cookies are downgraded if and only if RFC 2965 handling is turned
   off.  Therefore, RFC 2109 cookies are downgraded by default.
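
The effect of :attr:`rfc2109_as_netscape` can be observed by feeding the same
``Version=1`` cookie to jars with different policies.  In this sketch the
``FakeResponse`` class and the ``versions_seen`` helper are hypothetical, and
the URL and cookie are made up for the example.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy

class FakeResponse:
    """Hypothetical response object carrying a single Set-Cookie header."""

    def __init__(self, set_cookie_value):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        return self._headers

def versions_seen(policy):
    """Return (version, rfc2109) for each cookie the jar accepted."""
    jar = CookieJar(policy)
    request = urllib.request.Request("http://www.example.com/")
    jar.extract_cookies(FakeResponse("ni=ni; Version=1"), request)
    return [(cookie.version, cookie.rfc2109) for cookie in jar]

# Default: RFC 2965 handling is off, so the cookie is downgraded to version 0.
default = versions_seen(DefaultCookiePolicy())

# With RFC 2965 handling on, the cookie keeps version 1 ...
rfc2965_on = versions_seen(DefaultCookiePolicy(rfc2965=True))

# ... unless a downgrade is requested explicitly.
forced = versions_seen(DefaultCookiePolicy(rfc2965=True, rfc2109_as_netscape=True))
print(default, rfc2965_on, forced)
```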


General strictness switches:

.. attribute:: DefaultCookiePolicy.strict_domain

   Don't allow sites to set two-component domains with country-code top-level
   domains like ``.co.uk``, ``.gov.uk``, ``.co.nz``, etc.  This is far from perfect
   and isn't guaranteed to work!


RFC 2965 protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_rfc2965_unverifiable

   Follow RFC 2965 rules on unverifiable transactions (usually, an unverifiable
   transaction is one resulting from a redirect or a request for an image hosted on
   another site).  If this is false, cookies are *never* blocked on the basis of
   verifiability.


Netscape protocol strictness switches:

.. attribute:: DefaultCookiePolicy.strict_ns_unverifiable

   Apply RFC 2965 rules on unverifiable transactions even to Netscape cookies.


.. attribute:: DefaultCookiePolicy.strict_ns_domain

   Flags indicating how strict to be with domain-matching rules for Netscape
   cookies.  See below for acceptable values.


.. attribute:: DefaultCookiePolicy.strict_ns_set_initial_dollar

   Ignore cookies in :mailheader:`Set-Cookie` headers that have names starting
   with ``'$'``.


.. attribute:: DefaultCookiePolicy.strict_ns_set_path

   Don't allow setting cookies whose path doesn't path-match the request URI.

:attr:`strict_ns_domain` is a collection of flags.  Its value is constructed by
or-ing flags together (for example, ``DomainStrictNoDots|DomainStrictNonDomain``
means both flags are set).
 | 547 |  | 
 | 548 |  | 
 | 549 | .. attribute:: DefaultCookiePolicy.DomainStrictNoDots | 
 | 550 |  | 
 | 551 |    When setting cookies, the 'host prefix' must not contain a dot (eg. | 
 | 552 |    ``www.foo.bar.com`` can't set a cookie for ``.bar.com``, because ``www.foo`` | 
 | 553 |    contains a dot). | 
 | 554 |  | 
 | 555 |  | 
 | 556 | .. attribute:: DefaultCookiePolicy.DomainStrictNonDomain | 
 | 557 |  | 
 | 558 |    Cookies that did not explicitly specify a ``domain`` cookie-attribute can only | 
 | 559 |    be returned to a domain equal to the domain that set the cookie (eg. | 
 | 560 |    ``spam.example.com`` won't be returned cookies from ``example.com`` that had no | 
 | 561 |    ``domain`` cookie-attribute). | 
 | 562 |  | 
 | 563 |  | 
 | 564 | .. attribute:: DefaultCookiePolicy.DomainRFC2965Match | 
 | 565 |  | 
 | 566 |    When setting cookies, require a full RFC 2965 domain-match. | 
 | 567 |  | 
 | 568 | The following attributes are provided for convenience, and are the most useful | 
 | 569 | combinations of the above flags: | 
 | 570 |  | 
 | 571 |  | 
 | 572 | .. attribute:: DefaultCookiePolicy.DomainLiberal | 
 | 573 |  | 
 | 574 |    Equivalent to 0 (i.e. all of the above Netscape domain strictness flags switched | 
 | 575 |    off). | 
 | 576 |  | 
 | 577 |  | 
 | 578 | .. attribute:: DefaultCookiePolicy.DomainStrict | 
 | 579 |  | 
 | 580 |    Equivalent to ``DomainStrictNoDots|DomainStrictNonDomain``. | 
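As a minimal sketch of how these flags combine (the variable names are illustrative and not part of the module), the convenience constants can be compared against explicit combinations and the result passed to the :class:`DefaultCookiePolicy` constructor:

```python
from http.cookiejar import DefaultCookiePolicy

# Or together individual flags; this is the same value as DomainStrict.
strict = (DefaultCookiePolicy.DomainStrictNoDots
          | DefaultCookiePolicy.DomainStrictNonDomain)
same_as_domain_strict = (strict == DefaultCookiePolicy.DomainStrict)

# Add the full RFC 2965 domain-match requirement on top.
strictest = strict | DefaultCookiePolicy.DomainRFC2965Match

policy = DefaultCookiePolicy(strict_ns_domain=strictest)

# Test whether one particular flag is set with a bitwise and.
no_dots_enabled = bool(policy.strict_ns_domain
                       & DefaultCookiePolicy.DomainStrictNoDots)
```

Because the value is a plain integer bit mask, individual flags can be tested or cleared with the usual bitwise operators.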
 | 581 |  | 
 | 582 |  | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 583 | Cookie Objects | 
 | 584 | -------------- | 
 | 585 |  | 
 | 586 | :class:`Cookie` instances have Python attributes roughly corresponding to the | 
 | 587 | standard cookie-attributes specified in the various cookie standards.  The | 
 | 588 | correspondence is not one-to-one, because there are complicated rules for | 
 | 589 | assigning default values, because the ``max-age`` and ``expires`` | 
 | 590 | cookie-attributes contain equivalent information, and because RFC 2109 cookies | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 591 | may be 'downgraded' by :mod:`http.cookiejar` from version 1 to version 0 (Netscape) | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 592 | cookies. | 
 | 593 |  | 
 | 594 | Assignment to these attributes should not be necessary other than in rare | 
 | 595 | circumstances in a :class:`CookiePolicy` method.  The class does not enforce | 
 | 596 | internal consistency, so you should know what you're doing if you do that. | 
 | 597 |  | 
 | 598 |  | 
 | 599 | .. attribute:: Cookie.version | 
 | 600 |  | 
 | 601 |    Integer or :const:`None`.  Netscape cookies have :attr:`version` 0. RFC 2965 and | 
 | 602 |    RFC 2109 cookies have a ``version`` cookie-attribute of 1.  However, note that | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 603 |    :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in which | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 604 |    case :attr:`version` is 0. | 
 | 605 |  | 
 | 606 |  | 
 | 607 | .. attribute:: Cookie.name | 
 | 608 |  | 
 | 609 |    Cookie name (a string). | 
 | 610 |  | 
 | 611 |  | 
 | 612 | .. attribute:: Cookie.value | 
 | 613 |  | 
 | 614 |    Cookie value (a string), or :const:`None`. | 
 | 615 |  | 
 | 616 |  | 
 | 617 | .. attribute:: Cookie.port | 
 | 618 |  | 
 | 619 |    String representing a port or a set of ports (e.g. ``'80'`` or ``'80,8080'``), or | 
 | 620 |    :const:`None`. | 
 | 621 |  | 
 | 622 |  | 
 | 623 | .. attribute:: Cookie.path | 
 | 624 |  | 
 | 625 |    Cookie path (a string, e.g. ``'/acme/rocket_launchers'``). | 
 | 626 |  | 
 | 627 |  | 
 | 628 | .. attribute:: Cookie.secure | 
 | 629 |  | 
 | 630 |    True if cookie should only be returned over a secure connection. | 
 | 631 |  | 
 | 632 |  | 
 | 633 | .. attribute:: Cookie.expires | 
 | 634 |  | 
 | 635 |    Integer expiry date in seconds since epoch, or :const:`None`.  See also the | 
 | 636 |    :meth:`is_expired` method. | 
 | 637 |  | 
 | 638 |  | 
 | 639 | .. attribute:: Cookie.discard | 
 | 640 |  | 
 | 641 |    True if this is a session cookie. | 
 | 642 |  | 
 | 643 |  | 
 | 644 | .. attribute:: Cookie.comment | 
 | 645 |  | 
 | 646 |    String comment from the server explaining the function of this cookie, or | 
 | 647 |    :const:`None`. | 
 | 648 |  | 
 | 649 |  | 
 | 650 | .. attribute:: Cookie.comment_url | 
 | 651 |  | 
 | 652 |    URL linking to a comment from the server explaining the function of this cookie, | 
 | 653 |    or :const:`None`. | 
 | 654 |  | 
 | 655 |  | 
 | 656 | .. attribute:: Cookie.rfc2109 | 
 | 657 |  | 
 | 658 |    ``True`` if this cookie was received as an RFC 2109 cookie (i.e. the cookie | 
 | 659 |    arrived in a :mailheader:`Set-Cookie` header, and the value of the Version | 
 | 660 |    cookie-attribute in that header was 1).  This attribute is provided because | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 661 |    :mod:`http.cookiejar` may 'downgrade' RFC 2109 cookies to Netscape cookies, in | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 662 |    which case :attr:`version` is 0. | 
 | 663 |  | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 664 |  | 
 | 665 | .. attribute:: Cookie.port_specified | 
 | 666 |  | 
 | 667 |    True if a port or set of ports was explicitly specified by the server (in the | 
 | 668 |    :mailheader:`Set-Cookie` / :mailheader:`Set-Cookie2` header). | 
 | 669 |  | 
 | 670 |  | 
 | 671 | .. attribute:: Cookie.domain_specified | 
 | 672 |  | 
 | 673 |    True if a domain was explicitly specified by the server. | 
 | 674 |  | 
 | 675 |  | 
 | 676 | .. attribute:: Cookie.domain_initial_dot | 
 | 677 |  | 
 | 678 |    True if the domain explicitly specified by the server began with a dot | 
 | 679 |    (``'.'``). | 
 | 680 |  | 
 | 681 | Cookies may have additional non-standard cookie-attributes.  These may be | 
 | 682 | accessed using the following methods: | 
 | 683 |  | 
 | 684 |  | 
 | 685 | .. method:: Cookie.has_nonstandard_attr(name) | 
 | 686 |  | 
 | 687 |    Return ``True`` if the cookie has the named cookie-attribute. | 
 | 688 |  | 
 | 689 |  | 
 | 690 | .. method:: Cookie.get_nonstandard_attr(name, default=None) | 
 | 691 |  | 
 | 692 |    If cookie has the named cookie-attribute, return its value. Otherwise, return | 
 | 693 |    *default*. | 
 | 694 |  | 
 | 695 |  | 
 | 696 | .. method:: Cookie.set_nonstandard_attr(name, value) | 
 | 697 |  | 
 | 698 |    Set the value of the named cookie-attribute. | 
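The following sketch exercises the three methods above on a hand-built cookie. Constructing :class:`Cookie` directly is an assumption made for illustration: the keyword names follow the CPython implementation of the constructor, and in normal use cookies are created by a :class:`CookieJar` from response headers. The attribute names ``HttpOnly`` and ``SameSite`` are only sample values.

```python
from http.cookiejar import Cookie

# Hypothetical cookie built by hand for illustration only.
cookie = Cookie(
    version=0, name="session", value="abc123",
    port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None,
    rest={"HttpOnly": None},   # non-standard cookie-attributes
)

has_httponly = cookie.has_nonstandard_attr("HttpOnly")

# A missing attribute falls back to the supplied default.
missing = cookie.get_nonstandard_attr("SameSite", "unset")

# After setting it, the stored value is returned instead.
cookie.set_nonstandard_attr("SameSite", "Lax")
samesite = cookie.get_nonstandard_attr("SameSite")
```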
 | 699 |  | 
 | 700 | The :class:`Cookie` class also defines the following method: | 
 | 701 |  | 
 | 702 |  | 
| Georg Brandl | 1f01deb | 2009-01-03 22:47:39 +0000 | [diff] [blame] | 703 | .. method:: Cookie.is_expired(now=None) | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 704 |  | 
 | 705 |    Return ``True`` if the cookie has passed the time at which the server requested it should | 
 | 706 |    expire.  If *now* is given (in seconds since the epoch), return whether the | 
 | 707 |    cookie has expired at the specified time. | 
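A short sketch of :meth:`is_expired` with and without *now* follows. The ``make_cookie`` helper is hypothetical, and building :class:`Cookie` objects by hand relies on the constructor keywords used in the CPython implementation. The boundary case (expiry time equal to *now*) reflects that implementation's behaviour and is not stated in the text above.

```python
from http.cookiejar import Cookie

def make_cookie(expires):
    """Hypothetical helper: build a minimal cookie with the given expiry."""
    return Cookie(
        version=0, name="token", value="xyz", port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=expires,
        discard=(expires is None), comment=None, comment_url=None, rest={},
    )

cookie = make_cookie(expires=1_000_000_000)        # 2001-09-09 UTC

expired_now = cookie.is_expired()                  # compared with the current time
expired_before = cookie.is_expired(now=999_999_999)
expired_at = cookie.is_expired(now=1_000_000_000)  # boundary: treated as expired

# A cookie without an expiry time (expires is None) never reports as expired.
session_cookie = make_cookie(expires=None)
session_expired = session_cookie.is_expired()
```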
 | 708 |  | 
 | 709 |  | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 710 | Examples | 
 | 711 | -------- | 
 | 712 |  | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 713 | The first example shows the most common usage of :mod:`http.cookiejar`:: | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 714 |  | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 715 |    import http.cookiejar, urllib.request | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 716 |    cj = http.cookiejar.CookieJar() | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 717 |    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 718 |    r = opener.open("http://example.com/") | 
 | 719 |  | 
 | 720 | This example illustrates how to open a URL using your Netscape, Mozilla, or Lynx | 
 | 721 | cookies (assumes Unix/Netscape convention for location of the cookies file):: | 
 | 722 |  | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 723 |    import os, http.cookiejar, urllib.request | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 724 |    cj = http.cookiejar.MozillaCookieJar() | 
| Éric Araujo | 4dcf502 | 2011-03-25 20:31:50 +0100 | [diff] [blame] | 725 |    cj.load(os.path.join(os.path.expanduser("~"), ".netscape", "cookies.txt")) | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 726 |    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 727 |    r = opener.open("http://example.com/") | 
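To try :class:`MozillaCookieJar` without an existing cookies file, the sketch below saves a cookie to a temporary file and loads it back. The cookie itself, its name and value, and the temporary path are all made up for illustration, and the :class:`Cookie` constructor keywords are those of the CPython implementation.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar

# Hypothetical persistent cookie; the expiry is far in the future
# (2100-01-01 UTC) so it is neither dropped on save nor skipped on load.
cookie = Cookie(
    version=0, name="theme", value="dark", port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True, secure=False, expires=4102444800,
    discard=False, comment=None, comment_url=None, rest={},
)

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "cookies.txt")

    jar = MozillaCookieJar(filename)
    jar.set_cookie(cookie)
    jar.save()                      # writes the Netscape cookies.txt format

    reloaded = MozillaCookieJar()
    reloaded.load(filename)
    names = [c.name for c in reloaded]
```

Session cookies (those with ``discard`` set) are not written by :meth:`save` unless ``ignore_discard=True`` is passed, which is why the example uses a persistent cookie.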
 | 728 |  | 
 | 729 | The next example illustrates the use of :class:`DefaultCookiePolicy`.  It turns on | 
 | 730 | RFC 2965 cookies, is stricter about domains when setting and returning | 
 | 731 | Netscape cookies, and blocks some domains from setting cookies or having them | 
 | 732 | returned:: | 
 | 733 |  | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 734 |    import urllib.request | 
| Georg Brandl | 2442015 | 2008-05-26 16:32:26 +0000 | [diff] [blame] | 735 |    from http.cookiejar import CookieJar, DefaultCookiePolicy | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 736 |    policy = DefaultCookiePolicy( | 
 | 737 |        rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict, | 
 | 738 |        blocked_domains=["ads.net", ".ads.net"]) | 
 | 739 |    cj = CookieJar(policy) | 
| Georg Brandl | 029986a | 2008-06-23 11:44:14 +0000 | [diff] [blame] | 740 |    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) | 
| Georg Brandl | 116aa62 | 2007-08-15 14:28:22 +0000 | [diff] [blame] | 741 |    r = opener.open("http://example.com/") | 
 | 742 |  |
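The effect of such a policy can be inspected without making a network request. The sketch below rebuilds the same policy and queries it with :meth:`DefaultCookiePolicy.is_blocked` and :meth:`DefaultCookiePolicy.blocked_domains`. The host names are sample values only.

```python
from http.cookiejar import CookieJar, DefaultCookiePolicy

policy = DefaultCookiePolicy(
    rfc2965=True,
    strict_ns_domain=DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"],
)
cj = CookieJar(policy)

# "ads.net" matches the exact entry; "www.ads.net" matches ".ads.net".
blocked_exact = policy.is_blocked("ads.net")
blocked_sub = policy.is_blocked("www.ads.net")
blocked_other = policy.is_blocked("example.com")

# The configured block list is returned as a tuple.
blocked_list = policy.blocked_domains()
```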