| 1 | \section{\module{cookielib} --- |
| 2 | Cookie handling for HTTP clients} |
| 3 | |
| 4 | \declaremodule{standard}{cookielib} |
| 5 | \moduleauthor{John J. Lee}{jjl@pobox.com} |
| 6 | \sectionauthor{John J. Lee}{jjl@pobox.com} |
| 7 | |
| 8 | \modulesynopsis{Cookie handling for HTTP clients} |
| 9 | |
| 10 | The \module{cookielib} module defines classes for automatic handling |
| 11 | of HTTP cookies. It is useful for accessing web sites that require |
| 12 | small pieces of data -- \dfn{cookies} -- to be set on the client |
| 13 | machine by an HTTP response from a web server, and then returned to |
| 14 | the server in later HTTP requests. |
| 15 | |
| 16 | Both the regular Netscape cookie protocol and the protocol defined by |
| 17 | \rfc{2965} are handled. RFC 2965 handling is switched off by default. |
| 18 | \rfc{2109} cookies are parsed as Netscape cookies and subsequently |
| 19 | treated as RFC 2965 cookies. Note that the great majority of cookies |
| 20 | on the Internet are Netscape cookies. \module{cookielib} attempts to |
| 21 | follow the de-facto Netscape cookie protocol (which differs |
| 22 | substantially from that set out in the original Netscape |
| 23 | specification), including taking note of the \code{max-age} and |
| 24 | \code{port} cookie-attributes introduced with RFC 2109. \note{The |
| 25 | various named parameters found in \mailheader{Set-Cookie} and |
| 26 | \mailheader{Set-Cookie2} headers (eg. \code{domain} and |
| 27 | \code{expires}) are conventionally referred to as \dfn{attributes}. |
| 28 | To distinguish them from Python attributes, the documentation for this |
| 29 | module uses the term \dfn{cookie-attribute} instead}. |
| 30 | |
| 31 | |
| 32 | The module defines the following exception: |
| 33 | |
| 34 | \begin{excdesc}{LoadError} |
| 35 | Instances of \class{FileCookieJar} raise this exception on failure to |
| 36 | load cookies from a file. |
| 37 | \end{excdesc} |
| 38 | |
| 39 | |
| 40 | The following classes are provided: |
| 41 | |
| 42 | \begin{classdesc}{CookieJar}{policy=\constant{None}} |
| 43 | \var{policy} is an object implementing the \class{CookiePolicy} |
| 44 | interface. |
| 45 | |
| 46 | The \class{CookieJar} class stores HTTP cookies. It extracts cookies |
| 47 | from HTTP responses, and returns them in HTTP requests. |
| 48 | \class{CookieJar} instances automatically expire contained cookies |
| 49 | when necessary. Subclasses are also responsible for storing and |
| 50 | retrieving cookies from a file or database. |
| 51 | \end{classdesc} |
| 52 | |
| 53 | \begin{classdesc}{FileCookieJar}{filename, delayload=\constant{None}, |
| 54 | policy=\constant{None}} |
| 55 | \var{policy} is an object implementing the \class{CookiePolicy} |
| 56 | interface. For the other arguments, see the documentation for the |
| 57 | corresponding attributes. |
| 58 | |
| 59 | A \class{CookieJar} which can load cookies from, and perhaps save |
| 60 | cookies to, a file on disk. Cookies are \strong{NOT} loaded from the |
| 61 | named file until either the \method{load()} or \method{revert()} |
| 62 | method is called. Subclasses of this class are documented in section |
| 63 | \ref{file-cookie-jar-classes}. |
| 64 | \end{classdesc} |
| 65 | |
| 66 | \begin{classdesc}{CookiePolicy}{} |
| 67 | This class is responsible for deciding whether each cookie should be |
| 68 | accepted from / returned to the server. |
| 69 | \end{classdesc} |
| 70 | |
| 71 | \begin{classdesc}{DefaultCookiePolicy}{ |
| 72 | blocked_domains=\constant{None}, |
| 73 | allowed_domains=\constant{None}, |
| 74 | netscape=\constant{True}, rfc2965=\constant{False}, |
| 75 | hide_cookie2=\constant{False}, |
| 76 | strict_domain=\constant{False}, |
| 77 | strict_rfc2965_unverifiable=\constant{True}, |
| 78 | strict_ns_unverifiable=\constant{False}, |
| 79 | strict_ns_domain=\constant{DefaultCookiePolicy.DomainLiberal}, |
| 80 | strict_ns_set_initial_dollar=\constant{False}, |
| 81 | strict_ns_set_path=\constant{False} |
| 82 | } |
| 83 | |
| 84 | Constructor arguments should be passed as keyword arguments only. |
| 85 | \var{blocked_domains} is a sequence of domain names that we never |
| 86 | accept cookies from, nor return cookies to. If \var{allowed_domains} is |
| 87 | not \constant{None}, it is a sequence of the only domains for which |
| 88 | we accept and return cookies. For all other arguments, see the |
| 89 | documentation for \class{CookiePolicy} and \class{DefaultCookiePolicy} |
| 90 | objects. |
| 91 | |
| 92 | \class{DefaultCookiePolicy} implements the standard accept / reject |
| 93 | rules for Netscape and RFC 2965 cookies. RFC 2109 cookies |
| 94 | (ie. cookies received in a \mailheader{Set-Cookie} header with a |
| 95 | version cookie-attribute of 1) are treated according to the RFC 2965 |
| 96 | rules. \class{DefaultCookiePolicy} also provides some parameters to |
| 97 | allow some fine-tuning of policy. |
| 98 | \end{classdesc} |
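A minimal sketch of building a \class{DefaultCookiePolicy} with keyword arguments and handing it to a \class{CookieJar}. The domain \code{ads.example} is a made-up placeholder, and the fallback import is an assumption for Python 3, where the module is named \module{http.cookiejar}.

```python
try:
    import cookielib                    # module name documented here
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

# Pass constructor arguments by keyword only, as recommended above.
policy = cookielib.DefaultCookiePolicy(
    rfc2965=True,                       # also handle RFC 2965 cookies
    blocked_domains=["ads.example"],    # hypothetical domain to refuse
    strict_ns_domain=cookielib.DefaultCookiePolicy.DomainStrict,
)
jar = cookielib.CookieJar(policy)

print(policy.rfc2965)
print(policy.blocked_domains())
```

The arguments that were not passed keep the defaults shown in the signature above.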
| 99 | |
| 100 | \begin{classdesc}{Cookie}{} |
| 101 | This class represents Netscape, RFC 2109 and RFC 2965 cookies. It is |
| 102 | not expected that users of \module{cookielib} construct their own |
| 103 | \class{Cookie} instances. Instead, if necessary, call |
| 104 | \method{make_cookies()} on a \class{CookieJar} instance. |
| 105 | \end{classdesc} |
| 106 | |
| 107 | \begin{seealso} |
| 108 | |
| 109 | \seemodule{urllib2}{URL opening with automatic cookie handling.} |
| 110 | |
| 111 | \seemodule{Cookie}{HTTP cookie classes, principally useful for |
| 112 | server-side code. The \module{cookielib} and \module{Cookie} modules |
| 113 | do not depend on each other.} |
| 114 | |
| 115 | \seeurl{http://wwwsearch.sf.net/ClientCookie/}{Extensions to this |
| 116 | module, including a class for reading Microsoft Internet Explorer |
| 117 | cookies on Windows.} |
| 118 | |
| 119 | \seeurl{http://www.netscape.com/newsref/std/cookie_spec.html}{The |
| 120 | specification of the original Netscape cookie protocol. Though this |
| 121 | is still the dominant protocol, the 'Netscape cookie protocol' |
| 122 | implemented by all the major browsers (and \module{cookielib}) only |
| 123 | bears a passing resemblance to the one sketched out in |
| 124 | \code{cookie_spec.html}.} |
| 125 | |
| 126 | \seerfc{2109}{HTTP State Management Mechanism}{Obsoleted by RFC 2965. |
| 127 | Uses \mailheader{Set-Cookie} with version=1.} |
| 128 | |
| 129 | \seerfc{2965}{HTTP State Management Mechanism}{The Netscape protocol |
| 130 | with the bugs fixed. Uses \mailheader{Set-Cookie2} in place of |
| 131 | \mailheader{Set-Cookie}. Not widely used.} |
| 132 | |
| 133 | \seeurl{http://kristol.org/cookie/errata.html}{Unfinished errata to |
| 134 | RFC 2965.} |
| 135 | |
| 136 | \seerfc{2964}{Use of HTTP State Management}{} |
| 137 | |
| 138 | \end{seealso} |
| 139 | |
| 140 | |
| 141 | \subsection{CookieJar and FileCookieJar Objects \label{cookie-jar-objects}} |
| 142 | |
| 143 | \class{CookieJar} objects support the iterator protocol for iterating |
| 144 | over contained \class{Cookie} objects. |
| 145 | |
| 146 | \class{CookieJar} has the following methods: |
| 147 | |
| 148 | \begin{methoddesc}[CookieJar]{add_cookie_header}{request} |
| 149 | Add correct \mailheader{Cookie} header to \var{request}. |
| 150 | |
| 151 | If policy allows (ie. the \member{rfc2965} and \member{hide_cookie2} |
| 152 | attributes of the \class{CookieJar}'s \class{CookiePolicy} instance |
| 153 | are true and false respectively), the \mailheader{Cookie2} header is |
| 154 | also added when appropriate. |
| 155 | |
| 156 | The \var{request} object (usually a \class{urllib2.Request} instance) |
| 157 | must support the methods \method{get_full_url()}, \method{get_host()}, |
| 158 | \method{get_type()}, \method{unverifiable()}, |
| 159 | \method{get_origin_req_host()}, \method{has_header()}, |
| 160 | \method{get_header()}, \method{header_items()}, and |
| 161 | \method{add_unredirected_header()}, as documented by \module{urllib2}. |
| 162 | \end{methoddesc} |
| 163 | |
| 164 | \begin{methoddesc}[CookieJar]{extract_cookies}{response, request} |
| 165 | Extract cookies from HTTP \var{response} and store them in the |
| 166 | \class{CookieJar}, where allowed by policy. |
| 167 | |
| 168 | The \class{CookieJar} will look for allowable \mailheader{Set-Cookie} |
| 169 | and \mailheader{Set-Cookie2} headers in the \var{response} argument, |
| 170 | and store cookies as appropriate (subject to the |
| 171 | \method{CookiePolicy.set_ok()} method's approval). |
| 172 | |
| 173 | The \var{response} object (usually the result of a call to |
| 174 | \method{urllib2.urlopen()}, or similar) should support an |
| 175 | \method{info()} method, which returns an object with a |
| 176 | \method{getallmatchingheaders()} method (usually a |
| 177 | \class{mimetools.Message} instance). |
| 178 | |
| 179 | The \var{request} object (usually a \class{urllib2.Request} instance) |
| 180 | must support the methods \method{get_full_url()}, \method{get_host()}, |
| 181 | \method{unverifiable()}, and \method{get_origin_req_host()}, as |
| 182 | documented by \module{urllib2}. The request is used to set default |
| 183 | values for cookie-attributes as well as for checking that the cookie |
| 184 | is allowed to be set. |
| 185 | \end{methoddesc} |
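The round trip described by \method{extract_cookies()} and \method{add_cookie_header()} can be sketched without any network access. \class{FakeHeaders} and \class{FakeResponse} are hypothetical stand-ins written for this illustration; they assume the jar reads headers through \method{getheaders()} (Python 2) or \method{get_all()} (Python 3). The fallback imports are likewise an assumption for Python 3.

```python
try:
    import cookielib
    from urllib2 import Request
except ImportError:
    import http.cookiejar as cookielib      # assumption: Python 3 module names
    from urllib.request import Request


class FakeHeaders(object):
    """Hypothetical stand-in for the message object returned by info()."""

    def __init__(self, pairs):
        self._pairs = pairs

    def getheaders(self, name):              # lookup assumed for Python 2
        return [v for k, v in self._pairs if k.lower() == name.lower()]

    def get_all(self, name, default=None):   # lookup assumed for Python 3
        return self.getheaders(name) or default


class FakeResponse(object):
    """Hypothetical response exposing only the info() method."""

    def __init__(self, pairs):
        self._headers = FakeHeaders(pairs)

    def info(self):
        return self._headers


jar = cookielib.CookieJar()

# The server's reply to the first request sets a cookie.
first = Request("http://www.example.com/")
response = FakeResponse([("Set-Cookie", "session=abc123; Path=/")])
jar.extract_cookies(response, first)

# A later request to the same host gets the cookie back.
second = Request("http://www.example.com/account")
jar.add_cookie_header(second)
cookie_header = second.get_header("Cookie")
print(cookie_header)
```

In normal use \class{urllib2.HTTPCookieProcessor} makes both calls automatically.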
| 186 | |
| 187 | \begin{methoddesc}[CookieJar]{set_policy}{policy} |
| 188 | Set the \class{CookiePolicy} instance to be used. |
| 189 | \end{methoddesc} |
| 190 | |
| 191 | \begin{methoddesc}[CookieJar]{make_cookies}{response, request} |
| 192 | Return sequence of \class{Cookie} objects extracted from |
| 193 | \var{response} object. |
| 194 | |
| 195 | See the documentation for \method{extract_cookies()} for the interfaces |
| 196 | required of the \var{response} and \var{request} arguments. |
| 197 | \end{methoddesc} |
| 198 | |
| 199 | \begin{methoddesc}[CookieJar]{set_cookie_if_ok}{cookie, request} |
| 200 | Set a \class{Cookie} if policy says it's OK to do so. |
| 201 | \end{methoddesc} |
| 202 | |
| 203 | \begin{methoddesc}[CookieJar]{set_cookie}{cookie} |
| 204 | Set a \class{Cookie}, without checking with policy to see whether or |
| 205 | not it should be set. |
| 206 | \end{methoddesc} |
| 207 | |
| 208 | \begin{methoddesc}[CookieJar]{clear}{\optional{domain\optional{, |
| 209 | path\optional{, name}}}} |
| 210 | Clear some cookies. |
| 211 | |
| 212 | If invoked without arguments, clear all cookies. If given a single |
| 213 | argument, only cookies belonging to that \var{domain} will be removed. |
| 214 | If given two arguments, cookies belonging to the specified |
| 215 | \var{domain} and URL \var{path} are removed. If given three |
| 216 | arguments, then the cookie with the specified \var{domain}, \var{path} |
| 217 | and \var{name} is removed. |
| 218 | |
| 219 | Raises \exception{KeyError} if no matching cookie exists. |
| 220 | \end{methoddesc} |
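The argument forms of \method{clear()} can be sketched as follows. The \function{make_cookie()} helper is hypothetical and builds \class{Cookie} instances by hand only to keep the example offline; it assumes the \class{Cookie} constructor keywords shown. The fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module


def make_cookie(name, domain, path="/"):
    """Hypothetical helper: a minimal Netscape (version 0) session cookie."""
    return cookielib.Cookie(
        version=0, name=name, value="1", port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path=path, path_specified=True, secure=False, expires=None,
        discard=True, comment=None, comment_url=None, rest={})


jar = cookielib.CookieJar()
for name, domain in [("a", "example.com"), ("b", "example.com"),
                     ("c", "example.org")]:
    jar.set_cookie(make_cookie(name, domain))

jar.clear("example.com", "/", "a")      # three arguments: one cookie
after_one = sorted(c.name for c in jar)

jar.clear("example.org")                # one argument: a whole domain
after_domain = sorted(c.name for c in jar)

try:
    jar.clear("nowhere.invalid")        # nothing is stored for this domain
    missing_raises = False
except KeyError:
    missing_raises = True

jar.clear()                             # no arguments: empty the jar
```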
| 221 | |
| 222 | \begin{methoddesc}[CookieJar]{clear_session_cookies}{} |
| 223 | Discard all session cookies. |
| 224 | |
| 225 | Discards all contained cookies that have a true \member{discard} |
| 226 | attribute (usually because they had either no \code{max-age} or |
| 227 | \code{expires} cookie-attribute, or an explicit \code{discard} |
| 228 | cookie-attribute). For interactive browsers, the end of a session |
| 229 | usually corresponds to closing the browser window. |
| 230 | |
| 231 | Note that the \method{save()} method won't save session cookies |
| 232 | anyway, unless you ask otherwise by passing a true |
| 233 | \var{ignore_discard} argument. |
| 234 | \end{methoddesc} |
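A small sketch of the effect of \method{clear_session_cookies()}: only the cookie whose \member{discard} attribute is true is dropped. The \function{make_cookie()} helper is hypothetical, and the fallback import is an assumption for Python 3.

```python
import time

try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module


def make_cookie(name, expires, discard):
    """Hypothetical helper: a minimal Netscape cookie for example.com."""
    return cookielib.Cookie(
        version=0, name=name, value="1", port=None, port_specified=False,
        domain="example.com", domain_specified=False,
        domain_initial_dot=False, path="/", path_specified=True,
        secure=False, expires=expires, discard=discard,
        comment=None, comment_url=None, rest={})


jar = cookielib.CookieJar()
jar.set_cookie(make_cookie("session", expires=None, discard=True))
jar.set_cookie(make_cookie("persistent",
                           expires=int(time.time()) + 3600, discard=False))

jar.clear_session_cookies()             # drops only the discardable cookie
remaining = [c.name for c in jar]
print(remaining)
```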
| 235 | |
| 236 | \class{FileCookieJar} implements the following additional methods: |
| 237 | |
| 238 | \begin{methoddesc}[FileCookieJar]{save}{filename=\constant{None}, |
| 239 | ignore_discard=\constant{False}, ignore_expires=\constant{False}} |
| 240 | Save cookies to a file. |
| 241 | |
| 242 | This base class raises \exception{NotImplementedError}. Subclasses may |
| 243 | leave this method unimplemented. |
| 244 | |
| 245 | \var{filename} is the name of file in which to save cookies. If |
| 246 | \var{filename} is not specified, \member{self.filename} is used (whose |
| 247 | default is the value passed to the constructor, if any); if |
| 248 | \member{self.filename} is \constant{None}, \exception{ValueError} is |
| 249 | raised. |
| 250 | |
| 251 | \var{ignore_discard}: save even cookies set to be discarded. |
| 252 | \var{ignore_expires}: save even cookies that have expired. |
| 253 | |
| 254 | The file is overwritten if it already exists, thus wiping all the |
| 255 | cookies it contains. Saved cookies can be restored later using the |
| 256 | \method{load()} or \method{revert()} methods. |
| 257 | \end{methoddesc} |
| 258 | |
| 259 | \begin{methoddesc}[FileCookieJar]{load}{filename=\constant{None}, |
| 260 | ignore_discard=\constant{False}, ignore_expires=\constant{False}} |
| 261 | Load cookies from a file. |
| 262 | |
| 263 | Old cookies are kept unless overwritten by newly loaded ones. |
| 264 | |
| 265 | Arguments are as for \method{save()}. |
| 266 | |
| 267 | The named file must be in the format understood by the class, or |
| 268 | \exception{LoadError} will be raised. |
| 269 | \end{methoddesc} |
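A sketch of a \method{save()} / \method{load()} round trip, using \class{LWPCookieJar} (documented below) because the base class does not implement \method{save()}. The file names and the hand-built cookie are illustrative assumptions, as is the fallback import for Python 3.

```python
import os
import shutil
import tempfile
import time

try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "cookies.lwp")     # hypothetical file name

saved = cookielib.LWPCookieJar(path)
saved.set_cookie(cookielib.Cookie(
    version=0, name="persistent", value="1", port=None, port_specified=False,
    domain="example.com", domain_specified=False, domain_initial_dot=False,
    path="/", path_specified=True, secure=False,
    expires=int(time.time()) + 3600, discard=False,
    comment=None, comment_url=None, rest={}))
saved.save()                            # uses the constructor's filename

loaded = cookielib.LWPCookieJar(path)
before_load = len(loaded)               # nothing is read until load()
loaded.load()
loaded_names = [c.name for c in loaded]

# A file that is not in the expected format raises LoadError.
bad_path = os.path.join(workdir, "garbage.txt")
with open(bad_path, "w") as handle:
    handle.write("not a cookie file\n")
try:
    cookielib.LWPCookieJar(bad_path).load()
    load_failed = False
except cookielib.LoadError:
    load_failed = True

shutil.rmtree(workdir)                  # remove the temporary files
```

A session cookie would have been skipped by \method{save()} here unless \var{ignore_discard} were true.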
| 270 | |
| 271 | \begin{methoddesc}[FileCookieJar]{revert}{filename=\constant{None}, |
| 272 | ignore_discard=\constant{False}, ignore_expires=\constant{False}} |
| 273 | Clear all cookies and reload cookies from a saved file. |
| 274 | |
| 275 | Raises \exception{cookielib.LoadError} or \exception{IOError} if |
| 276 | reversion is not successful; the object's state will not be altered if |
| 277 | this happens. |
| 278 | \end{methoddesc} |
| 279 | |
| 280 | \class{FileCookieJar} instances have the following public attributes: |
| 281 | |
| 282 | \begin{memberdesc}{filename} |
| 283 | Filename of default file in which to keep cookies. This attribute may |
| 284 | be assigned to. |
| 285 | \end{memberdesc} |
| 286 | |
| 287 | \begin{memberdesc}{delayload} |
| 288 | If true, load cookies lazily from disk. This attribute should not be |
| 289 | assigned to. This is only a hint, since this only affects |
| 290 | performance, not behaviour (unless the cookies on disk are changing). |
| 291 | A \class{CookieJar} object may ignore it. None of the |
| 292 | \class{FileCookieJar} classes included in the standard library lazily |
| 293 | loads cookies. |
| 294 | \end{memberdesc} |
| 295 | |
| 296 | |
| 297 | \subsection{FileCookieJar subclasses and co-operation with web browsers |
| 298 | \label{file-cookie-jar-classes}} |
| 299 | |
| 300 | The following \class{CookieJar} subclasses are provided for reading |
| 301 | and writing cookie files. Further \class{CookieJar} subclasses, including one |
| 302 | that reads Microsoft Internet Explorer cookies, are available at |
| 303 | \url{http://wwwsearch.sf.net/ClientCookie/}. |
| 304 | |
| 305 | \begin{classdesc}{MozillaCookieJar}{filename, delayload=\constant{None}, |
| 306 | policy=\constant{None}} |
| 307 | A \class{FileCookieJar} that can load from and save cookies to disk in |
| 308 | the Mozilla \code{cookies.txt} file format (which is also used by the |
| 309 | Lynx and Netscape browsers). \note{This loses information about RFC |
| 310 | 2965 cookies, and also about newer or non-standard cookie-attributes |
| 311 | such as \code{port}.} |
| 312 | |
| 313 | \warning{Back up your cookies before saving if you have cookies whose |
| 314 | loss / corruption would be inconvenient (there are some subtleties |
| 315 | which may lead to slight changes in the file over a load / save |
| 316 | round-trip).} |
| 317 | |
| 318 | Also note that cookies saved while Mozilla is running will get |
| 319 | clobbered by Mozilla. |
| 320 | \end{classdesc} |
| 321 | |
| 322 | \begin{classdesc}{LWPCookieJar}{filename, delayload=\constant{None}, |
| 323 | policy=\constant{None}} |
| 324 | A \class{FileCookieJar} that can load from and save cookies to disk in |
| 325 | a format compatible with the libwww-perl library's \code{Set-Cookie3} |
| 326 | file format. This is convenient if you want to store cookies in a |
| 327 | human-readable file. |
| 328 | \end{classdesc} |
| 329 | |
| 330 | |
| 331 | \subsection{CookiePolicy Objects \label{cookie-policy-objects}} |
| 332 | |
| 333 | Objects implementing the \class{CookiePolicy} interface have the |
| 334 | following methods: |
| 335 | |
| 336 | \begin{methoddesc}[CookiePolicy]{set_ok}{cookie, request} |
| 337 | Return boolean value indicating whether cookie should be accepted from server. |
| 338 | |
| 339 | \var{cookie} is a \class{cookielib.Cookie} instance. \var{request} is |
| 340 | an object implementing the interface defined by the documentation for |
| 341 | \method{CookieJar.extract_cookies()}. |
| 342 | \end{methoddesc} |
| 343 | |
| 344 | \begin{methoddesc}[CookiePolicy]{return_ok}{cookie, request} |
| 345 | Return boolean value indicating whether cookie should be returned to server. |
| 346 | |
| 347 | \var{cookie} is a \class{cookielib.Cookie} instance. \var{request} is |
| 348 | an object implementing the interface defined by the documentation for |
| 349 | \method{CookieJar.add_cookie_header()}. |
| 350 | \end{methoddesc} |
| 351 | |
| 352 | \begin{methoddesc}[CookiePolicy]{domain_return_ok}{domain, request} |
| 353 | Return false if cookies should not be returned, given cookie domain. |
| 354 | |
| 355 | This method is an optimization. It removes the need for checking |
| 356 | every cookie with a particular domain (which might involve reading |
| 357 | many files). Returning true from \method{domain_return_ok()} and |
| 358 | \method{path_return_ok()} leaves all the work to \method{return_ok()}. |
| 359 | |
| 360 | If \method{domain_return_ok()} returns true for the cookie domain, |
| 361 | \method{path_return_ok()} is called for the cookie path. Otherwise, |
| 362 | \method{path_return_ok()} and \method{return_ok()} are never called |
| 363 | for that cookie domain. If \method{path_return_ok()} returns true, |
| 364 | \method{return_ok()} is called with the \class{Cookie} object itself |
| 365 | for a full check. Otherwise, \method{return_ok()} is never called for |
| 366 | that cookie path. |
| 367 | |
| 368 | Note that \method{domain_return_ok()} is called for every |
| 369 | \emph{cookie} domain, not just for the \emph{request} domain. For |
| 370 | example, the function might be called with both \code{".example.com"} |
| 371 | and \code{"www.example.com"} if the request domain is |
| 372 | \code{"www.example.com"}. The same goes for |
| 373 | \method{path_return_ok()}. |
| 374 | |
| 375 | The \var{request} argument is as documented for \method{return_ok()}. |
| 376 | \end{methoddesc} |
| 377 | |
| 378 | \begin{methoddesc}[CookiePolicy]{path_return_ok}{path, request} |
| 379 | Return false if cookies should not be returned, given cookie path. |
| 380 | |
| 381 | See the documentation for \method{domain_return_ok()}. |
| 382 | \end{methoddesc} |
| 383 | |
| 384 | |
| 385 | In addition to implementing the methods above, implementations of the |
| 386 | \class{CookiePolicy} interface must also supply the following |
| 387 | attributes, indicating which protocols should be used, and how. All |
| 388 | of these attributes may be assigned to. |
| 389 | |
| 390 | \begin{memberdesc}{netscape} |
| 391 | Implement Netscape protocol. |
| 392 | \end{memberdesc} |
| 393 | \begin{memberdesc}{rfc2965} |
| 394 | Implement RFC 2965 protocol. |
| 395 | \end{memberdesc} |
| 396 | \begin{memberdesc}{hide_cookie2} |
| 397 | Don't add \mailheader{Cookie2} header to requests (the presence of |
| 398 | this header indicates to the server that we understand RFC 2965 |
| 399 | cookies). |
| 400 | \end{memberdesc} |
| 401 | |
| 402 | The most useful way to define a \class{CookiePolicy} class is by |
| 403 | subclassing from \class{DefaultCookiePolicy} and overriding some or |
| 404 | all of the methods above. \class{CookiePolicy} itself may be used as |
| 405 | a 'null policy' to allow setting and receiving any and all cookies |
| 406 | (this is unlikely to be useful). |
| 407 | |
| 408 | |
| 409 | \subsection{DefaultCookiePolicy Objects \label{default-cookie-policy-objects}} |
| 410 | |
| 411 | Implements the standard rules for accepting and returning cookies. |
| 412 | |
| 413 | Both RFC 2965 and Netscape cookies are covered. RFC 2965 handling is |
| 414 | switched off by default. |
| 415 | |
| 416 | The easiest way to provide your own policy is to subclass this class |
| 417 | and call its methods in your overridden implementations before adding |
| 418 | your own additional checks: |
| 419 | |
| 420 | \begin{verbatim} |
| 421 | import cookielib |
| 422 | class MyCookiePolicy(cookielib.DefaultCookiePolicy): |
| 423 |     def set_ok(self, cookie, request): |
| 424 |         if not cookielib.DefaultCookiePolicy.set_ok(self, cookie, request): |
| 425 |             return False |
| 426 |         if i_dont_want_to_store_this_cookie(cookie): |
| 427 |             return False |
| 428 |         return True |
| 429 | \end{verbatim} |
| 430 | |
| 431 | In addition to the features required to implement the |
| 432 | \class{CookiePolicy} interface, this class allows you to block and |
| 433 | allow domains from setting and receiving cookies. There are also some |
| 434 | strictness switches that allow you to tighten up the rather loose |
| 435 | Netscape protocol rules a little bit (at the cost of blocking some |
| 436 | benign cookies). |
| 437 | |
| 438 | A domain blacklist and whitelist are provided (both off by default). |
| 439 | Only domains not in the blacklist and present in the whitelist (if the |
| 440 | whitelist is active) participate in cookie setting and returning. Use |
| 441 | the \var{blocked_domains} constructor argument, and |
| 442 | \method{blocked_domains()} and \method{set_blocked_domains()} methods |
| 443 | (and the corresponding argument and methods for |
| 444 | \var{allowed_domains}). If you set a whitelist, you can turn it off |
| 445 | again by setting it to \constant{None}. |
| 446 | |
| 447 | Domains in block or allow lists that do not start with a dot must |
| 448 | equal the cookie domain to be matched. For example, |
| 449 | \code{"example.com"} matches a blacklist entry of |
| 450 | \code{"example.com"}, but \code{"www.example.com"} does not. Domains |
| 451 | that do start with a dot are matched by more specific domains too. |
| 452 | For example, both \code{"www.example.com"} and |
| 453 | \code{"www.coyote.example.com"} match \code{".example.com"} (but |
| 454 | \code{"example.com"} itself does not). IP addresses are an exception, |
| 455 | and must match exactly. For example, if \var{blocked_domains} contains |
| 456 | \code{"192.168.1.2"} and \code{".168.1.2"}, 192.168.1.2 is blocked, |
| 457 | but 193.168.1.2 is not. |
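The matching rules above can be checked directly with \method{is_blocked()}. This sketch only reuses the domains from the preceding paragraph; the fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

# An entry without a leading dot must equal the cookie domain.
policy = cookielib.DefaultCookiePolicy(blocked_domains=["example.com"])
exact = (policy.is_blocked("example.com"),
         policy.is_blocked("www.example.com"))

# An entry with a leading dot also matches more specific domains,
# but not the bare domain itself.
policy.set_blocked_domains([".example.com"])
dotted = (policy.is_blocked("www.example.com"),
          policy.is_blocked("www.coyote.example.com"),
          policy.is_blocked("example.com"))

# IP addresses must match exactly, dot or no dot.
policy.set_blocked_domains(["192.168.1.2", ".168.1.2"])
addresses = (policy.is_blocked("192.168.1.2"),
             policy.is_blocked("193.168.1.2"))

print(exact, dotted, addresses)
```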
| 458 | |
| 459 | \class{DefaultCookiePolicy} implements the following additional |
| 460 | methods: |
| 461 | |
| 462 | \begin{methoddesc}[DefaultCookiePolicy]{blocked_domains}{} |
| 463 | Return the sequence of blocked domains (as a tuple). |
| 464 | \end{methoddesc} |
| 465 | |
| 466 | \begin{methoddesc}[DefaultCookiePolicy]{set_blocked_domains} |
| 467 | {blocked_domains} |
| 468 | Set the sequence of blocked domains. |
| 469 | \end{methoddesc} |
| 470 | |
| 471 | \begin{methoddesc}[DefaultCookiePolicy]{is_blocked}{domain} |
| 472 | Return whether \var{domain} is on the blacklist for setting or |
| 473 | receiving cookies. |
| 474 | \end{methoddesc} |
| 475 | |
| 476 | \begin{methoddesc}[DefaultCookiePolicy]{allowed_domains}{} |
| 477 | Return \constant{None}, or the sequence of allowed domains (as a tuple). |
| 478 | \end{methoddesc} |
| 479 | |
| 480 | \begin{methoddesc}[DefaultCookiePolicy]{set_allowed_domains} |
| 481 | {allowed_domains} |
| 482 | Set the sequence of allowed domains, or \constant{None}. |
| 483 | \end{methoddesc} |
| 484 | |
| 485 | \begin{methoddesc}[DefaultCookiePolicy]{is_not_allowed}{domain} |
| 486 | Return whether \var{domain} is not on the whitelist for setting or |
| 487 | receiving cookies. |
| 488 | \end{methoddesc} |
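A short sketch of switching the whitelist on and off with the methods above. The domains are placeholders, and the fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

policy = cookielib.DefaultCookiePolicy()
no_whitelist = policy.allowed_domains()         # whitelist is off by default

policy.set_allowed_domains([".example.com"])    # switch the whitelist on
whitelist = policy.allowed_domains()
inside = policy.is_not_allowed("www.example.com")
outside = policy.is_not_allowed("www.example.org")

policy.set_allowed_domains(None)                # switch it off again
after_reset = policy.is_not_allowed("www.example.org")

print(inside, outside, after_reset)
```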
| 489 | |
| 490 | \class{DefaultCookiePolicy} instances have the following attributes, |
| 491 | which are all initialised from the constructor arguments of the same |
| 492 | name, and which may all be assigned to. |
| 493 | |
| 494 | General strictness switches: |
| 495 | |
| 496 | \begin{memberdesc}{strict_domain} |
| 497 | Don't allow sites to set two-component domains with country-code |
| 498 | top-level domains like \code{.co.uk}, \code{.gov.uk}, |
| 499 | \code{.co.nz}, etc. This is far from perfect and isn't guaranteed to |
| 500 | work! |
| 501 | \end{memberdesc} |
| 502 | |
| 503 | RFC 2965 protocol strictness switches: |
| 504 | |
| 505 | \begin{memberdesc}{strict_rfc2965_unverifiable} |
| 506 | Follow RFC 2965 rules on unverifiable transactions (usually, an |
| 507 | unverifiable transaction is one resulting from a redirect or a request |
| 508 | for an image hosted on another site). If this is false, cookies are |
| 509 | \emph{never} blocked on the basis of verifiability. |
| 510 | \end{memberdesc} |
| 511 | |
| 512 | Netscape protocol strictness switches: |
| 513 | |
| 514 | \begin{memberdesc}{strict_ns_unverifiable} |
| 515 | Apply RFC 2965 rules on unverifiable transactions even to Netscape |
| 516 | cookies. |
| 517 | \end{memberdesc} |
| 518 | \begin{memberdesc}{strict_ns_domain} |
| 519 | Flags indicating how strict to be with domain-matching rules for |
| 520 | Netscape cookies. See below for acceptable values. |
| 521 | \end{memberdesc} |
| 522 | \begin{memberdesc}{strict_ns_set_initial_dollar} |
| 523 | Ignore cookies in \mailheader{Set-Cookie} headers that have names starting with |
| 524 | \code{'\$'}. |
| 525 | \end{memberdesc} |
| 526 | \begin{memberdesc}{strict_ns_set_path} |
| 527 | Don't allow setting cookies whose path doesn't path-match request URI. |
| 528 | \end{memberdesc} |
| 529 | |
| 530 | \member{strict_ns_domain} is a collection of flags. Its value is |
| 531 | constructed by or-ing together the flags described below (for example, |
| 532 | \code{DomainStrictNoDots|DomainStrictNonDomain} means both flags are |
| 533 | set). |
| 534 | |
| 535 | \begin{memberdesc}{DomainStrictNoDots} |
| 536 | When setting cookies, the 'host prefix' must not contain a dot |
| 537 | (eg. \code{www.foo.bar.com} can't set a cookie for \code{.bar.com}, |
| 538 | because \code{www.foo} contains a dot). |
| 539 | \end{memberdesc} |
| 540 | \begin{memberdesc}{DomainStrictNonDomain} |
| 541 | Cookies that did not explicitly specify a \code{domain} |
| 542 | cookie-attribute can only be returned to a domain equal to the domain |
| 543 | that set the cookie (eg. \code{spam.example.com} won't be returned |
| 544 | cookies from \code{example.com} that had no \code{domain} |
| 545 | cookie-attribute). |
| 546 | \end{memberdesc} |
| 547 | \begin{memberdesc}{DomainRFC2965Match} |
| 548 | When setting cookies, require a full RFC 2965 domain-match. |
| 549 | \end{memberdesc} |
| 550 | |
| 551 | The following attributes are provided for convenience, and are the |
| 552 | most useful combinations of the above flags: |
| 553 | |
| 554 | \begin{memberdesc}{DomainLiberal} |
| 555 | Equivalent to 0 (ie. all of the above Netscape domain strictness flags |
| 556 | switched off). |
| 557 | \end{memberdesc} |
| 558 | \begin{memberdesc}{DomainStrict} |
| 559 | Equivalent to \code{DomainStrictNoDots|DomainStrictNonDomain}. |
| 560 | \end{memberdesc} |
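A sketch of combining and testing the \member{strict_ns_domain} flags with bitwise operators. The alias \code{Policy} is introduced only for brevity, and the fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

Policy = cookielib.DefaultCookiePolicy  # local alias, for brevity only

# Or the individual flags together; this is the DomainStrict combination.
strict = Policy.DomainStrictNoDots | Policy.DomainStrictNonDomain
policy = Policy(strict_ns_domain=strict)

# Test single flags with a bitwise and.
no_dots_set = bool(policy.strict_ns_domain & Policy.DomainStrictNoDots)
rfc_match_set = bool(policy.strict_ns_domain & Policy.DomainRFC2965Match)

# The attribute may be assigned to, so a flag can be added later.
policy.strict_ns_domain |= Policy.DomainRFC2965Match
rfc_match_after = bool(policy.strict_ns_domain & Policy.DomainRFC2965Match)
```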
| 561 | |
| 562 | |
| 563 | \subsection{Cookie Objects \label{cookie-objects}} |
| 564 | |
| 565 | \class{Cookie} instances have Python attributes roughly corresponding |
| 566 | to the standard cookie-attributes specified in the various cookie |
| 567 | standards. The correspondence is not one-to-one, because there are |
| 568 | complicated rules for assigning default values, and because the |
| 569 | \code{max-age} and \code{expires} cookie-attributes contain equivalent |
| 570 | information. |
| 571 | |
| 572 | Assignment to these attributes should not be necessary other than in |
| 573 | rare circumstances in a \class{CookiePolicy} method. The class does |
| 574 | not enforce internal consistency, so you should know what you're |
| 575 | doing if you do that. |
| 576 | |
| 577 | \begin{memberdesc}[Cookie]{version} |
| 578 | Integer or \constant{None}. Netscape cookies have version 0. RFC |
| 579 | 2965 and RFC 2109 cookies have version 1. |
| 580 | \end{memberdesc} |
| 581 | \begin{memberdesc}[Cookie]{name} |
| 582 | Cookie name (a string). |
| 583 | \end{memberdesc} |
| 584 | \begin{memberdesc}[Cookie]{value} |
| 585 | Cookie value (a string), or \constant{None}. |
| 586 | \end{memberdesc} |
| 587 | \begin{memberdesc}[Cookie]{port} |
| 588 | String representing a port or a set of ports (eg. '80', or '80,8080'), |
| 589 | or \constant{None}. |
| 590 | \end{memberdesc} |
| 591 | \begin{memberdesc}[Cookie]{path} |
| 592 | Cookie path (a string, eg. \code{'/acme/rocket_launchers'}). |
| 593 | \end{memberdesc} |
| 594 | \begin{memberdesc}[Cookie]{secure} |
| 595 | True if cookie should only be returned over a secure connection. |
| 596 | \end{memberdesc} |
| 597 | \begin{memberdesc}[Cookie]{expires} |
| 598 | Integer expiry date in seconds since epoch, or \constant{None}. See |
| 599 | also the \method{is_expired()} method. |
| 600 | \end{memberdesc} |
| 601 | \begin{memberdesc}[Cookie]{discard} |
| 602 | True if this is a session cookie. |
| 603 | \end{memberdesc} |
| 604 | \begin{memberdesc}[Cookie]{comment} |
| 605 | String comment from the server explaining the function of this cookie, |
| 606 | or \constant{None}. |
| 607 | \end{memberdesc} |
| 608 | \begin{memberdesc}[Cookie]{comment_url} |
| 609 | URL linking to a comment from the server explaining the function of |
| 610 | this cookie, or \constant{None}. |
| 611 | \end{memberdesc} |
| 612 | |
| 613 | \begin{memberdesc}[Cookie]{port_specified} |
| 614 | True if a port or set of ports was explicitly specified by the server |
| 615 | (in the \mailheader{Set-Cookie} / \mailheader{Set-Cookie2} header). |
| 616 | \end{memberdesc} |
| 617 | \begin{memberdesc}[Cookie]{domain_specified} |
| 618 | True if a domain was explicitly specified by the server. |
| 619 | \end{memberdesc} |
| 620 | \begin{memberdesc}[Cookie]{domain_initial_dot} |
| 621 | True if the domain explicitly specified by the server began with a |
| 622 | dot (\code{'.'}). |
| 623 | \end{memberdesc} |
| 624 | |
| 625 | Cookies may have additional non-standard cookie-attributes. These may |
| 626 | be accessed using the following methods: |
| 627 | |
| 628 | \begin{methoddesc}[Cookie]{has_nonstandard_attr}{name} |
| 629 | Return true if cookie has the named cookie-attribute. |
| 630 | \end{methoddesc} |
| 631 | \begin{methoddesc}[Cookie]{get_nonstandard_attr}{name, default=\constant{None}} |
| 632 | If cookie has the named cookie-attribute, return its value. |
| 633 | Otherwise, return \var{default}. |
| 634 | \end{methoddesc} |
| 635 | \begin{methoddesc}[Cookie]{set_nonstandard_attr}{name, value} |
| 636 | Set the value of the named cookie-attribute. |
| 637 | \end{methoddesc} |
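A sketch of the three accessor methods for non-standard cookie-attributes. The cookie is built by hand purely for illustration, with an assumed \code{HttpOnly} entry passed through the constructor's \var{rest} argument; the fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module

# Hand-built cookie; non-standard cookie-attributes are passed in rest.
cookie = cookielib.Cookie(
    version=0, name="session", value="abc123", port=None,
    port_specified=False, domain="example.com", domain_specified=False,
    domain_initial_dot=False, path="/", path_specified=True, secure=False,
    expires=None, discard=True, comment=None, comment_url=None,
    rest={"HttpOnly": None})

has_httponly = cookie.has_nonstandard_attr("HttpOnly")
fallback = cookie.get_nonstandard_attr("SameSite", "unset")  # not present yet

cookie.set_nonstandard_attr("SameSite", "Lax")
samesite = cookie.get_nonstandard_attr("SameSite")
```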
| 638 | |
| 639 | The \class{Cookie} class also defines the following method: |
| 640 | |
| 641 | \begin{methoddesc}[Cookie]{is_expired}{\optional{now=\constant{None}}} |
| 642 | True if cookie has passed the time at which the server requested it |
| 643 | should expire. If \var{now} is given (in seconds since the epoch), |
| 644 | return whether the cookie has expired at the specified time. |
| 645 | \end{methoddesc} |
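A sketch of \method{is_expired()} with an explicit \var{now}, which keeps the result independent of the clock. The \function{make_cookie()} helper and the timestamps are hypothetical; the fallback import is an assumption for Python 3.

```python
try:
    import cookielib
except ImportError:
    import http.cookiejar as cookielib  # assumption: Python 3 name of the module


def make_cookie(expires):
    """Hypothetical helper: a minimal cookie with the given expiry time."""
    return cookielib.Cookie(
        version=0, name="token", value="1", port=None, port_specified=False,
        domain="example.com", domain_specified=False,
        domain_initial_dot=False, path="/", path_specified=True,
        secure=False, expires=expires, discard=(expires is None),
        comment=None, comment_url=None, rest={})


timed = make_cookie(expires=1000)       # expiry in seconds since the epoch
session = make_cookie(expires=None)     # no expiry time at all

before = timed.is_expired(now=999)
at_expiry = timed.is_expired(now=1000)
session_expired = session.is_expired(now=10 ** 10)
```

A cookie whose \member{expires} attribute is \constant{None} never reports itself as expired.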
| 646 | |
| 647 | |
| 648 | \subsection{Examples \label{cookielib-examples}} |
| 649 | |
| 650 | The first example shows the most common usage of \module{cookielib}: |
| 651 | |
| 652 | \begin{verbatim} |
| 653 | import cookielib, urllib2 |
| 654 | cj = cookielib.CookieJar() |
| 655 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) |
| 656 | r = opener.open("http://example.com/") |
| 657 | \end{verbatim} |
| 658 | |
| 659 | This example illustrates how to open a URL using your Netscape, |
| 660 | Mozilla, or Lynx cookies (assumes \UNIX{}/Netscape convention for |
| 661 | location of the cookies file): |
| 662 | |
| 663 | \begin{verbatim} |
| 664 | import os, cookielib, urllib2 |
| 665 | cj = cookielib.MozillaCookieJar() |
| 666 | cj.load(os.path.join(os.environ["HOME"], ".netscape/cookies.txt")) |
| 667 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) |
| 668 | r = opener.open("http://example.com/") |
| 669 | \end{verbatim} |
| 670 | |
| 671 | The next example illustrates the use of \class{DefaultCookiePolicy}. |
| 672 | Turn on RFC 2965 cookies, be more strict about domains when setting |
| 673 | and returning Netscape cookies, and block some domains from setting |
| 674 | cookies or having them returned: |
| 675 | |
| 676 | \begin{verbatim} |
| 677 | import urllib2 |
| 678 | from cookielib import CookieJar, DefaultCookiePolicy |
| 679 | policy = DefaultCookiePolicy( |
| 680 |     rfc2965=True, strict_ns_domain=DefaultCookiePolicy.DomainStrict, |
| 681 |     blocked_domains=["ads.net", ".ads.net"]) |
| 682 | cj = CookieJar(policy) |
| 683 | opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) |
| 684 | r = opener.open("http://example.com/") |
| 685 | \end{verbatim} |