:mod:`urllib.robotparser` --- Parser for robots.txt
====================================================

.. module:: urllib.robotparser
   :synopsis: Load a robots.txt file and answer questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.


.. class:: RobotFileParser(url='')

   This class provides methods to read, parse and answer questions about the
   :file:`robots.txt` file at *url*.

   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.

   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.

   .. method:: parse(lines)

      Parses the *lines* argument.
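
      Rules can also be supplied to the parser directly, for instance when
      the :file:`robots.txt` data has already been retrieved by some other
      means; the two-line policy here is made up purely for illustration:

         >>> import urllib.robotparser
         >>> rp = urllib.robotparser.RobotFileParser()
         >>> rp.parse(["User-agent: *", "Disallow: /private/"])
         >>> rp.can_fetch("*", "http://example.com/private/page.html")
         False
         >>> rp.can_fetch("*", "http://example.com/")
         True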

   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.

   .. method:: mtime()

      Returns the time the ``robots.txt`` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      ``robots.txt`` files periodically.
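
      For example, a long-running crawler can compare :meth:`mtime` against
      the current time to decide when its cached copy of the rules has grown
      stale and should be fetched again; the one-hour threshold below is an
      arbitrary value chosen for illustration::

         import time
         import urllib.robotparser

         rp = urllib.robotparser.RobotFileParser()
         rp.set_url("http://www.musi-cal.com/robots.txt")
         rp.read()   # fetch and parse the rules; the fetch time is recorded

         # ... later, before requesting another page ...
         if time.time() - rp.mtime() > 3600:
             rp.read()   # the rules may have changed; fetch and parse again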

   .. method:: modified()

      Sets the time the ``robots.txt`` file was last fetched to the current
      time.

The following example demonstrates basic use of the :class:`RobotFileParser`
class.

   >>> import urllib.robotparser
   >>> rp = urllib.robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True