:mod:`packaging.pypi.simple` --- Crawler using the PyPI "simple" interface
==========================================================================

.. module:: packaging.pypi.simple
   :synopsis: Crawler using the screen-scraping "simple" interface to fetch info
              and distributions.

The class provided by :mod:`packaging.pypi.simple` can access project indexes
and provide useful information about distributions. PyPI, other indexes and
local indexes are supported.

Use this module to search for distributions by name and version, to process
the external pages linked from an index, and to download distributions. It is
not suited for queries that require processing a large part of the index (such
as "finding all distributions with a specific version, no matter the name");
use :mod:`packaging.pypi.xmlrpc` for that.


API
---

.. class:: Crawler(index_url=DEFAULT_SIMPLE_INDEX_URL, \
                   prefer_final=False, prefer_source=True, \
                   hosts=('*',), follow_externals=False, \
                   mirrors_url=None, mirrors=None, timeout=15, \
                   mirrors_max_tries=0)

   *index_url* is the address of the index to use for requests.

   The first two parameters control the query results. *prefer_final*
   indicates whether a final version (not alpha, beta or candidate) is to be
   preferred over a newer but non-final version (for example, whether to pick
   up 1.0 over 2.0a3). It is used only for queries that don't give a version
   argument. Likewise, *prefer_source* tells whether to prefer a source
   distribution over a binary one, if no distribution argument was provided.

   Other parameters are related to external links (that is, links that go
   outside the simple index): *hosts* is a list of hosts allowed to be
   processed if *follow_externals* is true (default behavior is to follow all
   hosts), *follow_externals* enables or disables following external links
   (default is false, meaning disabled).

   The remaining parameters are related to the mirroring infrastructure
   defined in :PEP:`381`. *mirrors_url* gives a URL to look on for DNS
   records giving mirror addresses; *mirrors* is a list of mirror URLs (see
   the PEP). If both *mirrors* and *mirrors_url* are given, *mirrors_url*
   will only be used if *mirrors* is set to ``None``. *timeout* is the time
   (in seconds) to wait before considering a URL has timed out;
   *mirrors_max_tries* is the number of times to try requesting information
   on mirrors before switching.

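
   As an illustration, here is a minimal sketch of creating a crawler that
   prefers final source releases and uses a shorter timeout (the argument
   values are arbitrary)::

      >>> from packaging.pypi.simple import Crawler
      >>> crawler = Crawler(prefer_final=True, prefer_source=True, timeout=5)
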
   The following methods are defined:

   .. method:: get_distributions(project_name, version)

      Return the distributions found in the index for the given release.

   .. method:: get_metadata(project_name, version)

      Return the metadata found on the index for this project name and
      version. Currently downloads and unpacks a distribution to read the
      PKG-INFO file.

   .. method:: get_release(requirements, prefer_final=None)

      Return one release that fulfills the given requirements.

   .. method:: get_releases(requirements, prefer_final=None, force_update=False)

      Search for releases and return a
      :class:`~packaging.pypi.dist.ReleasesList` object containing the
      results.

   .. method:: search_projects(name=None)

      Search the index for projects containing the given name and return a
      list of matching names.

See also the base class :class:`packaging.pypi.base.BaseClient` for inherited
methods.


.. data:: DEFAULT_SIMPLE_INDEX_URL

   The address used by default by the crawler class. It is currently
   ``'http://a.pypi.python.org/simple/'``, the main PyPI installation.


Usage Examples
--------------

To help you understand how to use the `Crawler` class, here are some basic
usage examples.

Request the simple index to get a specific distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to scan an index to get a list of distributions for the
"foobar" project. You can use the `get_releases` method for that. It will
browse the project page and return :class:`ReleaseInfo` objects for each
download link it finds::

   >>> from packaging.pypi.simple import Crawler
   >>> client = Crawler()
   >>> client.get_releases("FooBar")
   [<ReleaseInfo "FooBar 1.1">, <ReleaseInfo "FooBar 1.2">]


Note that you can also query the client for specific versions, using version
specifiers (described in `PEP 345
<http://www.python.org/dev/peps/pep-0345/#version-specifiers>`_)::

   >>> client.get_releases("FooBar < 1.2")
   [<ReleaseInfo "FooBar 1.1">]


`get_releases` returns a list of :class:`ReleaseInfo` objects, but you can
also get the best distribution that fulfills your requirements, using
`get_release`::

   >>> client.get_release("FooBar < 1.2")
   <ReleaseInfo "FooBar 1.1">

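
If you need the concrete distribution files or the metadata of a release, a
minimal sketch using the `get_distributions` and `get_metadata` methods
described above (return values are omitted here, since their exact form
depends on the index contents)::

   >>> client.get_distributions("FooBar", "1.1")
   >>> client.get_metadata("FooBar", "1.1")
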

Download distributions
^^^^^^^^^^^^^^^^^^^^^^

As it can get the URLs of the distributions provided by PyPI, the `Crawler`
client can also download them and save them for you to a temporary
destination::

   >>> client.download("foobar")
   /tmp/temp_dir/foobar-1.2.tar.gz


You can also specify the directory you want to download to::

   >>> client.download("foobar", "/path/to/my/dir")
   /path/to/my/dir/foobar-1.2.tar.gz


While downloading, the MD5 hash of the archive is checked; if it does not
match, the download is tried a second time, and if it fails again,
`MD5HashDoesNotMatchError` is raised.
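
If you want to handle that failure yourself, you can catch the exception; a
minimal sketch, assuming `MD5HashDoesNotMatchError` can be imported from
`packaging.pypi.errors` (the exact module is an assumption)::

   >>> from packaging.pypi.errors import MD5HashDoesNotMatchError
   >>> try:
   ...     client.download("foobar")
   ... except MD5HashDoesNotMatchError:
   ...     print("archive corrupted after two tries, giving up")
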

Internally, it is not the `Crawler` that downloads the distributions, but the
`DistributionInfo` class; refer to its documentation for more details.


Following PyPI external links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default behavior of packaging is to *not* follow the links found in the
HTML pages of the "simple index" when looking for distribution downloads.

It's possible to tell the crawler to follow external links by setting the
`follow_externals` attribute, on instantiation or afterwards::

   >>> client = Crawler(follow_externals=True)

or ::

   >>> client = Crawler()
   >>> client.follow_externals = True

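
When external links are enabled, you can also restrict which hosts may be
processed, using the *hosts* parameter described earlier; a minimal sketch
(the host patterns are hypothetical)::

   >>> client = Crawler(follow_externals=True,
   ...                  hosts=["*.python.org", "example.org"])
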

Working with external indexes, and mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default `Crawler` behavior is to rely on the Python Package Index stored
on PyPI (http://pypi.python.org/simple).

If you need to work with a local index, or with private indexes, you can
specify their location using the *index_url* parameter::

   >>> client = Crawler(index_url="file://filesystem/path/")

or ::

   >>> client = Crawler(index_url="http://some.specific.url/")


You can also specify mirrors to fall back on in case the first *index_url*
you provided does not respond, or does not respond correctly. The default
behavior of `Crawler` is to use the list provided by the Python.org DNS
records, as described in :PEP:`381` about the mirroring infrastructure.

If you don't want to rely on these, you can specify the list of mirrors you
want to try by passing the `mirrors` parameter. It's a simple iterable::

   >>> mirrors = ["http://first.mirror", "http://second.mirror"]
   >>> client = Crawler(mirrors=mirrors)

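
If you maintain your own list of mirrors, you can also point the crawler at
it with the *mirrors_url* parameter; a minimal sketch (the URL is
hypothetical)::

   >>> client = Crawler(mirrors_url="http://example.org/mirrors")
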

Searching in the simple index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's possible to search for projects with specific names in the package index.
Say you want to find all projects containing the "distutils" keyword::

   >>> client.search_projects("distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "Packaging">, <Project "distutilscross">, <Project "lpdistutils">, <Project
   "taras.recipe.distutils">, <Project "zerokspot.recipe.distutils">]


You can also search for projects whose names start or end with specific text,
using a wildcard::

   >>> client.search_projects("distutils*")
   [<Project "Distutils">, <Project "Packaging">, <Project "distutilscross">]

   >>> client.search_projects("*distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "lpdistutils">, <Project "taras.recipe.distutils">, <Project
   "zerokspot.recipe.distutils">]