:mod:`packaging.pypi.simple` --- Crawler using the PyPI "simple" interface
==========================================================================

.. module:: packaging.pypi.simple
   :synopsis: Crawler using the screen-scraping "simple" interface to fetch info
              and distributions.


`packaging.pypi.simple` can process Python Package Indexes, and provides
useful information about distributions. It can also crawl local indexes, for
instance.

You should use `packaging.pypi.simple` for:

    * Searching distributions by name and version.
    * Processing external index pages.
    * Downloading distributions by name and version.

And it should not be used for:

    * Anything that would result in overly long index processing (like
      "finding all distributions with a specific version, no matter the
      name").


API
---

.. class:: Crawler


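The constructor accepts the parameters demonstrated in the examples below
(`index_url`, `follow_externals` and `mirrors`). A minimal construction
sketch, spelling out the defaults as this document describes them (the exact
signature may differ)::

   >>> from packaging.pypi.simple import Crawler
   >>> crawler = Crawler(index_url="http://pypi.python.org/simple",
   ...                   follow_externals=False)

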
Usage Examples
--------------

To help you understand how to use the `Crawler` class, here are some basic
usage examples.

Request the simple index to get a specific distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to scan an index to get the list of distributions for the
"foobar" project. You can use the `get_releases` method for that. The
`get_releases` method browses the project page and returns a
:class:`ReleaseInfo` object for each download link it finds::

   >>> from packaging.pypi.simple import Crawler
   >>> crawler = Crawler()
   >>> crawler.get_releases("FooBar")
   [<ReleaseInfo "FooBar 1.1">, <ReleaseInfo "FooBar 1.2">]


Note that you can also ask the crawler for specific versions, using version
specifiers (described in `PEP 345
<http://www.python.org/dev/peps/pep-0345/#version-specifiers>`_)::

   >>> crawler.get_releases("FooBar < 1.2")
   [<ReleaseInfo "FooBar 1.1">]


`get_releases` returns a list of :class:`ReleaseInfo` objects, but you can
also get the single best distribution that fulfills your requirements, using
`get_release`::

   >>> crawler.get_release("FooBar < 1.2")
   <ReleaseInfo "FooBar 1.1">


Download distributions
^^^^^^^^^^^^^^^^^^^^^^

As it can get the URLs of the distributions provided by PyPI, the `Crawler`
can also download the distributions and put them in a temporary destination
for you::

   >>> crawler.download("foobar")
   /tmp/temp_dir/foobar-1.2.tar.gz


You can also specify the directory you want to download to::

   >>> crawler.download("foobar", "/path/to/my/dir")
   /path/to/my/dir/foobar-1.2.tar.gz


While downloading, the MD5 hash of the archive is checked; if it does not
match, the download is tried a second time, and if it fails again,
`MD5HashDoesNotMatchError` is raised.
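
Such a check amounts to hashing the downloaded bytes and comparing the digest
with the expected value. A minimal sketch of this kind of verification, using
only the standard library (``verify_md5`` and its arguments are illustrative
names, not part of the `packaging.pypi` API)::

   import hashlib

   def verify_md5(archive_path, expected_hash):
       """Return True if the file's MD5 digest matches the expected hash."""
       md5 = hashlib.md5()
       with open(archive_path, 'rb') as archive:
           # Hash the archive in chunks to avoid loading it into memory at once.
           for chunk in iter(lambda: archive.read(8192), b''):
               md5.update(chunk)
       return md5.hexdigest() == expected_hash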

Internally, it is not the `Crawler` that downloads the distributions, but the
`DistributionInfo` class. Please refer to its documentation for more details.


Following PyPI external links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default behavior for packaging is to *not* follow the links provided
by HTML pages in the "simple" index, when looking for distribution
downloads.

It's possible to tell the `Crawler` to follow external links by setting the
`follow_externals` attribute, at instantiation time or afterwards::

   >>> crawler = Crawler(follow_externals=True)

or ::

   >>> crawler = Crawler()
   >>> crawler.follow_externals = True


Working with external indexes, and mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default `Crawler` behavior is to rely on the Python Package Index stored
on PyPI (http://pypi.python.org/simple).

As you may need to work with a local index, or with private indexes, you can
specify it using the `index_url` parameter::

   >>> crawler = Crawler(index_url="file://filesystem/path/")

or ::

   >>> crawler = Crawler(index_url="http://some.specific.url/")


You can also specify mirrors to fall back on in case the first `index_url`
you provided does not respond, or does not respond correctly. The default
behavior for `Crawler` is to use the mirror list provided by the python.org
DNS records, as described in :PEP:`381` about the mirroring infrastructure.
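
:PEP:`381` names the mirrors ``a.pypi.python.org``, ``b.pypi.python.org``,
and so on, with ``last.pypi.python.org`` aliased to the last mirror in the
sequence. A sketch of how such a list can be enumerated from DNS, following
the PEP's scheme (this is an illustration, not necessarily the crawler's
exact code, and it only handles single-letter prefixes)::

   import socket
   import string

   def discover_mirrors():
       """List mirror URLs from 'a' up to the mirror aliased by 'last'."""
       # last.pypi.python.org is a CNAME to the last mirror (e.g.
       # g.pypi.python.org); gethostbyname_ex returns the canonical name.
       last, aliases, addresses = socket.gethostbyname_ex('last.pypi.python.org')
       last_prefix = last.split('.')[0]
       letters = string.ascii_lowercase
       end = letters.index(last_prefix) + 1
       return ['http://%s.pypi.python.org/' % letter for letter in letters[:end]]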

If you don't want to rely on these, you can specify the list of mirrors you
want to try by setting the `mirrors` attribute. It's a simple iterable::

   >>> mirrors = ["http://first.mirror", "http://second.mirror"]
   >>> crawler = Crawler(mirrors=mirrors)


Searching in the simple index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's possible to search for projects with specific names in the package index.
Assuming you want to find all the projects containing the "distutils" keyword::

   >>> crawler.search_projects("distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "Packaging">, <Project "distutilscross">, <Project "lpdistutils">, <Project
   "taras.recipe.distutils">, <Project "zerokspot.recipe.distutils">]


You can also search for projects whose names start or end with a specific
text, using a wildcard::

   >>> crawler.search_projects("distutils*")
   [<Project "Distutils">, <Project "Packaging">, <Project "distutilscross">]

   >>> crawler.search_projects("*distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "lpdistutils">, <Project "taras.recipe.distutils">, <Project
   "zerokspot.recipe.distutils">]