:mod:`packaging.pypi.simple` --- Crawler using the PyPI "simple" interface
==========================================================================

.. module:: packaging.pypi.simple
   :synopsis: Crawler using the screen-scraping "simple" interface to fetch info
              and distributions.


The class provided by :mod:`packaging.pypi.simple` can access project indexes
and provide useful information about distributions. PyPI, other indexes and
local indexes are supported.

You should use this module to search for distributions by name and version, to
process external index pages, and to download distributions. It is not suited
for queries that would require long index processing (like "finding all
distributions with a specific version, no matter the name"); use
:mod:`packaging.pypi.xmlrpc` for that.


API
---

.. class:: Crawler(index_url=DEFAULT_SIMPLE_INDEX_URL, \
                   prefer_final=False, prefer_source=True, \
                   hosts=('*',), follow_externals=False, \
                   mirrors_url=None, mirrors=None, timeout=15, \
                   mirrors_max_tries=0)

   *index_url* is the address of the index to use for requests.

   The first two parameters control the query results. *prefer_final*
   indicates whether a final version (not alpha, beta or candidate) is to be
   preferred over a newer but non-final version (for example, whether to pick
   up 1.0 over 2.0a3). It is used only for queries that don't give a version
   argument. Likewise, *prefer_source* tells whether to prefer a source
   distribution over a binary one, if no distribution argument was provided.

   Other parameters are related to external links (that is, links that go
   outside the simple index): *hosts* is a list of hosts allowed to be
   processed if *follow_externals* is true (default behavior is to follow all
   hosts), and *follow_externals* enables or disables following external links
   (default is false, meaning disabled).

   The remaining parameters are related to the mirroring infrastructure
   defined in :PEP:`381`. *mirrors_url* gives a URL to look on for DNS
   records giving mirror addresses; *mirrors* is a list of mirror URLs (see
   the PEP). If both *mirrors* and *mirrors_url* are given, *mirrors_url*
   will only be used if *mirrors* is set to ``None``. *timeout* is the time
   (in seconds) to wait before considering a URL has timed out;
   *mirrors_max_tries* is the number of times to try requesting information
   on mirrors before switching.
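
   As a quick illustration, here is a crawler that prefers final source
   releases and follows external links, but only to a whitelisted host (the
   host name below is a made-up example)::

      >>> from packaging.pypi.simple import Crawler
      >>> client = Crawler(prefer_final=True, prefer_source=True,
      ...                  follow_externals=True, hosts=['example.org'])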

   The following methods are defined:

   .. method:: get_distributions(project_name, version)

      Return the distributions found in the index for the given release.
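
      A minimal sketch, using the crawler created above ("FooBar" is a
      hypothetical project used for illustration)::

         >>> # fetch the distributions (sdist, bdists...) for release 1.1
         >>> dists = client.get_distributions('FooBar', '1.1')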

   .. method:: get_metadata(project_name, version)

      Return the metadata found on the index for this project name and
      version. Currently downloads and unpacks a distribution to read the
      PKG-INFO file.
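
      A minimal sketch (again with a hypothetical "FooBar" project; note
      that this call downloads and unpacks an archive, so it may be slow)::

         >>> # metadata parsed from the release's PKG-INFO file
         >>> metadata = client.get_metadata('FooBar', '1.1')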

   .. method:: get_release(requirements, prefer_final=None)

      Return one release that fulfills the given requirements.

   .. method:: get_releases(requirements, prefer_final=None, force_update=False)

      Search for releases and return a
      :class:`~packaging.pypi.dist.ReleasesList` object containing the
      results.

   .. method:: search_projects(name=None)

      Search the index for projects containing the given name and return a
      list of matching names.

   See also the base class :class:`packaging.pypi.base.BaseClient` for
   inherited methods.


.. data:: DEFAULT_SIMPLE_INDEX_URL

   The address used by default by the crawler class. It is currently
   ``'http://a.pypi.python.org/simple/'``, the main PyPI installation.


Usage Examples
--------------

To help you understand how to use the `Crawler` class, here are some basic
usage examples.

Request the simple index to get a specific distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to scan an index to get a list of distributions for the
"foobar" project. You can use the `get_releases` method for that. This
method will browse the project page and return :class:`ReleaseInfo` objects
for each download link found::

   >>> from packaging.pypi.simple import Crawler
   >>> client = Crawler()
   >>> client.get_releases("FooBar")
   [<ReleaseInfo "FooBar 1.1">, <ReleaseInfo "FooBar 1.2">]


Note that you can also query the client for specific versions, using version
specifiers (described in `PEP 345
<http://www.python.org/dev/peps/pep-0345/#version-specifiers>`_)::

   >>> client.get_releases("FooBar < 1.2")
   [<ReleaseInfo "FooBar 1.1">]


`get_releases` returns a list of :class:`ReleaseInfo` objects, but you can
also get the single best release that fulfills your requirements, using
`get_release`::

   >>> client.get_release("FooBar < 1.2")
   <ReleaseInfo "FooBar 1.1">


Download distributions
^^^^^^^^^^^^^^^^^^^^^^

As it can get the URLs of the distributions provided by PyPI, the `Crawler`
client can also download those distributions and put them in a temporary
destination for you::

   >>> client.download("foobar")
   '/tmp/temp_dir/foobar-1.2.tar.gz'


You can also specify the directory you want to download to::

   >>> client.download("foobar", "/path/to/my/dir")
   '/path/to/my/dir/foobar-1.2.tar.gz'


While downloading, the MD5 hash of the archive is checked; if it does not
match, the download is retried once, and if it fails again,
`MD5HashDoesNotMatchError` is raised.

Internally, it is not the `Crawler` that downloads the distributions, but the
`DistributionInfo` class; please refer to its documentation for more details.
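
If you want to handle a persistent checksum failure yourself, you can catch
the exception; a minimal sketch, assuming `MD5HashDoesNotMatchError` can be
imported from :mod:`packaging.pypi.errors` (an assumption; check where it is
defined in your version)::

   >>> from packaging.pypi.errors import MD5HashDoesNotMatchError  # assumed location
   >>> try:
   ...     path = client.download("foobar")
   ... except MD5HashDoesNotMatchError:
   ...     print("archive corrupted on both download attempts")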


Following PyPI external links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default behavior for packaging is to *not* follow the links provided
by HTML pages in the "simple index" when looking for distribution downloads.

It's possible to tell the crawler to follow external links by setting the
`follow_externals` attribute, at instantiation time or afterwards::

   >>> client = Crawler(follow_externals=True)

or ::

   >>> client = Crawler()
   >>> client.follow_externals = True
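
If you only want to allow specific external hosts, you can combine this with
the *hosts* parameter; since its default value is ``('*',)``, wildcard
patterns should be accepted (the host names below are made-up examples)::

   >>> client = Crawler(follow_externals=True,
   ...                  hosts=["example.org", "*.example.net"])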


Working with external indexes, and mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default `Crawler` behavior is to rely on the main Python Package Index
stored on PyPI (http://pypi.python.org/simple).

If you need to work with a local index or with private indexes, you can
specify it using the `index_url` parameter::

   >>> client = Crawler(index_url="file://filesystem/path/")

or ::

   >>> client = Crawler(index_url="http://some.specific.url/")


You can also specify mirrors to fall back on in case the first *index_url*
you provided doesn't respond, or doesn't respond correctly. The default
behavior for `Crawler` is to use the list provided by the Python.org DNS
records, as described in :PEP:`381` about the mirroring infrastructure.

If you don't want to rely on these, you can give the list of mirrors you want
to try through the `mirrors` parameter; any iterable of URLs will do::

   >>> mirrors = ["http://first.mirror", "http://second.mirror"]
   >>> client = Crawler(mirrors=mirrors)
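
You can also tune how failures are detected, using the *timeout* and
*mirrors_max_tries* parameters described in the API section above (the
values below are arbitrary examples)::

   >>> client = Crawler(mirrors=mirrors, timeout=5, mirrors_max_tries=2)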


Searching in the simple index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's possible to search for projects with specific names in the package index.
Assuming you want to find all projects containing the "distutils" keyword::

   >>> client.search_projects("distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "Packaging">, <Project "distutilscross">, <Project "lpdistutils">, <Project
   "taras.recipe.distutils">, <Project "zerokspot.recipe.distutils">]


You can also search for projects whose names start or end with specific
text, using a wildcard::

   >>> client.search_projects("distutils*")
   [<Project "Distutils">, <Project "Packaging">, <Project "distutilscross">]

   >>> client.search_projects("*distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "lpdistutils">, <Project "taras.recipe.distutils">, <Project
   "zerokspot.recipe.distutils">]