:mod:`packaging.pypi.simple` --- Crawler using the PyPI "simple" interface
==========================================================================

.. module:: packaging.pypi.simple
   :synopsis: Crawler using the screen-scraping "simple" interface to fetch info
              and distributions.


The class provided by :mod:`packaging.pypi.simple` can access project indexes
and provide useful information about distributions.  PyPI, other indexes and
local indexes are supported.

You should use this module to search for distributions by name and version, to
process an index's external pages and to download distributions.  It is not
suited for queries that require processing large parts of the index (like
"finding all distributions with a specific version, no matter the name"); use
:mod:`packaging.pypi.xmlrpc` for that.


API
---

.. class:: Crawler(index_url=DEFAULT_SIMPLE_INDEX_URL, \
                   prefer_final=False, prefer_source=True, \
                   hosts=('*',), follow_externals=False, \
                   mirrors_url=None, mirrors=None, timeout=15, \
                   mirrors_max_tries=0)

      *index_url* is the address of the index to use for requests.

      The first two parameters control the query results.  *prefer_final*
      indicates whether a final version (not alpha, beta or candidate) is to be
      preferred over a newer but non-final version (for example, whether to pick
      up 1.0 over 2.0a3).  It is used only for queries that don't give a version
      argument.  Likewise, *prefer_source* tells whether to prefer a source
      distribution over a binary one, if no distribution argument was provided.

      Other parameters are related to external links (that is, links that go
      outside the simple index): *hosts* is a list of hosts allowed to be
      processed if *follow_externals* is true (the default value allows all
      hosts); *follow_externals* enables or disables following external links
      (default is false, meaning disabled).

      The remaining parameters are related to the mirroring infrastructure
      defined in :PEP:`381`.  *mirrors_url* gives a URL to look up for DNS
      records giving mirror addresses; *mirrors* is a list of mirror URLs (see
      the PEP).  If both *mirrors* and *mirrors_url* are given, *mirrors_url*
      will only be used if *mirrors* is set to ``None``.  *timeout* is the time
      (in seconds) to wait before considering that a URL has timed out;
      *mirrors_max_tries* is the number of times to try requesting information
      on mirrors before switching.
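
      For example, a crawler that prefers stable source releases and uses a
      short timeout could be created like this (a minimal sketch; the values
      shown are only illustrative)::

         >>> from packaging.pypi.simple import Crawler
         >>> crawler = Crawler(prefer_final=True, prefer_source=True,
         ...                   timeout=5)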

      The following methods are defined:

      .. method:: get_distributions(project_name, version)

         Return the distributions found in the index for the given release.

      .. method:: get_metadata(project_name, version)

         Return the metadata found on the index for this project name and
         version.  Currently downloads and unpacks a distribution to read the
         PKG-INFO file.

      .. method:: get_release(requirements, prefer_final=None)

         Return one release that fulfills the given requirements.

      .. method:: get_releases(requirements, prefer_final=None, force_update=False)

         Search for releases and return a
         :class:`~packaging.pypi.dist.ReleasesList` object containing the
         results.

      .. method:: search_projects(name=None)

         Search the index for projects containing the given name and return a
         list of matching names.

      See also the base class :class:`packaging.pypi.base.BaseClient` for
      inherited methods.


.. data:: DEFAULT_SIMPLE_INDEX_URL

   The address used by default by the crawler class.  It is currently
   ``'http://a.pypi.python.org/simple/'``, the main PyPI installation.


Usage Examples
--------------

To help you understand how to use the `Crawler` class, here are some basic
usage examples.

Request the simple index to get a specific distribution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to scan an index to get a list of distributions for the
"foobar" project.  You can use the `get_releases` method for that.  It will
browse the project page and return :class:`ReleaseInfo` objects for each
download link found::

   >>> from packaging.pypi.simple import Crawler
   >>> client = Crawler()
   >>> client.get_releases("FooBar")
   [<ReleaseInfo "FooBar 1.1">, <ReleaseInfo "FooBar 1.2">]


Note that you can also ask the client about specific versions, using version
specifiers (described in `PEP 345
<http://www.python.org/dev/peps/pep-0345/#version-specifiers>`_)::

   >>> client.get_releases("FooBar < 1.2")
   [<ReleaseInfo "FooBar 1.1">]


`get_releases` returns a list of :class:`ReleaseInfo` objects, but you can
also get the single best release that fulfills your requirements, using
`get_release`::

   >>> client.get_release("FooBar < 1.2")
   <ReleaseInfo "FooBar 1.1">
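
Once you know which release you want, `get_distributions` lists the concrete
distributions indexed for it, and `get_metadata` fetches its metadata (a
minimal sketch; note that `get_metadata` can be slow, since it currently
downloads and unpacks a distribution to read the PKG-INFO file)::

   >>> dists = client.get_distributions("FooBar", "1.1")
   >>> metadata = client.get_metadata("FooBar", "1.1")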


Download distributions
^^^^^^^^^^^^^^^^^^^^^^

Since it can get the URLs of the distributions provided by PyPI, the `Crawler`
client can also download those distributions and put them in a temporary
destination for you::

   >>> client.download("foobar")
   /tmp/temp_dir/foobar-1.2.tar.gz


You can also specify the directory you want to download to::

   >>> client.download("foobar", "/path/to/my/dir")
   /path/to/my/dir/foobar-1.2.tar.gz


While downloading, the MD5 hash of the archive is checked; if it does not
match, the download is tried a second time, and if it fails again,
`MD5HashDoesNotMatchError` is raised.
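
If you want to handle a persistent checksum failure yourself, you can catch
that exception (a minimal sketch; the import location of the exception is
assumed to be `packaging.pypi.errors` and should be verified)::

   >>> from packaging.pypi.errors import MD5HashDoesNotMatchError  # assumed path
   >>> try:
   ...     path = client.download("foobar")
   ... except MD5HashDoesNotMatchError:
   ...     print("archive corrupted after two download attempts")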

Internally, it is not the `Crawler` that downloads the distributions, but the
`DistributionInfo` class.  Please refer to its documentation for more details.


Following PyPI external links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default behavior for packaging is to *not* follow the links provided by
HTML pages in the "simple index" when looking for downloads related to a
distribution.

It's possible to tell the `Crawler` to follow external links by setting the
`follow_externals` attribute, on instantiation or afterwards::

   >>> client = Crawler(follow_externals=True)

or ::

   >>> client = Crawler()
   >>> client.follow_externals = True
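
If you only want a specific set of external hosts to be browsed, you can
combine this with the `hosts` parameter (a sketch; the hostnames here are only
illustrative)::

   >>> client = Crawler(follow_externals=True,
   ...                  hosts=('example.org', 'download.example.org'))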


Working with external indexes, and mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default `Crawler` behavior is to rely on the Python Package Index stored
on PyPI (http://pypi.python.org/simple).

If you need to work with a local index or with private indexes, you can
specify their location using the *index_url* parameter::

   >>> client = Crawler(index_url="file://filesystem/path/")

or ::

   >>> client = Crawler(index_url="http://some.specific.url/")


You can also specify mirrors to fall back on in case the first *index_url* you
provided does not respond, or does not respond correctly.  The default behavior
for `Crawler` is to use the mirror list given by the python.org DNS records, as
described in :PEP:`381` about the mirroring infrastructure.

If you don't want to rely on these, you can give the list of mirrors you want
to try via the `mirrors` parameter.  It's a simple iterable::

   >>> mirrors = ["http://first.mirror", "http://second.mirror"]
   >>> client = Crawler(mirrors=mirrors)
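
These mirror settings can be combined with the other network-related
parameters, for example to use a longer timeout and retry each mirror before
switching (a sketch; the values are only illustrative)::

   >>> client = Crawler(mirrors=mirrors, timeout=30, mirrors_max_tries=2)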


Searching in the simple index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's possible to search for projects with specific names in the package index.
Assuming you want to find all projects containing the "distutils" keyword::

   >>> client.search_projects("distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "Packaging">, <Project "distutilscross">, <Project "lpdistutils">, <Project
   "taras.recipe.distutils">, <Project "zerokspot.recipe.distutils">]


You can also search for projects whose names start or end with specific text,
using a wildcard::

   >>> client.search_projects("distutils*")
   [<Project "Distutils">, <Project "Packaging">, <Project "distutilscross">]

   >>> client.search_projects("*distutils")
   [<Project "collective.recipe.distutils">, <Project "Distutils">, <Project
   "lpdistutils">, <Project "taras.recipe.distutils">, <Project
   "zerokspot.recipe.distutils">]