=====================================
The Internal Structure of Python Eggs
=====================================

STOP! This is not the first document you should read!



----------------------
Eggs and their Formats
----------------------

A "Python egg" is a logical structure embodying the release of a
specific version of a Python project, comprising its code, resources,
and metadata. There are multiple formats that can be used to physically
encode a Python egg, and others can be developed. However, a key
principle of Python eggs is that they should be discoverable and
importable. That is, it should be possible for a Python application to
easily and efficiently find out what eggs are present on a system, and
to ensure that the desired eggs' contents are importable.

There are two basic formats currently implemented for Python eggs:

1. ``.egg`` format: a directory or zipfile *containing* the project's
   code and resources, along with an ``EGG-INFO`` subdirectory that
   contains the project's metadata

2. ``.egg-info`` format: a file or directory placed *adjacent* to the
   project's code and resources, that directly contains the project's
   metadata.

Both formats can include arbitrary Python code and resources, including
static data files, package and non-package directories, Python
modules, C extension modules, and so on. But each format is optimized
for different purposes.

The ``.egg`` format is well-suited to distribution and the easy
uninstallation or upgrades of code, since the project is essentially
self-contained within a single directory or file, unmingled with any
other projects' code or resources. It also makes it possible to have
multiple versions of a project simultaneously installed, such that
individual programs can select the versions they wish to use.

The ``.egg-info`` format, on the other hand, was created to support
backward-compatibility, performance, and ease of installation for system
packaging tools that expect to install all projects' code and resources
to a single directory (e.g. ``site-packages``). Placing the metadata
in that same directory simplifies the installation process, since it
isn't necessary to create ``.pth`` files or otherwise modify
``sys.path`` to include each installed egg.

Its disadvantage, however, is that it provides no support for clean
uninstallation or upgrades, and of course only a single version of a
project can be installed to a given directory. Thus, support from a
package management tool is required. (This is why setuptools' "install"
command refers to this type of egg installation as "single-version,
externally managed".) Also, ``.egg-info`` installs lack sufficient data
to allow them to be copied from their installation source. easy_install
can "ship" an application by copying ``.egg`` files or directories to a
target location, but it cannot do this for ``.egg-info`` installs,
because there is no way to tell what code and resources belong to a
particular egg -- there may be several eggs "scrambled" together in a
single installation location, and the ``.egg-info`` format does not
currently include a way to list the files that were installed. (This
may change in a future version.)


Code and Resources
==================

The layout of the code and resources is dictated by Python's normal
import layout, relative to the egg's "base location".

For the ``.egg`` format, the base location is the ``.egg`` itself. That
is, adding the ``.egg`` filename or directory name to ``sys.path``
makes its contents importable.

For the ``.egg-info`` format, however, the base location is the
directory that *contains* the ``.egg-info``, and thus it is the
directory that must be added to ``sys.path`` to make the egg importable.
(Note that this means that the "normal" installation of a package to a
``sys.path`` directory is sufficient to make it an "egg" if it has an
``.egg-info`` file or directory installed alongside it.)
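
The base-location rule above can be sketched in a few lines. This is
only an illustration, not part of setuptools: the function name
``egg_base_location`` and the ``Foo`` paths are hypothetical.

```python
import os


def egg_base_location(path):
    """Return the sys.path entry that makes the egg at *path* importable.

    Hypothetical helper: *path* is assumed to name a ``.egg`` file or
    directory, or an ``.egg-info`` file or directory.
    """
    path = os.path.normpath(path)
    if path.endswith(".egg"):
        # The .egg itself is the base location.
        return path
    if path.endswith(".egg-info"):
        # The directory *containing* the .egg-info is the base location.
        return os.path.dirname(path) or os.curdir
    raise ValueError("not an egg path: %r" % (path,))
```

For example, ``egg_base_location("site-packages/Foo-1.0.egg-info")``
yields ``site-packages``, while a ``.egg`` path is returned unchanged.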


Project Metadata
================

If eggs contained only code and resources, there would of course be
no difference between them and any other directory or zip file on
``sys.path``. Thus, metadata must also be included, using a metadata
file or directory.

For the ``.egg`` format, the metadata is placed in an ``EGG-INFO``
subdirectory, directly within the ``.egg`` file or directory. For the
``.egg-info`` format, metadata is stored directly within the
``.egg-info`` directory itself.

The minimum project metadata that all eggs must have is a standard
Python ``PKG-INFO`` file, named ``PKG-INFO`` and placed within the
metadata directory appropriate to the format. Because it's possible for
this to be the only metadata file included, ``.egg-info`` format eggs
are not required to be a directory; they can just be a ``.egg-info``
file that directly contains the ``PKG-INFO`` metadata. This eliminates
the need to create a directory just to store one file. This option is
*not* available for ``.egg`` formats, since setuptools always includes
other metadata. (In fact, setuptools itself never generates
``.egg-info`` files, either; the support for using files was added so
that the requirement could easily be satisfied by other tools, such
as distutils).
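
As a rough sketch of the rules above, the following shows where
``PKG-INFO`` would be found for each unzipped layout, and how its
RFC 822-style headers can be read with the standard library. The helper
names are hypothetical, and zipped ``.egg`` files (which would need
``zipfile`` access) are deliberately left out.

```python
import os
from email.parser import Parser


def pkg_info_path(egg_path):
    """Locate PKG-INFO for a .egg directory or an .egg-info file/directory."""
    if egg_path.endswith(".egg"):
        # .egg format: metadata lives in the EGG-INFO subdirectory.
        return os.path.join(egg_path, "EGG-INFO", "PKG-INFO")
    if os.path.isdir(egg_path):
        # .egg-info directory: PKG-INFO sits directly inside it.
        return os.path.join(egg_path, "PKG-INFO")
    # A plain .egg-info *file* directly contains the PKG-INFO text.
    return egg_path


def read_name_and_version(pkg_info_text):
    """Parse the header fields of PKG-INFO text and return (name, version)."""
    headers = Parser().parsestr(pkg_info_text, headersonly=True)
    return headers["Name"], headers["Version"]
```

Only the ``Name`` and ``Version`` fields are read here; the other
header fields are available from the same parsed object.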

In addition to the ``PKG-INFO`` file, an egg's metadata directory may
also include files and directories representing various forms of
optional standard metadata (see the section on `Standard Metadata`_,
below) or user-defined metadata required by the project. For example,
some projects may define a metadata format to describe their application
plugins, and metadata in this format would then be included by plugin
creators in their projects' metadata directories.


Filename-Embedded Metadata
==========================

To allow introspection of installed projects and runtime resolution of
inter-project dependencies, a certain amount of information is embedded
in egg filenames. At a minimum, this includes the project name, and
ideally will also include the project version number. Optionally, it
can also include the target Python version and required runtime
platform if platform-specific C code is included. The syntax of an
egg filename is as follows::

    name ["-" version ["-py" pyver ["-" required_platform]]] "." ext

The "name" and "version" should be escaped using the ``to_filename()``
function provided by ``pkg_resources``, after first processing them with
``safe_name()`` and ``safe_version()`` respectively. These latter two
functions can also be used to later "unescape" these parts of the
filename. (For a detailed description of these transformations, please
see the "Parsing Utilities" section of the ``pkg_resources`` manual.)
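
To give a feel for the escaping, here is a simplified approximation of
what these helpers have done in older setuptools releases. The
``approx_*`` functions are stand-ins written for this example, not the
real ``pkg_resources`` functions, which may normalize names and versions
further; consult the manual for the authoritative rules.

```python
import re


def approx_safe_name(name):
    # Assumption: runs of characters other than letters, digits and "."
    # are collapsed into a single "-".
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def approx_safe_version(version):
    # Assumption: spaces become "."; other unusual runs become "-".
    return re.sub(r"[^A-Za-z0-9.]+", "-", version.replace(" ", "."))


def approx_to_filename(name):
    # "-" separates the fields of an egg filename, so it is replaced
    # with "_" inside a field.
    return name.replace("-", "_")
```

Under these assumptions a project called ``my project`` would appear in
a filename as ``my_project``, and applying ``approx_safe_name()`` to
``my_project`` gives back ``my-project``, which is the "unescaping"
mentioned above.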

The "pyver" string is the Python version in ``major.minor`` form (for
example, ``2.7``), as found in the first 3 characters of
``sys.version`` for single-digit minor versions. "required_platform"
is essentially a distutils ``get_platform()`` string, but with
enhancements to properly distinguish Mac OS versions. (See the
``get_build_platform()`` documentation in the "Platform Utilities"
section of the ``pkg_resources`` manual for more details.)

Finally, the "ext" is either ``.egg`` or ``.egg-info``, as appropriate
for the egg's format.
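
The filename syntax can be turned into a regular expression fairly
directly. The pattern below is a sketch written for this document
(``pkg_resources`` has its own parser), and the ``Foo`` filenames are
made-up examples.

```python
import re

# name ["-" version ["-py" pyver ["-" required_platform]]] "." ext
EGG_FILENAME = re.compile(
    r"""
    (?P<name>[^-]+)
    (?:-(?P<version>[^-]+)
       (?:-py(?P<pyver>[^-]+)
          (?:-(?P<platform>.+))?
       )?
    )?
    \.(?P<ext>egg-info|egg)$
    """,
    re.VERBOSE,
)


def parse_egg_filename(filename):
    """Split an egg filename into its embedded metadata fields."""
    match = EGG_FILENAME.match(filename)
    if match is None:
        raise ValueError("not an egg filename: %r" % (filename,))
    # Fields that are absent from the filename come back as None.
    return match.groupdict()
```

Because escaped names and versions never contain ``-``, the first two
fields can be matched with ``[^-]+``; only the platform field, which
comes last, may itself contain dashes.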

Normally, an egg's filename should include at least the project name and
version, as this allows the runtime system to find desired project
versions without having to read the egg's PKG-INFO to determine its
version number.

Setuptools, however, only includes the version number in the filename
when an ``.egg`` file is built using the ``bdist_egg`` command, or when
an ``.egg-info`` directory is being installed by the
``install_egg_info`` command. When generating metadata for use with the
original source tree, it only includes the project name, so that the
directory will not have to be renamed each time the project's version
changes.

This is especially important when version numbers change frequently, and
the source metadata directory is kept under version control with the
rest of the project. (As would be the case when the project's source
includes project-defined metadata that is not generated by setuptools
from data in the setup script.)


Egg Links
=========

In addition to the ``.egg`` and ``.egg-info`` formats, there is a third
egg-related extension that you may encounter on occasion: ``.egg-link``
files.

These files are not eggs, strictly speaking. They simply provide a way
to reference an egg that is not physically installed in the desired
location. They exist primarily as a cross-platform alternative to
symbolic links, to support "installing" code that is being developed in
a different location than the desired installation location. For
example, if a user is developing an application plugin in their home
directory, but the plugin needs to be "installed" in an application
plugin directory, running ``setup.py develop -md /path/to/app/plugins``
will install an ``.egg-link`` file in ``/path/to/app/plugins``, that
tells the egg runtime system where to find the actual egg (the user's
project source directory and its ``.egg-info`` subdirectory).

``.egg-link`` files are named following the format for ``.egg`` and
``.egg-info`` names, but only the project name is included; no version,
Python version, or platform information is included. When the runtime
searches for available eggs, ``.egg-link`` files are opened and the
actual egg file/directory name is read from them.

Each ``.egg-link`` file should contain a single file or directory name,
with no newlines. This filename should be the base location of one or
more eggs. That is, the name must either end in ``.egg``, or else it
should be the parent directory of one or more ``.egg-info`` format eggs.

As of setuptools 0.6c6, the path may be specified as a platform-independent
(i.e. ``/``-separated) relative path from the directory containing the
``.egg-link`` file, and a second line may appear in the file, specifying a
platform-independent relative path from the egg's base directory to its
setup script directory. This allows installation tools such as EasyInstall
to find the project's setup directory and build eggs or perform other setup
commands on it.
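
A reader for this format might look roughly like the following. It is a
sketch under the assumptions stated in the comments, with a hypothetical
function name; it takes the file's text and the directory holding the
``.egg-link`` file rather than opening the file itself.

```python
import os


def parse_egg_link(text, link_dir):
    """Return (egg_base, setup_dir) for the contents of an .egg-link file.

    *link_dir* is the directory containing the .egg-link file.
    *setup_dir* is None when the optional second line is absent.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .egg-link file")
    # First line: the egg's base location.  A relative path is resolved
    # against the directory that holds the .egg-link file; an absolute
    # path is used as-is.
    egg_base = os.path.normpath(os.path.join(link_dir, lines[0]))
    setup_dir = None
    if len(lines) > 1:
        # Second line: path from the egg's base directory to the
        # directory containing the setup script.
        setup_dir = os.path.normpath(os.path.join(egg_base, lines[1]))
    return egg_base, setup_dir
```

``os.path.normpath()`` is relied on here to convert the ``/``-separated
form to the local convention on platforms where the two differ.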


-----------------
Standard Metadata
-----------------

In addition to the minimum required ``PKG-INFO`` metadata, projects can
include a variety of standard metadata files or directories, as
described below. Except as otherwise noted, these files and directories
are automatically generated by setuptools, based on information supplied
in the setup script or through analysis of the project's code and
resources.

Most of these files and directories are generated via "egg-info
writers" during execution of the setuptools ``egg_info`` command, and
are listed in the ``egg_info.writers`` entry point group defined by
setuptools' own ``setup.py`` file.

Project authors can register their own metadata writers as entry points
in this group (as described in the setuptools manual under "Adding new
EGG-INFO Files") to cause setuptools to generate project-specific
metadata files or directories during execution of the ``egg_info``
command. It is up to project authors to document these new metadata
formats, if they create any.


``.txt`` File Formats
=====================

Files described in this section that have ``.txt`` extensions have a
simple lexical format consisting of a sequence of text lines, each line
terminated by a linefeed character (regardless of platform). Leading
and trailing whitespace on each line is ignored, as are blank lines and
lines whose first nonblank character is a ``#`` (comment symbol). (This
is the parsing format defined by the ``yield_lines()`` function of
the ``pkg_resources`` module.)

All ``.txt`` files defined by this section follow this format, but some
are also "sectioned" files, meaning that their contents are divided into
sections, using square-bracketed section headers akin to Windows
``.ini`` format. Note that this does *not* imply that the lines within
the sections follow an ``.ini`` format, however. Please see an
individual metadata file's documentation for a description of what the
lines and section names mean in that particular file.

Sectioned files can be parsed using the ``split_sections()`` function;
see the "Parsing Utilities" section of the ``pkg_resources`` manual
for details.
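
The lexical rules are simple enough to restate as code. The two
functions below are independent sketches of the behavior described
above, written for illustration; they are not the ``pkg_resources``
implementations, and their names are made up.

```python
def iter_metadata_lines(text):
    """Yield stripped lines, skipping blank lines and "#" comments."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def iter_sections(text):
    """Yield (section_name, lines) pairs; the unnamed section is None."""
    section, content = None, []
    for line in iter_metadata_lines(text):
        if line.startswith("["):
            if not line.endswith("]"):
                raise ValueError("invalid section heading: %r" % (line,))
            # Emit the section collected so far, unless it is an empty
            # unnamed section at the very start of the file.
            if section is not None or content:
                yield section, content
            section, content = line[1:-1].strip(), []
        else:
            content.append(line)
    # Emit the final section.
    yield section, content
```

Note that the lines inside a section are returned uninterpreted; what
they mean depends on the particular metadata file.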


Dependency Metadata
===================


``requires.txt``
----------------

This is a "sectioned" text file. Each section is a sequence of
"requirements", as parsed by the ``parse_requirements()`` function;
please see the ``pkg_resources`` manual for the complete requirement
parsing syntax.

The first, unnamed section (i.e., before the first section header) in
this file is the project's core requirements, which must be installed
for the project to function. (Specified using the ``install_requires``
keyword to ``setup()``).

The remaining (named) sections describe the project's "extra"
requirements, as specified using the ``extras_require`` keyword to
``setup()``. The section name is the name of the optional feature, and
the section body lists that feature's dependencies.

Note that it is not normally necessary to inspect this file directly;
``pkg_resources.Distribution`` objects have a ``requires()`` method
that can be used to obtain ``Requirement`` objects describing the
project's core and optional dependencies.
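
For readers who do want to see the file's shape, here is a hypothetical
``requires.txt`` and a small stand-alone reader that separates the core
requirements from the extras. The sample contents and the function name
are invented for this example, and the requirement strings are left
unparsed.

```python
SAMPLE_REQUIRES_TXT = """\
docutils>=0.3

[pdf]
ReportLab>=1.2
RXP

[rest]
docutils>=0.3
"""


def parse_requires_txt(text):
    """Return (core_requirements, {extra_name: requirements})."""
    core, extras = [], {}
    current = core  # lines before the first header are core requirements
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # A section header names an optional feature ("extra").
            current = extras.setdefault(line[1:-1].strip(), [])
        else:
            current.append(line)
    return core, extras
```

With the sample above, the core list holds ``docutils>=0.3`` and the
``pdf`` extra lists ``ReportLab>=1.2`` and ``RXP``.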


``setup_requires.txt``
----------------------

This file is much like ``requires.txt``, except that it represents the
requirements specified by the ``setup_requires`` parameter to the
Distribution.


``dependency_links.txt``
------------------------

A list of dependency URLs, one per line, as specified using the
``dependency_links`` keyword to ``setup()``. These may be direct
download URLs, or the URLs of web pages containing direct download
links. Please see the setuptools manual for more information on
specifying this option.


``depends.txt`` -- Obsolete, do not create!
-------------------------------------------

This file follows an identical format to ``requires.txt``, but is
obsolete and should not be used. The earliest versions of setuptools
required users to manually create and maintain this file, so the runtime
still supports reading it, if it exists. The new filename was created
so that it could be automatically generated from ``setup()`` information
without overwriting an existing hand-created ``depends.txt``, if one
was already present in the project's source ``.egg-info`` directory.


``namespace_packages.txt`` -- Namespace Package Metadata
========================================================

A list of namespace package names, one per line, as supplied to the
``namespace_packages`` keyword to ``setup()``. Please see the manuals
for setuptools and ``pkg_resources`` for more information about
namespace packages.


``entry_points.txt`` -- "Entry Point"/Plugin Metadata
=====================================================

This is a "sectioned" text file, whose contents encode the
``entry_points`` keyword supplied to ``setup()``. All sections are
named, as the section names specify the entry point groups in which the
corresponding section's entry points are registered.

Each section is a sequence of "entry point" lines, each parseable using
the ``EntryPoint.parse`` classmethod; please see the ``pkg_resources``
manual for the complete entry point parsing syntax.

Note that it is not necessary to parse this file directly; the
``pkg_resources`` module provides a variety of APIs to locate and load
entry points automatically. Please see the setuptools and
``pkg_resources`` manuals for details on the nature and uses of entry
points.
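
Purely to illustrate what one entry point line carries, the sketch
below splits a line of the common form
``name = some.module:some.attr [extra1,extra2]`` into its parts. It is
a simplified stand-in for ``EntryPoint.parse``, assuming that form; the
``pkg_resources`` manual remains the reference for the full syntax, and
the names in the example are hypothetical.

```python
import re

ENTRY_POINT_LINE = re.compile(
    r"\s*(?P<name>.+?)\s*=\s*(?P<module>[\w.]+)\s*"
    r"(?::\s*(?P<attrs>[\w.]+))?\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*$"
)


def parse_entry_point_line(line):
    """Split one entry point line into name, module, attrs and extras."""
    match = ENTRY_POINT_LINE.match(line)
    if match is None:
        raise ValueError("invalid entry point line: %r" % (line,))
    attrs = match.group("attrs")
    extras = match.group("extras")
    return {
        "name": match.group("name"),
        "module": match.group("module"),
        # The object to load, as a path of attribute names in the module.
        "attrs": tuple(attrs.split(".")) if attrs else (),
        # Optional features the entry point needs in order to work.
        "extras": tuple(e.strip() for e in extras.split(",") if e.strip())
        if extras
        else (),
    }
```

The section header under which a line appears supplies the entry point
group; it is not part of the line itself.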


The ``scripts`` Subdirectory
============================

This directory is currently only created for ``.egg`` files built by
the setuptools ``bdist_egg`` command. It will contain copies of all
of the project's "traditional" scripts (i.e., those specified using the
``scripts`` keyword to ``setup()``). This is so that they can be
reconstituted when an ``.egg`` file is installed.

The scripts are placed here using the distutils' standard
``install_scripts`` command, so any ``#!`` lines reflect the Python
installation where the egg was built. But instead of copying the
scripts to the local script installation directory, EasyInstall writes
short wrapper scripts that invoke the original scripts from inside the
egg, after ensuring that ``sys.path`` includes the egg and any eggs it
depends on. For more about `script wrappers`_, see the section below on
`Installation and Path Management Issues`_.


Zip Support Metadata
====================


``native_libs.txt``
-------------------

A list of C extensions and other dynamic link libraries contained in
the egg, one per line. Paths are ``/``-separated and relative to the
egg's base location.

This file is generated as part of ``bdist_egg`` processing, and as such
only appears in ``.egg`` files (and ``.egg`` directories created by
unpacking them). It is used to ensure that all libraries are extracted
from a zipped egg at the same time, in case there is any direct linkage
between them. Please see the `Zip File Issues`_ section below for more
information on library and resource extraction from ``.egg`` files.


``eager_resources.txt``
-----------------------

A list of resource files and/or directories, one per line, as specified
via the ``eager_resources`` keyword to ``setup()``. Paths are
``/``-separated and relative to the egg's base location.

Resource files or directories listed here will be extracted
simultaneously, if any of the named resources are extracted, or if any
native libraries listed in ``native_libs.txt`` are extracted. Please
see the setuptools manual for details on what this feature is used for
and how it works, as well as the `Zip File Issues`_ section below.
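
The "all or nothing" rule for these two files can be summarized as a
set computation. This is a simplified sketch with a hypothetical
function name and made-up paths; it compares whole path strings only
and does not model directories or the extraction itself.

```python
def resources_to_extract(requested, native_libs, eager_resources):
    """Return the set of resource paths to extract together.

    *native_libs* and *eager_resources* are the path lists read from
    ``native_libs.txt`` and ``eager_resources.txt`` respectively.
    """
    eager = set(native_libs) | set(eager_resources)
    if requested in eager:
        # Asking for any eager resource pulls in all of them.
        return eager
    # Anything else is extracted on its own.
    return {requested}
```

A request for ``pkg/_speedups.so`` would therefore also extract
``pkg/data`` if the former is a native library and the latter an eager
resource.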


``zip-safe`` and ``not-zip-safe``
---------------------------------

These are zero-length files, and either one or the other should exist.
If ``zip-safe`` exists, it means that the project will work properly
when installed as an ``.egg`` zipfile, and conversely the existence of
``not-zip-safe`` means the project should not be installed as an
``.egg`` file. The ``zip_safe`` option to setuptools' ``setup()``
determines which file will be written. If the option isn't provided,
setuptools attempts to make its own assessment of whether the package
can work, based on code and content analysis.

If neither file is present at installation time, EasyInstall defaults
to assuming that the project should be unzipped. (Command-line options
to EasyInstall, however, take precedence even over an existing
``zip-safe`` or ``not-zip-safe`` file.)

Note that these flag files appear only in ``.egg`` files generated by
``bdist_egg``, and in ``.egg`` directories created by unpacking such an
``.egg`` file.
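
The decision described above can be written as a small function. This
is an illustrative sketch, not EasyInstall's code: ``should_unzip`` and
its ``override`` parameter (standing in for an installer command-line
option) are hypothetical, and the metadata directory is modeled as a
plain collection of file names.

```python
def should_unzip(metadata_names, override=None):
    """Decide whether an installer should unpack the egg.

    *metadata_names* is the collection of file names found in the
    egg's ``EGG-INFO`` directory.  *override* is True or False when the
    user forced a choice, and None otherwise.
    """
    if override is not None:
        # An explicit user choice wins over any flag file.
        return override
    if "zip-safe" in metadata_names:
        return False
    # Either not-zip-safe is present, or neither flag exists; in both
    # cases the egg is unpacked.
    return True
```

Treating "no flag file" the same as ``not-zip-safe`` is the cautious
default: an unzipped egg works in every situation where a zipped one
would.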



``top_level.txt`` -- Conflict Management Metadata
=================================================

This file is a list of the top-level module or package names provided
by the project, one Python identifier per line.

Subpackages are not included; a project containing both a ``foo.bar``
and a ``foo.baz`` would include only one line, ``foo``, in its
``top_level.txt``.

This data is used by ``pkg_resources`` at runtime to issue a warning if
an egg is added to ``sys.path`` when its contained packages may have
already been imported.

(It was also once used to detect conflicts with non-egg packages at
installation time, but in more recent versions, setuptools installs eggs
in such a way that they always override non-egg packages, thus
preventing a problem from arising.)
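
The runtime check can be pictured as follows. This is a deliberately
simplified sketch with a hypothetical function name: it only tests
whether each listed name is already present in ``sys.modules`` and does
not examine where the imported module came from.

```python
import sys


def already_imported(top_level_text, modules=None):
    """Return the names from top_level.txt that are already imported.

    *modules* defaults to ``sys.modules``; passing a mapping explicitly
    makes the function easy to try out.
    """
    modules = sys.modules if modules is None else modules
    names = [line.strip() for line in top_level_text.splitlines() if line.strip()]
    return [name for name in names if name in modules]
```

A non-empty result is the situation in which a warning would be
appropriate, since adding the egg to ``sys.path`` cannot affect modules
that have already been loaded.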


``SOURCES.txt`` -- Source Files Manifest
========================================

This file is roughly equivalent to the distutils' ``MANIFEST`` file.
The differences are as follows:

* The filenames always use ``/`` as a path separator, which must be
  converted back to a platform-specific path whenever they are read.

* The file is automatically generated by setuptools whenever the
  ``egg_info`` or ``sdist`` commands are run, and it is *not*
  user-editable.

Although this metadata is included with distributed eggs, it is not
actually used at runtime for any purpose. Its function is to ensure
that setuptools-built *source* distributions can correctly discover
what files are part of the project's source, even if the list had been
generated using revision control metadata on the original author's
system.

In other words, ``SOURCES.txt`` has little or no runtime value for being
included in distributed eggs, and it is possible that future versions of
the ``bdist_egg`` and ``install_egg_info`` commands will strip it before
installation or distribution. Therefore, do not rely on its being
available outside of an original source directory or source
distribution.


------------------------------
Other Technical Considerations
------------------------------


Zip File Issues
===============

Although zip files resemble directories, they are not fully
substitutable for them. Most platforms do not support loading dynamic
link libraries contained in zipfiles, so it is not possible to directly
import C extensions from ``.egg`` zipfiles. Similarly, there are many
existing libraries -- whether in Python or C -- that require actual
operating system filenames, and do not work with arbitrary "file-like"
objects or in-memory strings, and thus cannot operate directly on the
contents of zip files.

To address these issues, the ``pkg_resources`` module provides a
"resource API" to support obtaining either the contents of a resource,
or a true operating system filename for the resource. If the egg
containing the resource is a directory, the resource's real filename
is simply returned. However, if the egg is a zipfile, then the
resource is first extracted to a cache directory, and the filename
within the cache is returned.

The cache directory is determined by the ``pkg_resources`` API; please
see the ``set_cache_path()`` and ``get_default_cache()`` documentation
for details.


The Extraction Process
----------------------

Resources are extracted to a cache subdirectory whose name is based
on the enclosing ``.egg`` filename and the path to the resource. If
there is already a file of the correct name, size, and timestamp, its
filename is returned to the requester. Otherwise, the desired file is
extracted first to a temporary name generated using
``mkstemp(".$extract",target_dir)``, and then its timestamp is set to
match the one in the zip file, before renaming it to its final name.
(Some collision detection and resolution code is used to handle the
fact that Windows doesn't overwrite files when renaming.)

If a resource directory is requested, all of its contents are
recursively extracted in this fashion, to ensure that the directory
name can be used as if it were valid all along.

If the resource requested for extraction is listed in the
``native_libs.txt`` or ``eager_resources.txt`` metadata files, then
*all* resources listed in *either* file will be extracted before the
requested resource's filename is returned, thus ensuring that all
C extensions and data used by them will be simultaneously available.
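
The single-file case of this process can be sketched with the standard
library alone. The function below is an illustration, not the
``pkg_resources`` code: its name and signature are hypothetical, the
cache directory is simply passed in, and ``os.replace()`` is used as a
modern shortcut for the rename step, since it overwrites an existing
target on all platforms.

```python
import os
import tempfile
import time
import zipfile


def extract_resource(egg_zip, member, cache_dir):
    """Extract *member* of a zipped egg into *cache_dir*; return its path."""
    with zipfile.ZipFile(egg_zip) as archive:
        info = archive.getinfo(member)
        # Zip entries store a local-time tuple; -1 lets mktime guess DST.
        timestamp = time.mktime(info.date_time + (0, 0, -1))
        target = os.path.join(cache_dir, *member.split("/"))
        if os.path.isfile(target):
            stat = os.stat(target)
            if stat.st_size == info.file_size and int(stat.st_mtime) == int(timestamp):
                return target  # already cached with matching size and time
        target_dir = os.path.dirname(target)
        os.makedirs(target_dir, exist_ok=True)
        # Write to a temporary name first so that a partially written
        # file is never visible under the final name.
        fd, tmp_name = tempfile.mkstemp(".$extract", dir=target_dir)
        with os.fdopen(fd, "wb") as out:
            out.write(archive.read(member))
        os.utime(tmp_name, (timestamp, timestamp))
        os.replace(tmp_name, target)
    return target
```

Setting the timestamp before the rename means that any file visible
under its final name already passes the size-and-timestamp check used
on later requests.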


Extension Import Wrappers
-------------------------

Since Python's built-in zip import feature does not support loading
C extension modules from zipfiles, the setuptools ``bdist_egg`` command
generates special import wrappers to make it work.

The wrappers are ``.py`` files (along with corresponding ``.pyc``
and/or ``.pyo`` files) that have the same module name as the
corresponding C extension. These wrappers are located in the same
package directory (or top-level directory) within the zipfile, so that,
say, ``foomodule.so`` will get a corresponding ``foo.py``, while
``bar/baz.pyd`` will get a corresponding ``bar/baz.py``.
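
The naming rule in the two examples above can be expressed as a short
function. It is a sketch with a hypothetical name, and it assumes plain
extension filenames like the ones shown; filenames carrying additional
dotted tags are not handled.

```python
import os


def wrapper_path_for(extension_path):
    """Return the path of the ``.py`` wrapper for a C extension file."""
    base, _ext = os.path.splitext(extension_path)
    # "foomodule.so" provides the module "foo", so a trailing "module"
    # is dropped from the base name.
    if base.endswith("module"):
        base = base[: -len("module")]
    return base + ".py"
```

The wrapper keeps the extension's directory, which is what places it in
the same package within the zipfile.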

These wrapper files contain a short stanza of Python code that asks
``pkg_resources`` for the filename of the corresponding C extension,
then reloads the module using the obtained filename. This will cause
``pkg_resources`` to first ensure that all of the egg's C extensions
(and any accompanying "eager resources") are extracted to the cache
before attempting to link to the C library.

Note, by the way, that ``.egg`` directories will also contain these
wrapper files. However, Python's default import priority is such that
C extensions take precedence over same-named Python modules, so the
import wrappers are ignored unless the egg is a zipfile.


Installation and Path Management Issues
=======================================

Python's initial setup of ``sys.path`` is very dependent on the Python
version and installation platform, as well as how Python was started
(i.e., script vs. ``-c`` vs. ``-m`` vs. interactive interpreter).
Moreover, Python provides only two relatively robust ways to affect
``sys.path`` outside of direct manipulation in code: the ``PYTHONPATH``
environment variable, and ``.pth`` files.

However, with no cross-platform way to safely and persistently change
environment variables, this leaves ``.pth`` files as EasyInstall's only
real option for persistent configuration of ``sys.path``.

But ``.pth`` files are rather strictly limited in what they are allowed
to do normally. They add directories only to the *end* of ``sys.path``,
after any locally-installed ``site-packages`` directory, and they are
only processed *in* the ``site-packages`` directory to start with.

This is a double whammy for users who lack write access to that
directory, because they can't create a ``.pth`` file that Python will
read, and even if a sympathetic system administrator adds one for them
that calls ``site.addsitedir()`` to allow some other directory to
contain ``.pth`` files, they won't be able to install newer versions of
anything that's installed in the systemwide ``site-packages``, because
their paths will still be added *after* ``site-packages``.

So EasyInstall applies two workarounds to solve these problems.

The first is that EasyInstall leverages ``.pth`` files' "import" feature
to manipulate ``sys.path`` and ensure that anything EasyInstall adds
to a ``.pth`` file will always appear before both the standard library
and the local ``site-packages`` directories. Thus, it is always
possible for a user who can write a Python-read ``.pth`` file to ensure
that their packages come first in their own environment.
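
The "import" feature refers to the fact that a line in a ``.pth`` file
that begins with ``import`` is executed as code rather than treated as
a path. The sketch below simulates, on an ordinary list, the kind of
bookkeeping such lines can perform: remember the length of ``sys.path``,
let normal processing append the new entries, then move them to the
front. The function name and the example paths are hypothetical, and
this is a model of the idea rather than the exact code EasyInstall
writes.

```python
def add_pth_entries_first(sys_path, new_entries, insert_at=0):
    """Simulate moving freshly appended .pth entries ahead of older ones.

    Returns the index at which the next batch of entries should be
    inserted, so that successive batches keep their relative order.
    """
    before = len(sys_path)        # remembered by a leading "import" line
    sys_path.extend(new_entries)  # normal .pth processing appends
    appended = sys_path[before:]  # what this .pth file just added
    del sys_path[before:]
    sys_path[insert_at:insert_at] = appended
    return insert_at + len(appended)
```

Passing the returned index back in as ``insert_at`` for the next batch
keeps eggs ahead of the pre-existing entries while preserving the order
in which the eggs themselves were listed.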

Second, when installing to a ``PYTHONPATH`` directory (as opposed to
a "site" directory like ``site-packages``) EasyInstall will also install
a special version of the ``site`` module. Because it's in a
``PYTHONPATH`` directory, this module will get control before the
standard library version of ``site`` does. It will record the state of
``sys.path`` before invoking the "real" ``site`` module, and then
afterwards it processes any ``.pth`` files found in ``PYTHONPATH``
directories, including all the fixups needed to ensure that eggs always
appear before the standard library in ``sys.path``, but are in a relative
order to one another that is defined by their ``PYTHONPATH`` and
``.pth``-prescribed sequence.

The net result of these changes is that ``sys.path`` order will be
as follows at runtime:

1. The ``sys.argv[0]`` directory, or an empty string if no script
   is being executed.

2. All eggs installed by EasyInstall in any ``.pth`` file in each
   ``PYTHONPATH`` directory, in order first by ``PYTHONPATH`` order,
   then normal ``.pth`` processing order (which is to say alphabetical
   by ``.pth`` filename, then by the order of listing within each
   ``.pth`` file).

3. All eggs installed by EasyInstall in any ``.pth`` file in each "site"
   directory (such as ``site-packages``), following the same ordering
   rules as for the ones on ``PYTHONPATH``.

4. The ``PYTHONPATH`` directories themselves, in their original order.

5. Any paths from ``.pth`` files found on ``PYTHONPATH`` that were *not*
   eggs installed by EasyInstall, again following the same relative
   ordering rules.

6. The standard library and "site" directories, along with the contents
   of any ``.pth`` files found in the "site" directories.

Notice that sections 1, 4, and 6 comprise the "normal" Python setup for
``sys.path``. Sections 2 and 3 are inserted to support eggs, and
section 5 emulates what the "normal" semantics of ``.pth`` files on
``PYTHONPATH`` would be if Python natively supported them.

For further discussion of the tradeoffs that went into this design, as
well as notes on the actual magic inserted into ``.pth`` files to make
them do these things, please see also the following messages to the
distutils-SIG mailing list:

* http://mail.python.org/pipermail/distutils-sig/2006-February/006026.html
* http://mail.python.org/pipermail/distutils-sig/2006-March/006123.html


Script Wrappers
---------------

EasyInstall never directly installs a project's original scripts to
a script installation directory. Instead, it writes short wrapper
scripts that first ensure that the project's dependencies are active
on ``sys.path``, before invoking the original script. These wrappers
have a ``#!`` line that points to the version of Python that was used to
install them, and their second line is always a comment that indicates
the type of script wrapper, the project version required for the script
to run, and information identifying the script to be invoked.

The format of this marker line is::

    "# EASY-INSTALL-" script_type ": " tuple_of_strings "\n"

The ``script_type`` is one of ``SCRIPT``, ``DEV-SCRIPT``, or
``ENTRY-SCRIPT``. The ``tuple_of_strings`` is a comma-separated
sequence of Python string constants. For ``SCRIPT`` and ``DEV-SCRIPT``
wrappers, there are two strings: the project version requirement, and
the script name (as a filename within the ``scripts`` metadata
directory). For ``ENTRY-SCRIPT`` wrappers, there are three:
the project version requirement, the entry point group name, and the
entry point name. (See the "Automatic Script Creation" section in the
setuptools manual for more information about entry point scripts.)
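
As an illustration of the marker format, here is a minimal sketch of
a parser for such a line. The function name ``parse_marker`` and the
sample values in the usage note are hypothetical. The sketch assumes
the strings are written as ordinary Python string literals, so that
``ast.literal_eval()`` can read them without executing any code.

```python
import ast
import re

# Matches the fixed prefix, one of the three script types, and the payload.
_MARKER_RE = re.compile(
    r"^# EASY-INSTALL-(SCRIPT|DEV-SCRIPT|ENTRY-SCRIPT): (.+)$"
)


def parse_marker(line):
    """Return (script_type, strings) for a marker line, or None if invalid."""
    match = _MARKER_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    script_type, payload = match.groups()
    try:
        # A comma-separated run of string constants evaluates to a tuple.
        strings = ast.literal_eval(payload)
    except (ValueError, SyntaxError):
        return None
    if not isinstance(strings, tuple):
        return None
    if not all(isinstance(s, str) for s in strings):
        return None
    # ENTRY-SCRIPT carries three strings; the other two types carry two.
    expected = 3 if script_type == "ENTRY-SCRIPT" else 2
    if len(strings) != expected:
        return None
    return script_type, strings
```

For example, passing the hypothetical line
``# EASY-INSTALL-ENTRY-SCRIPT: 'Example==1.0','console_scripts','example'``
yields the script type ``ENTRY-SCRIPT`` together with the requirement,
the entry point group name, and the entry point name.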

In each case, the project version requirement string will be a string
parseable with the ``pkg_resources`` module's ``Requirement.parse()``
method. The only difference between a ``SCRIPT`` wrapper and a
``DEV-SCRIPT`` is that a ``DEV-SCRIPT`` actually executes the original
source script in the project's source tree, and is created when the
``setup.py develop`` command is run. A ``SCRIPT`` wrapper, on the other
hand, uses the "installed" script written to the ``EGG-INFO/scripts``
subdirectory of the corresponding ``.egg`` zipfile or directory.
(``.egg-info`` eggs do not have script wrappers associated with them,
except in the ``setup.py develop`` case.)

The purpose of including the marker line in generated script wrappers is
to facilitate introspection of installed scripts, and their relationship
to installed eggs. For example, an uninstallation tool could use this
data to identify what scripts can safely be removed, and/or identify
what scripts would stop working if a particular egg is uninstalled.
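
The kind of introspection described above could be sketched as follows.
This is a hypothetical helper, not an existing tool: the names
``read_marker`` and ``wrappers_for_project`` are invented for
illustration, and the project name is extracted from the requirement
string with a simple regular expression, an assumption made so the
sketch does not depend on ``pkg_resources``.

```python
import ast
import os
import re

_MARKER_RE = re.compile(
    r"^# EASY-INSTALL-(SCRIPT|DEV-SCRIPT|ENTRY-SCRIPT): (.+)$"
)


def read_marker(path):
    """Return (script_type, strings) from a file's second line, or None."""
    with open(path, "r", errors="replace") as f:
        f.readline()  # skip the #! line
        second = f.readline().rstrip("\r\n")
    match = _MARKER_RE.match(second)
    if match is None:
        return None
    try:
        strings = ast.literal_eval(match.group(2))
    except (ValueError, SyntaxError):
        return None
    if not isinstance(strings, tuple) or not strings:
        return None
    if not all(isinstance(s, str) for s in strings):
        return None
    return match.group(1), strings


def wrappers_for_project(script_dir, project_name):
    """List (filename, script_type) for wrappers that require project_name."""
    found = []
    for name in sorted(os.listdir(script_dir)):
        path = os.path.join(script_dir, name)
        if not os.path.isfile(path):
            continue
        marker = read_marker(path)
        if marker is None:
            continue  # not an EasyInstall-generated wrapper
        script_type, strings = marker
        # Assumption: the requirement string starts with the project name.
        req_name = re.match(r"[A-Za-z0-9._-]+", strings[0])
        if req_name and req_name.group(0).lower() == project_name.lower():
            found.append((name, script_type))
    return found
```

An uninstaller could call ``wrappers_for_project()`` on the script
installation directory to list the wrappers that would stop working if
the named project's egg were removed.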