closes bpo-31650: PEP 552 (Deterministic pycs) implementation (#4575)
Python now supports checking bytecode cache up-to-dateness with a hash of the
source contents rather than volatile source metadata. See the PEP for details.
While a fairly straightforward idea, quite a lot of code had to be modified due
to the pervasiveness of pyc implementation details in the codebase. Changes in
this commit include:
- The core changes to importlib to understand how to read, validate, and
regenerate hash-based pycs.
- Support for generating hash-based pycs in py_compile and compileall.
- Modifications to our siphash implementation to support passing a custom
key. We then expose it to importlib through _imp.
- Updates to all places in the interpreter, standard library, and tests that
manually generate or parse pyc files to grok the new format.
- Support in the interpreter command line code for long options like
--check-hash-based-pycs.
- Tests and documentation for all of the above.
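
As an illustration of the py_compile support listed above, here is a minimal
sketch (not part of the patch itself) that writes a checked hash-based pyc and
reads its header back. It assumes Python 3.7 or later and the 16-byte header
layout described in PEP 552 (4-byte magic, 4-byte little-endian flags, 8-byte
source hash). The file names and the read_pyc_header helper are hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, hash_bytes) from the 16-byte pyc header.

    Hypothetical helper for illustration; layout assumed from PEP 552.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    flags = int.from_bytes(header[4:8], "little")
    return magic, flags, header[8:16]


with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway source file (hypothetical name).
    src = os.path.join(tmp, "example.py")
    with open(src, "wb") as f:
        f.write(b"x = 1\n")

    # Compile it to a checked hash-based pyc.
    pyc = py_compile.compile(
        src,
        cfile=os.path.join(tmp, "example.pyc"),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )

    magic, flags, stored_hash = read_pyc_header(pyc)

    # Recompute the hash from the source bytes, as a checked import would.
    with open(src, "rb") as f:
        expected_hash = importlib.util.source_hash(f.read())

is_hash_based = bool(flags & 0b01)  # lowest bit: pyc is hash-based
is_checked = bool(flags & 0b10)     # second bit: validate against source
cache_is_valid = stored_hash == expected_hash
```

Passing PycInvalidationMode.UNCHECKED_HASH instead should leave the second flag
bit clear, in which case the import system skips the comparison unless
--check-hash-based-pycs says otherwise.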
diff --git a/Doc/reference/import.rst b/Doc/reference/import.rst
index 881e0ae..45d4172 100644
--- a/Doc/reference/import.rst
+++ b/Doc/reference/import.rst
@@ -675,6 +675,33 @@
:meth:`~importlib.abc.Loader.module_repr` method, if defined, before
trying either approach described above. However, the method is deprecated.
+.. _pyc-invalidation:
+
+Cached bytecode invalidation
+----------------------------
+
+Before Python loads cached bytecode from a ``.pyc`` file, it checks whether the
+cache is up-to-date with the source ``.py`` file. By default, Python does this
+by storing the source's last-modified timestamp and size in the cache file when
+writing it. At runtime, the import system then validates the cache file by
+checking the stored metadata in the cache file against the source's
+metadata.
+
+Python also supports "hash-based" cache files, which store a hash of the source
+file's contents rather than its metadata. There are two variants of hash-based
+``.pyc`` files: checked and unchecked. For checked hash-based ``.pyc`` files,
+Python validates the cache file by hashing the source file and comparing the
+resulting hash with the hash in the cache file. If a checked hash-based cache
+file is found to be invalid, Python regenerates it and writes a new checked
+hash-based cache file. For unchecked hash-based ``.pyc`` files, Python simply
+assumes the cache file is valid if it exists. The validation behavior of
+hash-based ``.pyc`` files may be overridden with the
+:option:`--check-hash-based-pycs` flag.
+
+.. versionchanged:: 3.7
+ Added hash-based ``.pyc`` files. Previously, Python only supported
+ timestamp-based invalidation of bytecode caches.
+
The Path Based Finder
=====================