memoized: fix for bug where wrong cached results were returned

memoized() used id() to get "unique" representations for arguments
passed to a function, in order to ensure that results for the same
function called with different arguments are treated differently.

However, Python object identities are only guaranteed to be unique at a
particular point in time. It is possible than a particular ID gets
reused for a different object if the previous object associated with
that ID no longer exists. In particular, in CPython, the IDs are just
addresses of the corresponding PyObject's in memory. If a PyObject gets
garbage collected, another object may get allocated in its place, and
the new object will "inherit" the ID by virtue of being in the same
memory location. If the new object is then used to call a memoized
function that was previously called with the old object, the old cached
result will be incorrectly returned.

To get around this issue, the cache is now indexed not only by the ID of
an object but also but the first few bytes of its value.

NOTE: there is still a potential issue it two relatively large objects
gets allocated in the same place and happen to share the first few
(currently 32) bytes, and are then both used as parameters to the same
memoized function. The only way around that is to hash the entire value
of the object. However, given the performance penalty that would incur
for larger object, and the unlikeliness of this situation actually
arising, it is currently deemed not worth it.
1 file changed
tree: 9e170f24d958c0ad98b51cbd3493cc8bec56a098
  1. devlib/
  2. doc/
  3. src/
  4. .gitignore
  5. README.rst
  6. setup.py