| Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs. |
| Segfaults in Zope3 resulted. |
| |
| weakrefs in Python are designed to, at worst, let *other* objects learn |
| that a given object has died, via a callback function. The weakly |
| referenced object itself is not passed to the callback, and the presumption |
| is that the weakly referenced object is unreachable trash at the time the |
| callback is invoked. |
| |
| That's usually true, but not always. Suppose a weakly referenced object |
| becomes part of a clump of cyclic trash. When enough cycles are broken by |
| cyclic gc that the object is reclaimed, the callback is invoked. If it's |
| possible for the callback to get at objects in the cycle(s), then it may be |
| possible for those objects to access (via strong references in the cycle) |
| the weakly referenced object being torn down, or other objects in the cycle |
| that have already suffered a tp_clear() call. There's no guarantee that an |
| object is in a sane state after tp_clear(). Bad things (including |
| segfaults) can happen right then, during the callback's execution, or can |
| happen at any later time if the callback manages to resurrect an insane |
| object. |
| |
| Note that if it's possible for the callback to get at objects in the trash |
| cycles, it must also be the case that the callback itself is part of the |
| trash cycles. Else the callback would have acted as an external root to |
| the current collection, and nothing reachable from it would be in cyclic |
| trash either. |
| |
| More, if the callback itself is in cyclic trash, then the weakref to which |
| the callback is attached must also be trash, and for the same kind of |
| reason: if the weakref acted as an external root, then the callback could |
| not have been cyclic trash. |
| |
| So a problem here requires that a weakref, that weakref's callback, and the |
| weakly referenced object, all be in cyclic trash at the same time. This |
| isn't easy to stumble into by accident while Python is running, and, indeed, |
| it took quite a while to dream up failing test cases. Zope3 saw segfaults |
| during shutdown, during the second call of gc in Py_Finalize, after most |
| modules had been torn down. That creates many trash cycles (esp. those |
| involving new-style classes), making the problem much more likely. Once you |
| know what's required to provoke the problem, though, it's easy to create |
| tests that segfault before shutdown. |
| |
| In 2.3.3, before breaking cycles, we first clear all the weakrefs with |
| callbacks in cyclic trash. Since the weakrefs *are* trash, and there's no |
| defined-- or even predictable --order in which tp_clear() gets called on |
| cyclic trash, it's defensible to first clear weakrefs with callbacks. It's |
| a feature of Python's weakrefs too that when a weakref goes away, the |
| callback (if any) associated with it is thrown away too, unexecuted. |
| |
| Just that much is almost enough to prevent problems, by throwing away |
| *almost* all the weakref callbacks that could get triggered by gc. The |
| problem remaining is that clearing a weakref with a callback decrefs the |
| callback object, and the callback object may *itself* be weakly referenced, |
| via another weakref with another callback. So the process of clearing |
| weakrefs can trigger callbacks attached to other weakrefs, and those |
| latter weakrefs may or may not be part of cyclic trash. |
| |
| So, to prevent any Python code from running while gc is invoking tp_clear() |
| on all the objects in cyclic trash, it's not quite enough just to invoke |
| tp_clear() on weakrefs with callbacks first. Instead the weakref module |
| grew a new private function (_PyWeakref_ClearRef) that does only part of |
| tp_clear(): it removes the weakref from the weakly-referenced object's list |
| of weakrefs, but does not decref the callback object. So calling |
| _PyWeakref_ClearRef(wr) ensures that wr's callback object will never |
| trigger, and (unlike weakref's tp_clear()) also prevents any callback |
| associated *with* wr's callback object from triggering. |
| |
| Then we can call tp_clear on all the cyclic objects and never trigger |
| Python code. |
| |
| After we do that, the callback objects still need to be decref'ed. Callbacks |
| (if any) *on* the callback objects that were also part of cyclic trash won't |
| get invoked, because we cleared all trash weakrefs with callbacks at the |
| start. Callbacks on the callback objects that were not part of cyclic trash |
| acted as external roots to everything reachable from them, so nothing |
| reachable from them was part of cyclic trash, so gc didn't do any damage to |
| objects reachable from them, and it's safe to call them at the end of gc. |
| |
| An alternative would have been to treat objects with callbacks like objects |
| with __del__ methods, refusing to collect them, appending them to gc.garbage |
| instead. That would have been much easier. Jim Fulton gave a strong |
| argument against that (on Python-Dev): |
| |
| There's a big difference between __del__ and weakref callbacks. |
| The __del__ method is "internal" to a design. When you design a |
| class with a del method, you know you have to avoid including the |
| class in cycles. |
| |
| Now, suppose you have a design that makes has no __del__ methods but |
| that does use cyclic data structures. You reason about the design, |
| run tests, and convince yourself you don't have a leak. |
| |
| Now, suppose some external code creates a weakref to one of your |
| objects. All of a sudden, you start leaking. You can look at your |
| code all you want and you won't find a reason for the leak. |
| |
| IOW, a class designer can out-think __del__ problems, but has no control |
| over who creates weakrefs to his classes or class instances. The class |
| user has little chance either of predicting when the weakrefs he creates |
| may end up in cycles. |
| |
| Callbacks on weakref callbacks are executed in an arbitrary order, and |
| that's not good (a primary reason not to collect cycles with objects with |
| __del__ methods is to avoid running finalizers in an arbitrary order). |
| However, a weakref callback on a weakref callback has got to be rare. |
| It's possible to do such a thing, so gc has to be robust against it, but |
| I doubt anyone has done it outside the test case I wrote for it. |