Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
Segfaults in Zope3 resulted.

weakrefs in Python are designed to, at worst, let *other* objects learn
that a given object has died, via a callback function.  The weakly
referenced object itself is not passed to the callback, and the presumption
is that the weakly referenced object is unreachable trash at the time the
callback is invoked.

That's usually true, but not always.  Suppose a weakly referenced object
becomes part of a clump of cyclic trash.  When enough cycles are broken by
cyclic gc that the object is reclaimed, the callback is invoked.  If it's
possible for the callback to get at objects in the cycle(s), then it may be
possible for those objects to access (via strong references in the cycle)
the weakly referenced object being torn down, or other objects in the cycle
that have already suffered a tp_clear() call.  There's no guarantee that an
object is in a sane state after tp_clear().  Bad things (including
segfaults) can happen right then, during the callback's execution, or can
happen at any later time if the callback manages to resurrect an insane
object.
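The dangerous shape described above can be sketched in a few lines of
Python.  This is only an illustration, run against a current CPython 3
interpreter rather than 2.3.3; the names Node, build_trash, log and probe
are made up for the example.  The weakref, its callback, and the weakly
referenced object all end up in one clump of cyclic trash, and the callback
could reach the object through its closure if it were ever run.

```python
import gc
import weakref


class Node:
    """Hypothetical plain object that can take part in a reference cycle."""


def build_trash(log):
    obj = Node()

    def callback(ref):
        # The closure holds a strong reference to obj, so a running callback
        # could reach an object that gc is in the middle of tearing down.
        log.append(obj)

    # Cycle: obj -> obj.__dict__ -> weakref -> callback -> closure cell -> obj
    obj.watcher = weakref.ref(obj, callback)

    # A callback-free weakref, handed back only so the caller can observe
    # whether obj was reclaimed.  It is not part of the cycle.
    return weakref.ref(obj)


log = []
probe = build_trash(log)   # everything built inside is now cyclic trash
gc.collect()

print(probe() is None)     # the object was reclaimed
print(log)                 # stays empty if the callback never ran
```

On the interpreters this sketch was written for, the object is reclaimed
and the callback is never executed, which is the outcome the rest of this
note argues for.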

Note that if it's possible for the callback to get at objects in the trash
cycles, it must also be the case that the callback itself is part of the
trash cycles.  Else the callback would have acted as an external root to
the current collection, and nothing reachable from it would be in cyclic
trash either.

Moreover, if the callback itself is in cyclic trash, then the weakref to
which the callback is attached must also be trash, and for the same kind of
reason:  if the weakref acted as an external root, then the callback could
not have been cyclic trash.

So a problem here requires that a weakref, that weakref's callback, and the
weakly referenced object, all be in cyclic trash at the same time.  This
isn't easy to stumble into by accident while Python is running, and, indeed,
it took quite a while to dream up failing test cases.  Zope3 saw segfaults
during shutdown, during the second call of gc in Py_Finalize, after most
modules had been torn down.  That creates many trash cycles (especially
those involving new-style classes), making the problem much more likely.
Once you know what's required to provoke the problem, though, it's easy to
create tests that segfault before shutdown.

In 2.3.3, before breaking cycles, we first clear all the weakrefs with
callbacks in cyclic trash.  Since the weakrefs *are* trash, and there's no
defined (or even predictable) order in which tp_clear() gets called on
cyclic trash, it's defensible to first clear weakrefs with callbacks.  It's
a feature of Python's weakrefs too that when a weakref goes away, the
callback (if any) associated with it is thrown away too, unexecuted.
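The last point, that a callback is thrown away unexecuted when its weakref
goes away first, is easy to see without gc at all.  A minimal sketch,
assuming CPython's reference counting (so that "del" reclaims an object
immediately); Target and the two log lists are hypothetical names.

```python
import weakref


class Target:
    """Hypothetical weakly referenceable object."""


# Case 1: the weakref dies before its referent, so the callback is dropped.
log_dropped = []
t1 = Target()
wr1 = weakref.ref(t1, lambda ref: log_dropped.append("t1 died"))
del wr1   # the weakref goes away, taking its callback with it
del t1    # the referent dies, but nothing is left to call

# Case 2: the weakref outlives its referent, so the callback runs.
log_kept = []
t2 = Target()
wr2 = weakref.ref(t2, lambda ref: log_kept.append("t2 died"))
del t2    # the referent dies while wr2 is still alive

print(log_dropped)   # []
print(log_kept)      # ['t2 died']
```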

Just that much is almost enough to prevent problems, by throwing away
*almost* all the weakref callbacks that could get triggered by gc.  The
problem remaining is that clearing a weakref with a callback decrefs the
callback object, and the callback object may *itself* be weakly referenced,
via another weakref with another callback.  So the process of clearing
weakrefs can trigger callbacks attached to other weakrefs, and those
latter weakrefs may or may not be part of cyclic trash.
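The chain reaction described above can be reproduced outside gc.  The
sketch below again leans on CPython's reference counting, and Callback,
Target, wr_inner and wr_outer are names invented for the illustration.
Discarding one weakref drops the last reference to its callback object,
which in turn fires a callback attached to a *different* weakref.

```python
import weakref


class Target:
    """Hypothetical weakly referenced object."""


class Callback:
    """Hypothetical callback object that can itself be weakly referenced."""

    def __init__(self, log):
        self.log = log

    def __call__(self, ref):
        self.log.append("inner callback ran")


log = []
target = Target()
inner = Callback(log)

wr_inner = weakref.ref(target, inner)   # inner is wr_inner's callback object
wr_outer = weakref.ref(inner, lambda ref: log.append("callback object died"))

del inner      # wr_inner now holds the only strong reference to the callback
del wr_inner   # dropping the weakref decrefs the callback object, which dies,
               # which triggers wr_outer's callback: Python code runs

print(log)     # ['callback object died']
```

The inner callback never runs, since target is still alive, yet Python code
was executed merely because a weakref was cleared.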

So, to prevent any Python code from running while gc is invoking tp_clear()
on all the objects in cyclic trash, it's not quite enough just to invoke
tp_clear() on weakrefs with callbacks first.  Instead the weakref module
grew a new private function (_PyWeakref_ClearRef) that does only part of
tp_clear():  it removes the weakref from the weakly-referenced object's list
of weakrefs, but does not decref the callback object.  So calling
_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
trigger, and (unlike weakref's tp_clear()) also prevents any callback
associated *with* wr's callback object from triggering.

Then we can call tp_clear on all the cyclic objects and never trigger
Python code.

After we do that, the callback objects still need to be decref'ed.  Callbacks
(if any) *on* the callback objects that were also part of cyclic trash won't
get invoked, because we cleared all trash weakrefs with callbacks at the
start.  Callbacks on the callback objects that were not part of cyclic trash
acted as external roots to everything reachable from them, so nothing
reachable from them was part of cyclic trash, so gc didn't do any damage to
objects reachable from them, and it's safe to call them at the end of gc.
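The two cases in the last paragraph can be observed from Python.  This is a
sketch against a current CPython 3 interpreter, whose gc may invoke the
surviving callback at a different point in the collection than 2.3.3 did;
only the visible outcome is shown.  Target, Callback, build and outer are
hypothetical names.  The callback object sits in cyclic trash along with
its weakref and referent, while a second weakref to the callback object is
held outside the trash.

```python
import gc
import weakref


class Target:
    """Hypothetical weakly referenced object."""


class Callback:
    """Hypothetical callback object holding a strong reference to its target."""

    def __init__(self, log, target):
        self.log = log
        self.target = target

    def __call__(self, ref):
        self.log.append("inner callback ran")


def build(log):
    target = Target()
    cb = Callback(log, target)
    # Trash cycle: target -> target.__dict__ -> weakref -> cb -> target
    target.wr = weakref.ref(target, cb)
    # This weakref watches the callback object and is returned to the
    # caller, so it is *not* part of the trash.
    return weakref.ref(cb, lambda ref: log.append("callback object collected"))


log = []
outer = build(log)
gc.collect()

print(log)   # ['callback object collected']
```

The callback inside the trash is discarded without running, while the
callback on the external weakref is invoked.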

An alternative would have been to treat objects with callbacks like objects
with __del__ methods, refusing to collect them, appending them to gc.garbage
instead.  That would have been much easier.  Jim Fulton gave a strong
argument against that (on Python-Dev):

    There's a big difference between __del__ and weakref callbacks.
    The __del__ method is "internal" to a design.  When you design a
    class with a del method, you know you have to avoid including the
    class in cycles.

    Now, suppose you have a design that has no __del__ methods but
    that does use cyclic data structures.  You reason about the design,
    run tests, and convince yourself you don't have a leak.

    Now, suppose some external code creates a weakref to one of your
    objects.  All of a sudden, you start leaking.  You can look at your
    code all you want and you won't find a reason for the leak.

IOW, a class designer can out-think __del__ problems, but has no control
over who creates weakrefs to his classes or class instances.  The class
user has little chance either of predicting when the weakrefs he creates
may end up in cycles.
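The situation Jim describes looks like this in practice.  A small sketch on
a current CPython 3 interpreter; Node, make_cycle and watcher are invented
names.  Code outside the class attaches a weakref with a callback to an
object that lives in a cycle, and the cycle is still collected rather than
parked in gc.garbage.

```python
import gc
import weakref


class Node:
    """Hypothetical class with no __del__ method, used in a cyclic structure."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a <-> b reference cycle
    return a


log = []
node = make_cycle()

# "External code" creates a weakref with a callback to one of the objects.
watcher = weakref.ref(node, lambda ref: log.append("node collected"))

garbage_before = len(gc.garbage)
del node
gc.collect()
leaked = len(gc.garbage) - garbage_before

print(watcher() is None)   # the cycle was reclaimed
print(log)                 # ['node collected']
print(leaked)              # 0: nothing was appended to gc.garbage
```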

Callbacks on weakref callbacks are executed in an arbitrary order, and
that's not good (a primary reason not to collect cycles with objects with
__del__ methods is to avoid running finalizers in an arbitrary order).
However, a weakref callback on a weakref callback has got to be rare.
It's possible to do such a thing, so gc has to be robust against it, but
I doubt anyone has done it outside the test case I wrote for it.