Modules/gc_weakref.txt - platform/external/python/cpython3 - Gitiles

 Intro
 =====

 The basic rule for dealing with weakref callbacks (and __del__ methods too,
 for that matter) during cyclic gc:

     Once gc has computed the set of unreachable objects, no Python-level
     code can be allowed to access an unreachable object.

 If that can happen, then the Python code can resurrect unreachable objects
 too, and gc can't detect that without starting over.  Since gc eventually
 runs tp_clear on all unreachable objects, if an unreachable object is
 resurrected then tp_clear will eventually be called on it (or may already
 have been called before resurrection).  At best (and this has been an
 historically common bug), tp_clear empties an instance's __dict__, and
 "impossible" AttributeErrors result.  At worst, tp_clear leaves behind an
 insane object at the C level, and segfaults result (historically, most
 often by setting a class's mro pointer to NULL, after which attribute
 lookups performed by the class can segfault).

 OTOH, it's OK to run Python-level code that can't access unreachable
 objects, and sometimes that's necessary.  The chief example is the callback
 attached to a reachable weakref W to an unreachable object O.  Since O is
 going away, and W is still alive, the callback must be invoked.  Because W
 is still alive, everything reachable from its callback is also reachable,
 so it's also safe to invoke the callback (although that's trickier than it
 sounds, since other reachable weakrefs to other unreachable objects may
 still exist, and be accessible to the callback -- there are lots of painful
 details like this covered in the rest of this file).

 Python 2.4/2.3.5
 ================

 The "Before 2.3.3" section below turned out to be wrong in some ways, but
 I'm leaving it as-is because it's more right than wrong, and serves as a
 wonderful example of how painful analysis can miss not only the forest for
 the trees, but also miss the trees for the aphids sucking the trees
 dry <wink>.

 The primary thing it missed is that when a weakref to a piece of cyclic
 trash (CT) exists, then any call to any Python code whatsoever can end up
 materializing a strong reference to that weakref's CT referent, and so
 possibly resurrect an insane object (one for which cyclic gc has called-- or
 will call before it's done --tp_clear()).  It's not even necessarily that a
 weakref callback or __del__ method does something nasty on purpose:  as
 soon as we execute Python code, threads other than the gc thread can run
 too, and they can do ordinary things with weakrefs that end up resurrecting
 CT while gc is running.

     http://www.python.org/sf/1055820

 shows how innocent it can be, and also how nasty.  Variants of the three
 focussed test cases attached to that bug report are now part of Python's
 standard Lib/test/test_gc.py.

 Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)
 approach:

     Clearing cyclic trash can call Python code.  If there are weakrefs to
     any of the cyclic trash, then those weakrefs can be used to resurrect
     the objects.  Therefore, *before* clearing cyclic trash, we need to
     remove any weakrefs.  If any of the weakrefs being removed have
     callbacks, then we need to save the callbacks and call them *after* all
     of the weakrefs have been cleared.

 Alas, doing just that much doesn't work, because it overlooks what turned
 out to be the much subtler problems that were fixed earlier, and described
 below.  We do clear all weakrefs to CT now before breaking cycles, but not
 all callbacks encountered can be run later.  That's explained in horrid
 detail below.

 Older text follows, with a some later comments in [] brackets:

 Before 2.3.3
 ============

 Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
 Segfaults in Zope3 resulted.

 weakrefs in Python are designed to, at worst, let *other* objects learn
 that a given object has died, via a callback function.  The weakly
 referenced object itself is not passed to the callback, and the presumption
 is that the weakly referenced object is unreachable trash at the time the
 callback is invoked.

 That's usually true, but not always.  Suppose a weakly referenced object
 becomes part of a clump of cyclic trash.  When enough cycles are broken by
 cyclic gc that the object is reclaimed, the callback is invoked.  If it's
 possible for the callback to get at objects in the cycle(s), then it may be
 possible for those objects to access (via strong references in the cycle)
 the weakly referenced object being torn down, or other objects in the cycle
 that have already suffered a tp_clear() call.  There's no guarantee that an
 object is in a sane state after tp_clear().  Bad things (including
 segfaults) can happen right then, during the callback's execution, or can
 happen at any later time if the callback manages to resurrect an insane
 object.

 [That missed that, in addition, a weakref to CT can exist outside CT, and
  any callback into Python can use such a non-CT weakref to resurrect its CT
  referent.  The same bad kinds of things can happen then.]

 Note that if it's possible for the callback to get at objects in the trash
 cycles, it must also be the case that the callback itself is part of the
 trash cycles.  Else the callback would have acted as an external root to
 the current collection, and nothing reachable from it would be in cyclic
 trash either.

 [Except that a non-CT callback can also use a non-CT weakref to get at
  CT objects.]

 More, if the callback itself is in cyclic trash, then the weakref to which
 the callback is attached must also be trash, and for the same kind of
 reason:  if the weakref acted as an external root, then the callback could
 not have been cyclic trash.

 So a problem here requires that a weakref, that weakref's callback, and the
 weakly referenced object, all be in cyclic trash at the same time.  This
 isn't easy to stumble into by accident while Python is running, and, indeed,
 it took quite a while to dream up failing test cases.  Zope3 saw segfaults
 during shutdown, during the second call of gc in Py_Finalize, after most
 modules had been torn down.  That creates many trash cycles (esp. those
 involving classes), making the problem much more likely.  Once you
 know what's required to provoke the problem, though, it's easy to create
 tests that segfault before shutdown.

 In 2.3.3, before breaking cycles, we first clear all the weakrefs with
 callbacks in cyclic trash.  Since the weakrefs *are* trash, and there's no
 defined-- or even predictable --order in which tp_clear() gets called on
 cyclic trash, it's defensible to first clear weakrefs with callbacks.  It's
 a feature of Python's weakrefs too that when a weakref goes away, the
 callback (if any) associated with it is thrown away too, unexecuted.

 [In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
  those weakrefs are themselves CT, and whether or not they have callbacks.
  The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
  after all weakrefs-to-CT have been cleared.  The callbacks (if any) on CT
  weakrefs (if any) are never invoked, for the excruciating reasons
  explained here.]

 Just that much is almost enough to prevent problems, by throwing away
 *almost* all the weakref callbacks that could get triggered by gc.  The
 problem remaining is that clearing a weakref with a callback decrefs the
 callback object, and the callback object may *itself* be weakly referenced,
 via another weakref with another callback.  So the process of clearing
 weakrefs can trigger callbacks attached to other weakrefs, and those
 latter weakrefs may or may not be part of cyclic trash.

 So, to prevent any Python code from running while gc is invoking tp_clear()
 on all the objects in cyclic trash,

 [That was always wrong:  we can't stop Python code from running when gc
  is breaking cycles.  If an object with a __del__ method is not itself in
  a cycle, but is reachable only from CT, then breaking cycles will, as a
  matter of course, drop the refcount on that object to 0, and its __del__
  will run right then.  What we can and must stop is running any Python
  code that could access CT.]
                                      it's not quite enough just to invoke
 tp_clear() on weakrefs with callbacks first.  Instead the weakref module
 grew a new private function (_PyWeakref_ClearRef) that does only part of
 tp_clear():  it removes the weakref from the weakly-referenced object's list
 of weakrefs, but does not decref the callback object.  So calling
 _PyWeakref_ClearRef(wr) ensures that wr's callback object will never
 trigger, and (unlike weakref's tp_clear()) also prevents any callback
 associated *with* wr's callback object from triggering.

 [Although we may trigger such callbacks later, as explained below.]

 Then we can call tp_clear on all the cyclic objects and never trigger
 Python code.

 [As above, not so:  it means never trigger Python code that can access CT.]

 After we do that, the callback objects still need to be decref'ed.  Callbacks
 (if any) *on* the callback objects that were also part of cyclic trash won't
 get invoked, because we cleared all trash weakrefs with callbacks at the
 start.  Callbacks on the callback objects that were not part of cyclic trash
 acted as external roots to everything reachable from them, so nothing
 reachable from them was part of cyclic trash, so gc didn't do any damage to
 objects reachable from them, and it's safe to call them at the end of gc.

 [That's so.  In addition, now we also invoke (if any) the callbacks on
  non-CT weakrefs to CT objects, during the same pass that decrefs the
  callback objects.]

 An alternative would have been to treat objects with callbacks like objects
 with __del__ methods, refusing to collect them, appending them to gc.garbage
 instead.  That would have been much easier.  Jim Fulton gave a strong
 argument against that (on Python-Dev):

     There's a big difference between __del__ and weakref callbacks.
     The __del__ method is "internal" to a design.  When you design a
     class with a del method, you know you have to avoid including the
     class in cycles.

     Now, suppose you have a design that makes has no __del__ methods but
     that does use cyclic data structures.  You reason about the design,
     run tests, and convince yourself you don't have a leak.

     Now, suppose some external code creates a weakref to one of your
     objects.  All of a sudden, you start leaking.  You can look at your
     code all you want and you won't find a reason for the leak.

 IOW, a class designer can out-think __del__ problems, but has no control
 over who creates weakrefs to his classes or class instances.  The class
 user has little chance either of predicting when the weakrefs he creates
 may end up in cycles.

 Callbacks on weakref callbacks are executed in an arbitrary order, and
 that's not good (a primary reason not to collect cycles with objects with
 __del__ methods is to avoid running finalizers in an arbitrary order).
 However, a weakref callback on a weakref callback has got to be rare.
 It's possible to do such a thing, so gc has to be robust against it, but
 I doubt anyone has done it outside the test case I wrote for it.

 [The callbacks (if any) on non-CT weakrefs to CT objects are also executed
  in an arbitrary order now.  But they were before too, depending on the
  vagaries of when tp_clear() happened to break enough cycles to trigger
  them.  People simply shouldn't try to use __del__ or weakref callbacks to
  do fancy stuff.]
	Intro
	=====

	The basic rule for dealing with weakref callbacks (and __del__ methods too,
	for that matter) during cyclic gc:

	Once gc has computed the set of unreachable objects, no Python-level
	code can be allowed to access an unreachable object.

	If that can happen, then the Python code can resurrect unreachable objects
	too, and gc can't detect that without starting over. Since gc eventually
	runs tp_clear on all unreachable objects, if an unreachable object is
	resurrected then tp_clear will eventually be called on it (or may already
	have been called before resurrection). At best (and this has been an
	historically common bug), tp_clear empties an instance's __dict__, and
	"impossible" AttributeErrors result. At worst, tp_clear leaves behind an
	insane object at the C level, and segfaults result (historically, most
	often by setting a class's mro pointer to NULL, after which attribute
	lookups performed by the class can segfault).

	OTOH, it's OK to run Python-level code that can't access unreachable
	objects, and sometimes that's necessary. The chief example is the callback
	attached to a reachable weakref W to an unreachable object O. Since O is
	going away, and W is still alive, the callback must be invoked. Because W
	is still alive, everything reachable from its callback is also reachable,
	so it's also safe to invoke the callback (although that's trickier than it
	sounds, since other reachable weakrefs to other unreachable objects may
	still exist, and be accessible to the callback -- there are lots of painful
	details like this covered in the rest of this file).

	Python 2.4/2.3.5
	================

	The "Before 2.3.3" section below turned out to be wrong in some ways, but
	I'm leaving it as-is because it's more right than wrong, and serves as a
	wonderful example of how painful analysis can miss not only the forest for
	the trees, but also miss the trees for the aphids sucking the trees
	dry <wink>.

	The primary thing it missed is that when a weakref to a piece of cyclic
	trash (CT) exists, then any call to any Python code whatsoever can end up
	materializing a strong reference to that weakref's CT referent, and so
	possibly resurrect an insane object (one for which cyclic gc has called-- or
	will call before it's done --tp_clear()). It's not even necessarily that a
	weakref callback or __del__ method does something nasty on purpose: as
	soon as we execute Python code, threads other than the gc thread can run
	too, and they can do ordinary things with weakrefs that end up resurrecting
	CT while gc is running.

	http://www.python.org/sf/1055820

	shows how innocent it can be, and also how nasty. Variants of the three
	focussed test cases attached to that bug report are now part of Python's
	standard Lib/test/test_gc.py.

	Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)
	approach:

	Clearing cyclic trash can call Python code. If there are weakrefs to
	any of the cyclic trash, then those weakrefs can be used to resurrect
	the objects. Therefore, before clearing cyclic trash, we need to
	remove any weakrefs. If any of the weakrefs being removed have
	callbacks, then we need to save the callbacks and call them after all
	of the weakrefs have been cleared.

	Alas, doing just that much doesn't work, because it overlooks what turned
	out to be the much subtler problems that were fixed earlier, and described
	below. We do clear all weakrefs to CT now before breaking cycles, but not
	all callbacks encountered can be run later. That's explained in horrid
	detail below.

	Older text follows, with a some later comments in [] brackets:

	Before 2.3.3
	============

	Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.
	Segfaults in Zope3 resulted.

	weakrefs in Python are designed to, at worst, let other objects learn
	that a given object has died, via a callback function. The weakly
	referenced object itself is not passed to the callback, and the presumption
	is that the weakly referenced object is unreachable trash at the time the
	callback is invoked.

	That's usually true, but not always. Suppose a weakly referenced object
	becomes part of a clump of cyclic trash. When enough cycles are broken by
	cyclic gc that the object is reclaimed, the callback is invoked. If it's
	possible for the callback to get at objects in the cycle(s), then it may be
	possible for those objects to access (via strong references in the cycle)
	the weakly referenced object being torn down, or other objects in the cycle
	that have already suffered a tp_clear() call. There's no guarantee that an
	object is in a sane state after tp_clear(). Bad things (including
	segfaults) can happen right then, during the callback's execution, or can
	happen at any later time if the callback manages to resurrect an insane
	object.

	[That missed that, in addition, a weakref to CT can exist outside CT, and
	any callback into Python can use such a non-CT weakref to resurrect its CT
	referent. The same bad kinds of things can happen then.]

	Note that if it's possible for the callback to get at objects in the trash
	cycles, it must also be the case that the callback itself is part of the
	trash cycles. Else the callback would have acted as an external root to
	the current collection, and nothing reachable from it would be in cyclic
	trash either.

	[Except that a non-CT callback can also use a non-CT weakref to get at
	CT objects.]

	More, if the callback itself is in cyclic trash, then the weakref to which
	the callback is attached must also be trash, and for the same kind of
	reason: if the weakref acted as an external root, then the callback could
	not have been cyclic trash.

	So a problem here requires that a weakref, that weakref's callback, and the
	weakly referenced object, all be in cyclic trash at the same time. This
	isn't easy to stumble into by accident while Python is running, and, indeed,
	it took quite a while to dream up failing test cases. Zope3 saw segfaults
	during shutdown, during the second call of gc in Py_Finalize, after most
	modules had been torn down. That creates many trash cycles (esp. those
	involving classes), making the problem much more likely. Once you
	know what's required to provoke the problem, though, it's easy to create
	tests that segfault before shutdown.

	In 2.3.3, before breaking cycles, we first clear all the weakrefs with
	callbacks in cyclic trash. Since the weakrefs are trash, and there's no
	defined-- or even predictable --order in which tp_clear() gets called on
	cyclic trash, it's defensible to first clear weakrefs with callbacks. It's
	a feature of Python's weakrefs too that when a weakref goes away, the
	callback (if any) associated with it is thrown away too, unexecuted.

	[In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not
	those weakrefs are themselves CT, and whether or not they have callbacks.
	The callbacks (if any) on non-CT weakrefs (if any) are invoked later,
	after all weakrefs-to-CT have been cleared. The callbacks (if any) on CT
	weakrefs (if any) are never invoked, for the excruciating reasons
	explained here.]

	Just that much is almost enough to prevent problems, by throwing away
	almost all the weakref callbacks that could get triggered by gc. The
	problem remaining is that clearing a weakref with a callback decrefs the
	callback object, and the callback object may itself be weakly referenced,
	via another weakref with another callback. So the process of clearing
	weakrefs can trigger callbacks attached to other weakrefs, and those
	latter weakrefs may or may not be part of cyclic trash.

	So, to prevent any Python code from running while gc is invoking tp_clear()
	on all the objects in cyclic trash,

	[That was always wrong: we can't stop Python code from running when gc
	is breaking cycles. If an object with a __del__ method is not itself in
	a cycle, but is reachable only from CT, then breaking cycles will, as a
	matter of course, drop the refcount on that object to 0, and its __del__
	will run right then. What we can and must stop is running any Python
	code that could access CT.]
	it's not quite enough just to invoke
	tp_clear() on weakrefs with callbacks first. Instead the weakref module
	grew a new private function (_PyWeakref_ClearRef) that does only part of
	tp_clear(): it removes the weakref from the weakly-referenced object's list
	of weakrefs, but does not decref the callback object. So calling
	_PyWeakref_ClearRef(wr) ensures that wr's callback object will never
	trigger, and (unlike weakref's tp_clear()) also prevents any callback
	associated with wr's callback object from triggering.

	[Although we may trigger such callbacks later, as explained below.]

	Then we can call tp_clear on all the cyclic objects and never trigger
	Python code.

	[As above, not so: it means never trigger Python code that can access CT.]

	After we do that, the callback objects still need to be decref'ed. Callbacks
	(if any) on the callback objects that were also part of cyclic trash won't
	get invoked, because we cleared all trash weakrefs with callbacks at the
	start. Callbacks on the callback objects that were not part of cyclic trash
	acted as external roots to everything reachable from them, so nothing
	reachable from them was part of cyclic trash, so gc didn't do any damage to
	objects reachable from them, and it's safe to call them at the end of gc.

	[That's so. In addition, now we also invoke (if any) the callbacks on
	non-CT weakrefs to CT objects, during the same pass that decrefs the
	callback objects.]

	An alternative would have been to treat objects with callbacks like objects
	with __del__ methods, refusing to collect them, appending them to gc.garbage
	instead. That would have been much easier. Jim Fulton gave a strong
	argument against that (on Python-Dev):

	There's a big difference between __del__ and weakref callbacks.
	The __del__ method is "internal" to a design. When you design a
	class with a del method, you know you have to avoid including the
	class in cycles.

	Now, suppose you have a design that makes has no __del__ methods but
	that does use cyclic data structures. You reason about the design,
	run tests, and convince yourself you don't have a leak.

	Now, suppose some external code creates a weakref to one of your
	objects. All of a sudden, you start leaking. You can look at your
	code all you want and you won't find a reason for the leak.

	IOW, a class designer can out-think __del__ problems, but has no control
	over who creates weakrefs to his classes or class instances. The class
	user has little chance either of predicting when the weakrefs he creates
	may end up in cycles.

	Callbacks on weakref callbacks are executed in an arbitrary order, and
	that's not good (a primary reason not to collect cycles with objects with
	__del__ methods is to avoid running finalizers in an arbitrary order).
	However, a weakref callback on a weakref callback has got to be rare.
	It's possible to do such a thing, so gc has to be robust against it, but
	I doubt anyone has done it outside the test case I wrote for it.

	[The callbacks (if any) on non-CT weakrefs to CT objects are also executed
	in an arbitrary order now. But they were before too, depending on the
	vagaries of when tp_clear() happened to break enough cycles to trigger
	them. People simply shouldn't try to use __del__ or weakref callbacks to
	do fancy stuff.]