A modest speedup of object deallocation.  call_finalizer() did rather
a lot of work: it had to save and restore the current exception around
a call to lookup_maybe(), because that call could fail in rare cases.
Since most objects don't have a __del__ method, the whole exercise was
usually a waste of time.  Changed this to cache the __del__ method in
the type object just like all other special methods, in a new slot
tp_del.  So now subtype_dealloc() can test whether tp_del is NULL and
skip the whole exercise if it is.  The new slot doesn't need a new
flag bit: subtype_dealloc() is only called if the type was dynamically
allocated by type_new(), so it's guaranteed to have all current slots.
Types defined in C cannot fill in tp_del with a function of their own,
so there's no corresponding "wrapper".  (That functionality is already
available through tp_dealloc.)
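A minimal C sketch of the fast path described above. It does not use the real PyObject, PyTypeObject or subtype_dealloc(). It assumes heavily simplified, hypothetical stand-ins (`sketch_object`, `sketch_type`, `sketch_dealloc`, `run_del`, `slow_path_calls`) that model only one idea: once __del__ is cached in a `tp_del` slot, deallocating an object whose type has no finalizer costs a single NULL test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for PyObject and
   PyTypeObject; the real CPython structs have many more fields. */
typedef struct sketch_object sketch_object;
typedef void (*sketch_destructor)(sketch_object *);

typedef struct sketch_type {
    const char *tp_name;
    sketch_destructor tp_del;   /* cached __del__ hook; NULL if none */
} sketch_type;

struct sketch_object {
    sketch_type *ob_type;
    int finalized;              /* set by the hook, for demonstration */
};

/* Counts how often the (expensive) finalizer path is taken. */
static int slow_path_calls = 0;

/* Hypothetical hook a class defining __del__ would get. In CPython the
   costly work (saving and restoring the current exception, looking up
   and calling __del__) would live behind this pointer. */
static void run_del(sketch_object *self)
{
    slow_path_calls++;
    self->finalized = 1;
}

/* Sketch of the dealloc fast path: one NULL test on the cached slot
   decides whether any finalizer work is needed at all. Resurrection
   handling and freeing memory are deliberately omitted. */
static void sketch_dealloc(sketch_object *self)
{
    assert(self != NULL && self->ob_type != NULL);
    if (self->ob_type->tp_del != NULL) {
        self->ob_type->tp_del(self);
    }
}

static sketch_type plain_type = { "Plain", NULL };
static sketch_type with_del_type = { "WithDel", run_del };
```

Calling sketch_dealloc() on an object of `plain_type` leaves `slow_path_calls` at 0, while an object of `with_del_type` goes through the hook exactly once.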
diff --git a/Include/object.h b/Include/object.h
index 19460fe..d045be1 100644
--- a/Include/object.h
+++ b/Include/object.h
@@ -315,6 +315,7 @@
 	PyObject *tp_cache;
 	PyObject *tp_subclasses;
 	PyObject *tp_weaklist;
+	destructor tp_del;
 
 #ifdef COUNT_ALLOCS
 	/* these must be last and never explicitly initialized */