IB/mthca: Fix race in reference counting
Fix races in in destroying various objects. If a destroy routine
waits for an object to become free by doing
wait_event(&obj->wait, !atomic_read(&obj->refcount));
/* now clean up and destroy the object */
and another place drops a reference to the object by doing
if (atomic_dec_and_test(&obj->refcount))
wake_up(&obj->wait);
then this is susceptible to a race where the wait_event() and final
freeing of the object occur between the atomic_dec_and_test() and the
wake_up(). And this is a use-after-free, since wake_up() will be
called on part of the already-freed object.
Fix this in mthca by replacing the atomic_t refcounts with plain old
integers protected by a spinlock. This makes it possible to do the
decrement of the reference count and the wake_up() so that it appears
as a single atomic operation to the code waiting on the wait queue.
While touching this code, also simplify mthca_cq_clean(): the CQ being
cleaned cannot go away, because it still has a QP attached to it. So
there's no reason to be paranoid and look up the CQ by number; it's
perfectly safe to use the pointer that the callers already have.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.h b/drivers/infiniband/hw/mthca/mthca_provider.h
index 6676a78..179a8f6 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.h
+++ b/drivers/infiniband/hw/mthca/mthca_provider.h
@@ -139,11 +139,12 @@
* a qp may be locked, with the send cq locked first. No other
* nesting should be done.
*
- * Each struct mthca_cq/qp also has an atomic_t ref count. The
- * pointer from the cq/qp_table to the struct counts as one reference.
- * This reference also is good for access through the consumer API, so
- * modifying the CQ/QP etc doesn't need to take another reference.
- * Access because of a completion being polled does need a reference.
+ * Each struct mthca_cq/qp also has an ref count, protected by the
+ * corresponding table lock. The pointer from the cq/qp_table to the
+ * struct counts as one reference. This reference also is good for
+ * access through the consumer API, so modifying the CQ/QP etc doesn't
+ * need to take another reference. Access to a QP because of a
+ * completion being polled does not need a reference either.
*
* Finally, each struct mthca_cq/qp has a wait_queue_head_t for the
* destroy function to sleep on.
@@ -159,8 +160,9 @@
* - decrement ref count; if zero, wake up waiters
*
* To destroy a CQ/QP, we can do the following:
- * - lock cq/qp_table, remove pointer, unlock cq/qp_table lock
- * - decrement ref count
+ * - lock cq/qp_table
+ * - remove pointer and decrement ref count
+ * - unlock cq/qp_table lock
* - wait_event until ref count is zero
*
* It is the consumer's responsibilty to make sure that no QP
@@ -197,7 +199,7 @@
struct mthca_cq {
struct ib_cq ibcq;
spinlock_t lock;
- atomic_t refcount;
+ int refcount;
int cqn;
u32 cons_index;
struct mthca_cq_buf buf;
@@ -217,7 +219,7 @@
struct mthca_srq {
struct ib_srq ibsrq;
spinlock_t lock;
- atomic_t refcount;
+ int refcount;
int srqn;
int max;
int max_gs;
@@ -254,7 +256,7 @@
struct mthca_qp {
struct ib_qp ibqp;
- atomic_t refcount;
+ int refcount;
u32 qpn;
int is_direct;
u8 port; /* for SQP and memfree use only */